I have made a temporary fix, but it is really ugly. The following is added to the code from my original question:

scaler = tf.Variable(initial_value=0.001)   # small scale factor for the network output
self.q_values = scaler * self.q_values      # shrink the raw outputs towards zero
self.q_values = pt.wrap(self.q_values)      # re-wrap so the result is a PrettyTensor again
Surely there must be a Pretty way of scaling the initial weights of the layers?
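(Side note: tf.Variable is trainable by default, so the optimizer is free to change this scaler during training. If a fixed scale factor is intended, a non-trainable variant would be:

scaler = tf.Variable(initial_value=0.001, trainable=False)  # optimizer will not update this

or simply scaler = tf.constant(0.001).)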
I see that while the documentation mentions that weights can take an initializer function, it doesn't specify that it is a standard TensorFlow initializer (e.g. tf.constant_initializer, tf.random_uniform_initializer, etc.). Basically it is any function that has the signature: init(shape, dtype=tf.float32, partition_info=None). They are listed here (towards the bottom and not in any particular order): https://www.tensorflow.org/api_docs/python/state_ops/sharing_variables
with pt.defaults_scope(activation_fn=tf.nn.relu):
    self.q_values = x_pretty.\
        conv2d(kernel=8, depth=16, stride=4, name='layer_conv1').\
        conv2d(kernel=4, depth=32, stride=2, name='layer_conv2').\
        flatten().\
        fully_connected(size=256, name='layer_fc1').\
        fully_connected(size=num_actions, name='layer_fc2', activation_fn=None,
                        # a single positional argument only sets minval (maxval then
                        # defaults to 1.0), so give both bounds explicitly to keep
                        # the initial weights close to zero
                        weights=tf.random_uniform_initializer(minval=-0.001,
                                                              maxval=0.001))
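Since any function with that signature is accepted, it is also possible to roll your own. Below is a minimal sketch assuming the TF1-era API; the name scaled_uniform_initializer is my own and not part of TensorFlow or PrettyTensor:

import tensorflow as tf

def scaled_uniform_initializer(scale=0.001):
    # Returns a callable matching init(shape, dtype=tf.float32, partition_info=None).
    def init(shape, dtype=tf.float32, partition_info=None):
        # partition_info is accepted for signature compatibility but unused here.
        return tf.random_uniform(shape, minval=-scale, maxval=scale, dtype=dtype)
    return init

It could then be passed as weights=scaled_uniform_initializer(0.001) in the fully_connected() call above.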
Thanks. I think I got a little confused because I found the PrettyTensor initializers which need a shape parameter, which I obviously cannot provide. But tf.random_normal_initializer() and tf.truncated_normal_initializer() work fine. It might be a good idea to mention this in the docs.
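For reference, the last layer in the example above could equally use the truncated normal initializer with a small standard deviation (the 0.001 here is only illustrative):

fully_connected(size=num_actions, name='layer_fc2', activation_fn=None,
                weights=tf.truncated_normal_initializer(stddev=0.001))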
Is it possible to scale the initial random values for the weights in PrettyTensor?
I have a network of a few convolutional layers followed by a few fully-connected layers. They all use ReLU activations except for the last layer, which has a linear output. I would like the initial output values of the network to be random and close to zero, and I think the best way to achieve that is to initialize the random weights of the output layer close to zero.
I see that there is a weights parameter to the fully_connected() class, but it is not clear to me how to use it. Could you give an example based on the code above? Thanks.