Ok, there's not really a question here. But to add something LIKE clipnorm, you should read the code in the keras/optimizers.py file. For instance, here is the code which supplies the gradient updates for the SGD optimizer:
def get_updates(self, params, constraints, loss):
    grads = self.get_gradients(loss, params)
    self.updates = []

    lr = self.lr
    if self.initial_decay > 0:
        lr *= (1. / (1. + self.decay * self.iterations))
        self.updates.append(K.update_add(self.iterations, 1))

    # momentum
    shapes = [K.get_variable_shape(p) for p in params]
    moments = [K.zeros(shape) for shape in shapes]
    self.weights = [self.iterations] + moments
    for p, g, m in zip(params, grads, moments):
        v = self.momentum * m - lr * g  # velocity
        self.updates.append(K.update(m, v))

        if self.nesterov:
            new_p = p + self.momentum * v - lr * g
        else:
            new_p = p + v

        # apply constraints
        if p in constraints:
            c = constraints[p]
            new_p = c(new_p)

        self.updates.append(K.update(p, new_p))
    return self.updates
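For context, the self.lr, self.momentum, self.decay and self.nesterov attributes read above are simply the arguments passed to the optimizer's constructor. A minimal usage sketch, assuming a model object already exists:

from keras.optimizers import SGD

# these constructor arguments become self.lr, self.momentum, self.decay
# and self.nesterov inside get_updates() above
sgd = SGD(lr=0.01, momentum=0.9, decay=1e-6, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')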
In get_updates() you see a call to self.get_gradients(), which is implemented in the base Optimizer class and reads as follows:
def get_gradients(self, loss, params):
    grads = K.gradients(loss, params)
    if hasattr(self, 'clipnorm') and self.clipnorm > 0:
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
    if hasattr(self, 'clipvalue') and self.clipvalue > 0:
        grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
    return grads
As you can see, the gradients are clearly available inside either of these functions (whose resulting updates are applied after each training batch), so you can easily make your own custom optimizer and add any such functions as you'd like!
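For example, here is a minimal sketch of such a custom optimizer: it subclasses SGD and overrides get_gradients() to pass every gradient through a user-supplied function (the grad_transform argument is made up for illustration, not a Keras feature):

from keras import backend as K
from keras.optimizers import SGD

class TransformedSGD(SGD):
    def __init__(self, grad_transform=None, **kwargs):
        super(TransformedSGD, self).__init__(**kwargs)
        # hypothetical hook: any elementwise function of a gradient tensor,
        # e.g. lambda g: K.relu(g) to zero out negative gradients
        self.grad_transform = grad_transform

    def get_gradients(self, loss, params):
        # reuse the clipnorm/clipvalue handling from the base class first
        grads = super(TransformedSGD, self).get_gradients(loss, params)
        if self.grad_transform is not None:
            grads = [self.grad_transform(g) for g in grads]
        return grads

You would then compile with it like any other optimizer, e.g. TransformedSGD(lr=0.01, grad_transform=lambda g: K.relu(g)).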
Sorry, I should've framed this as a question - and I shouldn't have tried to write it on a bus on my phone in the morning, because I phrased it pretty poorly. My question is: "how would you go about using the gradients for some arbitrary tensor X in the network (a convolutional neural network) to modify the activations of X between each update?"
I realized it's trivial to apply a function to the gradients, but it's unclear how you would modify the activations using the gradients from the last update before moving on to the next. Thanks for the concise answer to my poorly-phrased question though. It has made me think about a potential implementation: if I can access the loss between updates, I can get the gradient for the parameters in question and then manually change the activations based on the gradient's value (if the gradient is less than zero, set the activation to zero; if the gradient is greater than zero, keep the activation).
Yes, that is the params argument to the get_updates() function - it holds the weights of the network, and they are ONLY modified inside that get_updates() call. If you want to modify your weights, that is where it should happen - in a custom optimizer, in your case. That part is simple; case closed.
If you want to modify activations (which are of course not trainable), you need something like the BatchNormalization() layer, but again, custom. How would you access the gradients from such a layer - is that your question? I have an idea: how about you make a custom optimizer and give the layer a class property that STORES the gradients! Here's what it might look like:
def get_updates(self, params, constraints, loss):
    grads = self.get_gradients(loss, params)
    self.my_layer.stored_gradients = grads  # ADD THIS
    self.updates = []
    ....
Now, of course you might have to pass in the layer as an argument to your custom optimizer:
class CustomOpt(Optimizer):
    def __init__(self, my_layer, **kwargs):
        super(CustomOpt, self).__init__(**kwargs)
        self.my_layer = my_layer
Then use it as follows:
myopt = CustomOpt(my_layer=my_layer)
But then when you make that custom layer, you can do as follows:
class CustomLayer(Layer):
    def __init__(self, **kwargs):
        super(CustomLayer, self).__init__(**kwargs)
        self.stored_gradients = []
        ... blah ...

    def call(self, x):
        activations = x
        current_gradients = self.stored_gradients  # should be updated from the custom optimizer
        # alter the activations however you like and return them
        new_activations = activations * K.mean(current_gradients)
        return new_activations
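If, instead of scaling by the mean gradient, you wanted the rule described earlier in the thread (zero an activation when its gradient is negative, keep it when it is positive), the body of call() could build a mask from the gradient's sign. A small sketch, assuming the stored gradient tensor has the same shape as the activations:

from keras import backend as K

def gate_by_gradient_sign(activations, gradient):
    # keep activations whose gradient is positive, zero out the rest --
    # the gating rule proposed earlier in this thread
    keep_mask = K.cast(K.greater(gradient, 0.), K.floatx())
    return activations * keep_mask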
Overall, your model might look like this:
input = Input(shape=..)
dense = Dense(100, activation='relu')(input)
custom_layer = CustomLayer(...)   # this layer will take the dense activations and do something
custom = custom_layer(dense)
model = Model(input=input, output=custom)

custom_opt = CustomOpt(my_layer=custom_layer)   # pass the layer instance, not its output tensor
model.compile(optimizer=custom_opt, loss=...)
model.fit(..)
Thank you! That will work perfectly for my purposes - I'm just delving into writing custom layers for the first time (never needed to before) and it's a bit of a daunting task at first. I'll leave this open for a second while I get the custom layer configured - once I'm done I'll post it here if you find that useful (the RevReLU layer is an interesting idea and definitely might be interesting to others working with weakly labeled data).
@ncullen93 I want to apply a PSO optimizer to my network within Keras. How can I modify keras/optimizers.py? There is no need for gradients in my PSO code.
@hiteshnitetc were you able to use PSO?
yes
@hiteshnitetc Can I have your email address?
teku123@gmail.com.
@hiteshnitetc you got PSO to work with Keras? Any code if so? Would be interesting to see.
It's not completely in Keras; it uses some Keras functions like convolutions and losses. I wrote the weight updates using NumPy, because I don't have much knowledge of Keras ops.
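For anyone trying something similar: a gradient-free scheme like PSO can sit entirely outside Keras's Optimizer machinery - use the compiled model only for the forward pass and loss, and move weights in and out with NumPy. A rough sketch, assuming model, x and y already exist and leaving the PSO update rule itself out:

import numpy as np

def fitness(model, candidate_weights, x, y):
    # load a particle's candidate weights into the model and score the loss
    model.set_weights(candidate_weights)
    return model.evaluate(x, y, verbose=0)

# each particle is a list of NumPy arrays shaped like the model's weights
base = model.get_weights()
particles = [[w + 0.1 * np.random.randn(*w.shape) for w in base]
             for _ in range(20)]
# update the particles with your PSO rule, rank them with fitness(),
# then set_weights() the best candidate back into the model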
For research purposes, the ability to define custom functions on the gradient as a part of the optimizer would be very useful. It would be in line with the implementations of clipnorm/clipvalue. This would allow different functions other than simply clipping to be experimented with. An example of the utility of this functionality is demonstrated by the creation of the RevReLU layer in https://arxiv.org/pdf/1612.02766.pdf, which uses the ReLU function on the gradients to perform weakly supervised semantic segmentation.
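To make that concrete, here is a sketch of what such a hook might look like inside Optimizer.get_gradients(), mirroring the existing clipnorm/clipvalue checks (the gradfunc attribute is hypothetical, not part of Keras):

def get_gradients(self, loss, params):
    grads = K.gradients(loss, params)
    if hasattr(self, 'clipnorm') and self.clipnorm > 0:
        norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
        grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
    if hasattr(self, 'clipvalue') and self.clipvalue > 0:
        grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
    # hypothetical extension: an arbitrary user-supplied gradient function,
    # e.g. gradfunc=K.relu for a RevReLU-style transform on the gradients
    if hasattr(self, 'gradfunc') and self.gradfunc is not None:
        grads = [self.gradfunc(g) for g in grads]
    return grads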