keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Custom Functions on Gradients/Gradient Norms #5142

Closed tobyfrancis closed 7 years ago

tobyfrancis commented 7 years ago

For research purposes, the ability to define custom functions on the gradients as part of the optimizer would be very useful, in line with the existing clipnorm/clipvalue implementations. It would allow experimenting with functions other than simple clipping. One example of the utility of this functionality is the RevReLU layer in https://arxiv.org/pdf/1612.02766.pdf, which applies the ReLU function to the gradients to perform weakly supervised semantic segmentation.

ncullen93 commented 7 years ago

Ok, there's not really a question here. But to add something LIKE clipnorm, you should read the code in the keras/optimizers.py file. For instance, here is the code which supplies the gradient updates for the SGD optimizer:

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = []

        lr = self.lr
        if self.initial_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        # momentum
        shapes = [K.get_variable_shape(p) for p in params]
        moments = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            v = self.momentum * m - lr * g  # velocity
            self.updates.append(K.update(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - lr * g
            else:
                new_p = p + v

            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)

            self.updates.append(K.update(p, new_p))
        return self.updates
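
To make the momentum arithmetic above concrete, here is the same update written as a single plain-NumPy step (function and variable names here are illustrative, not part of Keras):

```python
import numpy as np

def sgd_momentum_step(p, g, m, lr=0.01, momentum=0.9, nesterov=False):
    """One step of the update rule from get_updates(), in plain NumPy."""
    v = momentum * m - lr * g  # velocity, as in the Keras loop
    new_p = p + momentum * v - lr * g if nesterov else p + v
    return new_p, v

p = np.array([1.0, -2.0])  # parameters
g = np.array([0.5, 0.5])   # gradients
m = np.zeros(2)            # initial moments
new_p, v = sgd_momentum_step(p, g, m)
```

With zero initial momentum the first step reduces to plain SGD, i.e. new_p = p - lr * g.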

So you see this function self.get_gradients(), which is implemented in the base Optimizer class and reads as follows:

    def get_gradients(self, loss, params):
        grads = K.gradients(loss, params)
        if hasattr(self, 'clipnorm') and self.clipnorm > 0:
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
            grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
        if hasattr(self, 'clipvalue') and self.clipvalue > 0:
            grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
        return grads

As you can see, the gradients are available inside either of these function calls, so you can easily write your own custom optimizer and apply whatever functions to them you'd like!
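
To see what clipnorm actually computes, the same global-norm clipping can be sketched in plain NumPy (illustrative only; the real Keras helper is clip_norm in keras/optimizers.py):

```python
import numpy as np

def clip_by_global_norm(grads, clipnorm):
    """Scale all gradients so their joint L2 norm is at most clipnorm."""
    norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    if norm <= clipnorm:
        return grads
    return [g * clipnorm / norm for g in grads]

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]  # global norm = 5
clipped = clip_by_global_norm(grads, 1.0)
```

Note that the norm is computed over all parameter tensors jointly, so clipping rescales the gradients while preserving their relative directions.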

tobyfrancis commented 7 years ago

Sorry, I should have framed this as a question - and I shouldn't have tried to write it on my phone on the bus this morning, because I phrased it badly. My question is: how would you go about using the gradients for some arbitrary tensor X in a convolutional neural network to modify the activations of X between each update?

I realized it's trivial to apply a function to the gradients, but it's unclear how you would modify the activations using the gradients from the last update before moving on to the next. Thanks for the concise answer to my poorly phrased question, though - it made me think about a potential implementation: if I can access the loss between updates, I can get the gradient for the parameters in question and then manually change the activations based on the value of the gradient (if the gradient is less than zero, set the activation to zero; if it is greater than zero, keep the activation).
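
That rule is just an element-wise mask on the activations. A plain-NumPy sketch of it (a real Keras layer would use backend ops such as K.greater_equal and K.cast instead; the names here are illustrative):

```python
import numpy as np

def gate_by_gradient_sign(activations, grads):
    """Zero each activation whose gradient is negative; keep the rest."""
    mask = (grads >= 0).astype(activations.dtype)
    return activations * mask

acts = np.array([1.5, 2.0, 0.5])
grads = np.array([0.3, -1.0, 0.0])
gated = gate_by_gradient_sign(acts, grads)
```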

ncullen93 commented 7 years ago

Yes, that is the params argument in the get_updates() function - it holds the weights of the network, and they are ONLY modified in that get_updates() call. If you want to modify your weights, that is the place to do it - there, or in a custom optimizer in your case. That part is simple, case closed.

If you want to modify activations (which are of course not trainable), you need something like the BatchNormalization() layer, but again, custom. How would you access the gradients from such a layer - is that your question? One idea: make a custom optimizer and give the layer a class property that STORES the gradients! Here's what it might look like:

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.my_layer.stored_gradients = grads  # ADD THIS
        self.updates = []
        ...

Now, of course you might have to pass in the layer as an argument to your custom optimizer:


    class CustomOpt(Optimizer):

        def __init__(self, my_layer, **kwargs):
            super(CustomOpt, self).__init__(**kwargs)
            self.my_layer = my_layer

Then use it as follows:

    myopt = CustomOpt(my_layer=my_layer)

But then when you make that custom layer, you can do as follows:

    class CustomLayer(Layer):
        def __init__(self, **kwargs):
            super(CustomLayer, self).__init__(**kwargs)
            self.stored_gradients = []
        # ... other layer methods ...
        def call(self, x):
            activations = x
            current_gradients = self.stored_gradients  # updated by the custom optimizer
            # alter the activations however you like and return them
            new_activations = activations * K.mean(current_gradients)
            return new_activations

Overall, your model might look like this:

    input = Input(shape=..)
    dense = Dense(100, activation='relu')(input)
    custom = CustomLayer(...)(dense)  # this layer takes the dense activations and does something

    model = Model(input=input, output=custom)
    custom_opt = CustomOpt(my_layer=custom)
    model.compile(optimizer=custom_opt, loss=...)

    model.fit(..)

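Stripped of Keras entirely, the optimizer-to-layer handshake sketched above is plain object wiring. Here it is as a self-contained Python toy (class names and the list-based "tensors" are hypothetical stand-ins, not Keras APIs):

```python
class StoringOptimizer:
    """Stand-in for the custom optimizer: it pushes each step's
    gradients onto the layer it was handed at construction time."""
    def __init__(self, my_layer):
        self.my_layer = my_layer

    def get_updates(self, grads):
        self.my_layer.stored_gradients = grads  # the handshake
        return grads

class GradientAwareLayer:
    """Stand-in for the custom layer: its forward pass reads
    whatever the optimizer stored on the previous step."""
    def __init__(self):
        self.stored_gradients = []

    def call(self, x):
        if not self.stored_gradients:
            return x  # nothing stored before the first update
        mean_grad = sum(self.stored_gradients) / len(self.stored_gradients)
        return [a * mean_grad for a in x]

layer = GradientAwareLayer()
opt = StoringOptimizer(my_layer=layer)
opt.get_updates([2.0, 4.0])   # simulate one training step
out = layer.call([1.0, 3.0])  # forward pass sees the stored gradients
```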
tobyfrancis commented 7 years ago

Thank you! That will work perfectly for my purposes - I'm just delving into writing custom layers for the first time (never needed to before) and it's a bit of a daunting task at first. I'll leave this open for a second while I get the custom layer configured - once I'm done I'll post it here if you find that useful (the RevReLU layer is an interesting idea and definitely might be interesting to others working with weakly labeled data).

hiteshnitetc commented 7 years ago

@ncullen93 I want to apply a PSO optimizer to my network within Keras. How can I modify keras/optimizers.py? My PSO code has no need for gradients.

haskarb commented 7 years ago

@hiteshnitetc were you able to use PSO?

hiteshnitetc commented 7 years ago

yes


haskarb commented 7 years ago

@hiteshnitetc Can I have your mail address?

hiteshnitetc commented 7 years ago

teku123@gmail.com.


pGit1 commented 6 years ago

@hiteshnitetc you got PSO to work with Keras? Any code if so? Would be interesting to see.

hiteshnitetc commented 6 years ago

It's not completely in Keras - it uses some Keras functions like convolutions and losses. I wrote the weight updates using NumPy, because I don't have much knowledge of Keras ops.
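
For anyone wanting to try the same thing: the gradient-free part really is just the standard PSO update rule, which is easy to write in NumPy and combine with model.set_weights(). An illustrative sketch of one particle's update (not the poster's actual code):

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One standard PSO velocity/position update for a single particle."""
    rng = rng if rng is not None else np.random.default_rng(0)
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# If a particle already sits at both its personal best and the global
# best, the attraction terms vanish and only the inertia w*v remains.
x = np.array([1.0, 1.0])
v = np.array([1.0, 0.0])
new_x, new_v = pso_step(x, v, pbest=x, gbest=x)
```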