keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Learning/Decay Rate Multiplier #5920

Closed. miguelmartin75 closed this issue 3 years ago.

miguelmartin75 commented 7 years ago

I've looked through the documentation and can't find anything equivalent to Caffe's lr_mult and decay_mult, so I assume this is not supported/implemented. Is it possible to add this feature?

In case you don't know the feature: essentially, for each layer you can supply an lr_mult/decay_mult, which are learning rate and weight decay multipliers applied to the kernel and bias weights. For example, AlexNet's convolutional layers each specify two lr_mult/decay_mult pairs, where the first pair is applied to the weights and the second to the bias.
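
For reference, here is a rough Python sketch of how Caffe turns those multipliers into effective per-parameter values. The parameter names and numbers are purely illustrative, following the common AlexNet convention:

# Illustration only (not a Keras API): Caffe multiplies the solver's base
# learning rate and weight decay by each parameter blob's lr_mult/decay_mult.
# The names and values below are hypothetical, following the usual AlexNet
# convention of (lr_mult=1, decay_mult=1) for kernels and
# (lr_mult=2, decay_mult=0) for biases.
base_lr, base_decay = 0.01, 0.0005

param_mults = {
    'conv1/kernel': {'lr_mult': 1.0, 'decay_mult': 1.0},
    'conv1/bias':   {'lr_mult': 2.0, 'decay_mult': 0.0},
}

for name, mults in param_mults.items():
    effective_lr = base_lr * mults['lr_mult']            # per-parameter learning rate
    effective_decay = base_decay * mults['decay_mult']   # per-parameter weight decay
    print(name, effective_lr, effective_decay)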

Thanks.

AvantiShri commented 7 years ago

There is a pull request open here: https://github.com/fchollet/keras/pull/1991

miguelmartin75 commented 7 years ago

That PR and the next one linked seem to be closed. I suppose I could try to update #3004 so that it is 2.0 compatible.

gabrieldemarmiesse commented 7 years ago

I'd be very thankful. I need this feature to reproduce the results of a paper that uses Caffe. I'm pretty sure I'm not the only one who'd like to be able to easily go from Caffe to keras.

lishen commented 7 years ago

Up. I also need this very important feature.

Tutufa commented 7 years ago

up +1

vlomonaco commented 7 years ago

+1

urasmutlu commented 7 years ago

+1

ghost commented 7 years ago

+1 Literally lives are relying on this feature!

McLawrence commented 7 years ago

@miguelmartin75 any updates?

lamkeewei commented 7 years ago

+1

davidsvaughn commented 7 years ago

+1

George-Zhu commented 7 years ago

+1

Barfknecht commented 7 years ago

Any update on this? I have found this, which a few people claim works.

zc813 commented 7 years ago

I think a temporary way to do this is to modify your optimizer, i.e., copy the original Keras optimizer code and replace every lr with your own definition.

For example, to use SGD to train the last layer at lr=0.01 and the other layers at lr*0.1=0.001: first, copy the code from keras.optimizers.SGD and define a new optimizer MultiSGD. Make two changes:

  1. In __init__, add a list exception_vars and a multiplier=0.1 to the arguments. Variables in the list will not have the multiplier applied.
  2. In get_updates(), add a new line at the beginning of the loop: multiplied_lr = lr if p in self.exception_vars else lr * self.multiplier. Then, in each line where lr is used to calculate updates, i.e. v = self.momentum * m - lr * g and new_p = p + self.momentum * v - lr * g, replace lr with multiplied_lr.

Second, before compiling your model, collect the variables of the layers that should keep the full learning rate:

last_layer_variables = list()
for layer in model.layers:
    if layer.name in ['prediction']:
        last_layer_variables.extend(layer.weights)
multisgd = MultiSGD(....exception_vars=last_layer_variables, multiplier=0.1)

Then you can compile your model with multisgd just as you would with any other optimizer.

This is just an example. You can modify other optimizers or apply more complicated multipliers in similar ways. I am not sure if this is 100% correct, but it works perfectly on my computer.
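
Putting the two changes together, here is a minimal sketch of such a MultiSGD, based on the Keras 2 SGD implementation. The class and argument names follow the description above; treat it as a starting point rather than an official API (get_config is not extended here):

from keras import backend as K
from keras.optimizers import SGD

class MultiSGD(SGD):
    """SGD that scales the learning rate by `multiplier` for every variable
    except those listed in `exception_vars` (sketch, not an official API)."""

    def __init__(self, exception_vars=None, multiplier=0.1, **kwargs):
        super(MultiSGD, self).__init__(**kwargs)
        self.exception_vars = exception_vars or []
        self.multiplier = multiplier

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay *
                             K.cast(self.iterations, K.dtype(self.decay))))

        moments = [K.zeros(K.int_shape(p)) for p in params]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            # Variables in exception_vars keep the full learning rate;
            # everything else is scaled by self.multiplier.
            multiplied_lr = lr if p in self.exception_vars else lr * self.multiplier

            v = self.momentum * m - multiplied_lr * g  # velocity
            self.updates.append(K.update(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - multiplied_lr * g
            else:
                new_p = p + v

            # Apply constraints, if any.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(K.update(p, new_p))
        return self.updates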

raginisharma14 commented 7 years ago

+1

liruoteng commented 7 years ago

+1

ChunhuanLin commented 6 years ago

+1

ksaluja15 commented 6 years ago

I followed @zhenbangchen's solution and it works. Sample script here: https://ksaluja15.github.io/Learning-Rate-Multipliers-in-Keras/

andrisecker commented 6 years ago

+1 any updates on this?

ogencoglu commented 6 years ago

+1

oadoriaa commented 6 years ago

+1

JonGerrand commented 6 years ago

+1 The steady stream continues ... :)

srishti-advenio commented 6 years ago

+1

nio747 commented 6 years ago

+1

Axel13fr commented 6 years ago

+1

shamangary commented 6 years ago

+1

aerdem4 commented 6 years ago

+1

StefanGerlach commented 6 years ago

+1

GuruMulay commented 6 years ago

+1

stanpcf commented 6 years ago

+1

triducnguyentang commented 6 years ago

+1 Please update this feature.

singhay commented 6 years ago

+1

blauigris commented 6 years ago

+1

jayavardhanr commented 6 years ago

+1

eugeniaguerrero commented 6 years ago

+1

KUASWoodyLIN commented 6 years ago

+1

pmpakos commented 6 years ago

+1

Cerebrock commented 6 years ago

+1

vidyasagarr7 commented 5 years ago

Any updates on this issue?

cbarburescu commented 5 years ago

Any updates on the pull request above?

rafikg commented 5 years ago

Hey @miguelmartin75, I have tried to convert the MultiSGD copied from https://gist.github.com/mkocabas/99658da8186145f6f1e2fc70e882dac0 to be compatible with tensorflow.keras, but I got an error:

Error while reading resource variable training/MultiSGD/Variable_180 from Container: localhost. This could mean that the variable was uninitialized

from tensorflow.python.keras.optimizers import Optimizer
from tensorflow.python.keras import backend as K
from tensorflow.python.ops import state_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.util.tf_export import tf_export

@tf_export('keras.optimizers.MultiSGD')
class MultiSGD(Optimizer):
    """
    Modified SGD with added support for learning multiplier for kernels and biases
    taken from https://gist.github.com/mkocabas/99658da8186145f6f1e2fc70e882dac0

    Stochastic gradient descent optimizer.
    Includes support for momentum,
    learning rate decay, and Nesterov momentum.
    Parameters
    ----------
    lr: float >= 0. Learning rate.
    momentum: float >= 0. Parameter updates momentum.
    decay: float >= 0. Learning rate decay over each update.
    nesterov: boolean. Whether to apply Nesterov momentum.
    """

    def __init__(self, lr=0.01, momentum=0., decay=0.,
                 nesterov=False, lr_mult=None, **kwargs):
        super(MultiSGD, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.momentum = K.variable(momentum, name='momentum')
            self.decay = K.variable(decay, name='decay')
        self.initial_decay = decay
        self.nesterov = nesterov
        self.lr_mult = lr_mult

    # @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        # self.updates = [K.update_add(self.iterations, 1)]
        self.updates = [state_ops.assign_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            # lr *= (1. / (1. + self.decay * K.cast(self.iterations,
            #                                       K.dtype(self.decay))))
            lr = lr * (  # pylint: disable=g-no-augmented-assignment
                    1. / (1. + self.decay * math_ops.cast(self.iterations,
                                                          K.dtype(self.decay))))
        # momentum
        shapes = [K.int_shape(p) for p in params]
        moments = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):

            # Scale the learning rate for any variable whose name appears in lr_mult.
            if self.lr_mult and p.name in self.lr_mult:
                multiplied_lr = lr * self.lr_mult[p.name]
            else:
                multiplied_lr = lr

            v = self.momentum * m - multiplied_lr * g  # velocity
            # self.updates.append(K.update(m, v))
            self.updates.append(state_ops.assign(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - multiplied_lr * g
            else:
                new_p = p + v

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            # self.updates.append(K.update(p, new_p))
            self.updates.append(state_ops.assign(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'momentum': float(K.get_value(self.momentum)),
                  'decay': float(K.get_value(self.decay)),
                  'nesterov': self.nesterov}
        base_config = super(MultiSGD, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
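
In case it helps, here is a hypothetical usage sketch for the class above. The toy model, layer names, and multiplier value are made up for illustration and are not from the gist:

from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense

# Hypothetical toy model; the layer names are only for illustration.
model = Sequential([
    Dense(64, activation='relu', input_shape=(32,), name='hidden'),
    Dense(10, activation='softmax', name='prediction'),
])

# Scale down the learning rate for every variable that is not in the
# 'prediction' layer; keys must match the variable names seen in get_updates().
lr_mult = {}
for layer in model.layers:
    if layer.name != 'prediction':
        for w in layer.weights:
            lr_mult[w.name] = 0.1

model.compile(optimizer=MultiSGD(lr=0.01, momentum=0.9, lr_mult=lr_mult),
              loss='categorical_crossentropy')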