Closed. miguelmartin75 closed this issue 3 years ago.
There is a pull request open here: https://github.com/fchollet/keras/pull/1991
That PR and the next one linked appear to have been closed. I suppose I could try to update #3004 so that it is 2.0 compatible.
I'd be very thankful. I need this feature to reproduce the results of a paper that uses Caffe. I'm pretty sure I'm not the only one who'd like to be able to easily go from Caffe to keras.
Up. I also need this very important feature.
up +1
+1
+1
+1 Literally lives are relying on this feature!
@miguelmartin75 any updates?
+1
+1
+1
Any update on this? I have found this, which a few people claim works.
I think a temporary way to do this is to modify your optimizer: copy the original Keras optimizer code and replace every lr with your own definition. For example, to use SGD to train the last layer at lr=0.01 and the other layers at lr*0.1=0.001:
First, copy the code from keras.optimizers.SGD and define a new optimizer MultiSGD. Make two changes:
- In __init__, add a list exception_vars and a multiplier=0.1 to the arguments. Variables in the list will not have the multiplier applied.
- In get_updates(), add a new line at the beginning of the loop: multiplied_lr = lr if p in self.exception_vars else lr * self.multiplier. Then, in each line where lr is used to calculate the updates, i.e. v = self.momentum * m - lr * g and new_p = p + self.momentum * v - lr * g, replace lr with multiplied_lr. A sketch of the resulting class is shown right after this list.
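For concreteness, here is a minimal sketch of that modified optimizer, written by subclassing the Keras 2 SGD (this assumes the newer get_updates(loss, params) signature; exception_vars and multiplier are the new arguments described above, not part of stock Keras, and the code is an untested illustration rather than a drop-in implementation):

from keras.optimizers import SGD
from keras import backend as K

class MultiSGD(SGD):
    # SGD whose learning rate is scaled by `multiplier` for every variable
    # except those listed in `exception_vars`.
    def __init__(self, exception_vars=None, multiplier=0.1, **kwargs):
        super(MultiSGD, self).__init__(**kwargs)
        self.exception_vars = exception_vars or []
        self.multiplier = multiplier

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay *
                             K.cast(self.iterations, K.dtype(self.decay))))

        moments = [K.zeros(K.int_shape(p)) for p in params]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            # Variables in exception_vars keep the full lr; everything else is scaled.
            multiplied_lr = lr if p in self.exception_vars else lr * self.multiplier

            v = self.momentum * m - multiplied_lr * g  # velocity
            self.updates.append(K.update(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - multiplied_lr * g
            else:
                new_p = p + v

            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

Subclassing SGD avoids copying __init__ and get_config wholesale; only the update loop changes.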
Second, before compiling your model, enumerate the variables in each layer:
last_layer_variables = list()
for layer in model.layers:
    if layer.name in ['prediction']:
        last_layer_variables.extend(layer.weights)

multisgd = MultiSGD(..., exception_vars=last_layer_variables, multiplier=0.1)
Then you can use multisgd to compile your model just the same way you use other optimizers.
This is just an example. You can modify other optimizers or apply more complicated multipliers in similar ways. I am not sure if this is 100% correct, but it works perfectly on my computer.
+1
+1
+1
Followed @zhenbangchen's solution and it works. Sample script here: https://ksaluja15.github.io/Learning-Rate-Multipliers-in-Keras/
+1 any updates on this?
+1
+1
+1 The steady stream continues ... :)
+1
+1
+1
+1
+1
+1
+1
+1
+1 Please update this feature.
+1
+1
+1
+1
+1
+1
+1
Any updates on this issue?
Any updates on the pull request above?
Hey @miguelmartin75, I have tried to convert the MultiSGD copied from https://gist.github.com/mkocabas/99658da8186145f6f1e2fc70e882dac0 to be compatible with tensorflow.keras, but I got an error:
Error while reading resource variable training/MultiSGD/Variable_180 from Container: localhost. This could mean that the variable was uninitialized
from tensorflow.python.keras.optimizers import Optimizer
from tensorflow.python.keras import backend as K
from tensorflow.python.ops import state_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.util.tf_export import tf_export


@tf_export('keras.optimizers.MultiSGD')
class MultiSGD(Optimizer):
    """
    Modified SGD with added support for learning rate multipliers for kernels and biases,
    taken from https://gist.github.com/mkocabas/99658da8186145f6f1e2fc70e882dac0

    Stochastic gradient descent optimizer.
    Includes support for momentum, learning rate decay, and Nesterov momentum.

    Parameters
    ----------
    lr: float >= 0. Learning rate.
    momentum: float >= 0. Parameter updates momentum.
    decay: float >= 0. Learning rate decay over each update.
    nesterov: boolean. Whether to apply Nesterov momentum.
    """

    def __init__(self, lr=0.01, momentum=0., decay=0.,
                 nesterov=False, lr_mult=None, **kwargs):
        super(MultiSGD, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.momentum = K.variable(momentum, name='momentum')
            self.decay = K.variable(decay, name='decay')
        self.initial_decay = decay
        self.nesterov = nesterov
        self.lr_mult = lr_mult

    # @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        # self.updates = [K.update_add(self.iterations, 1)]
        self.updates = [state_ops.assign_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            # lr *= (1. / (1. + self.decay * K.cast(self.iterations,
            #                                       K.dtype(self.decay))))
            lr = lr * (  # pylint: disable=g-no-augmented-assignment
                1. / (1. + self.decay * math_ops.cast(self.iterations,
                                                      K.dtype(self.decay))))

        # momentum
        shapes = [K.int_shape(p) for p in params]
        moments = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            if p.name in self.lr_mult:
                multiplied_lr = lr * self.lr_mult[p.name]
            else:
                multiplied_lr = lr

            v = self.momentum * m - multiplied_lr * g  # velocity
            # self.updates.append(K.update(m, v))
            self.updates.append(state_ops.assign(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - multiplied_lr * g
            else:
                new_p = p + v

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            # self.updates.append(K.update(p, new_p))
            self.updates.append(state_ops.assign(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'momentum': float(K.get_value(self.momentum)),
                  'decay': float(K.get_value(self.decay)),
                  'nesterov': self.nesterov}
        base_config = super(MultiSGD, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
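For what it's worth, the class above looks up p.name in self.lr_mult inside get_updates, so lr_mult is expected to be a dict keyed by full variable names. A hypothetical usage sketch (the layer name 'prediction' and the multiplier value are made up for illustration; also note that leaving lr_mult at its default None would make that membership check fail, so pass an empty dict if you have no multipliers):

# Hypothetical usage: train the final layer at 10x the base learning rate.
# Keys must be full variable names (e.g. 'prediction/kernel:0'), since
# get_updates() checks `p.name in self.lr_mult`.
lr_mult = {}
for layer in model.layers:
    if layer.name == 'prediction':      # hypothetical name of the last layer
        for w in layer.weights:
            lr_mult[w.name] = 10.0      # illustrative multiplier

optimizer = MultiSGD(lr=0.001, momentum=0.9, lr_mult=lr_mult)
model.compile(optimizer=optimizer, loss='categorical_crossentropy')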
I've looked through the documentation and can't seem to find anything equivalent to Caffe's lr_mult and decay_mult. My assumption is that this is not supported/implemented. Is it possible to add this feature?
In case you don't know what the feature is: essentially, for each layer you can supply an lr_mult/decay_mult, which are learning rate and decay multipliers applied to the kernel and bias weights. For example, with AlexNet you can see two lr_mult and decay_mult entries for the convolutional layers, where the first lr_mult/decay_mult is applied to the weights and the second to the bias.
Thanks.
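For anyone unfamiliar with the Caffe side, the arithmetic behind those multipliers is simple; here is a rough sketch in plain Python terms (the numbers are illustrative, not taken from any particular prototxt):

# Rough sketch of what Caffe's per-parameter multipliers mean.
base_lr = 0.01                  # solver-level learning rate (illustrative)
weight_decay = 0.0005           # solver-level weight decay (illustrative)

lr_mult, decay_mult = 2.0, 0.0  # e.g. a common choice for a bias blob

effective_lr = base_lr * lr_mult               # learning rate used for this blob
effective_decay = weight_decay * decay_mult    # weight decay used for this blob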