stillwaterman closed this issue 5 years ago
Are you using a tensorflow backend?
@joeyearsley Yes, I use TensorFlow as the backend.
So you should still be able to use this function in Keras, since tf.layers calls tf.keras.layers under the hood.
But these lines are the most important and will still work in Keras: https://github.com/joeyearsley/efficient_densenet_tensorflow/blob/eef1190478450ef6df12ce3f9d630c03eb6333dc/models/densenet_creator.py#L136-L140
Thanks for the reminder. Actually, I don't know how to train my model. I saw your training code, which trains the model in TensorFlow style, but I'm using fit_generator in Keras. Could you show some training code in fit_generator style? A small sample is fine.
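For reference, the fit_generator pattern looks roughly like the following sketch. The data shapes, batch size, and epoch count here are illustrative assumptions, not values from this repository, and the fit_generator call itself is shown commented out since it needs a compiled Keras model:

```python
# Minimal sketch of a fit_generator-style training setup.
# Shapes and hyperparameters below are assumptions for illustration only.

import random

def batch_generator(images, labels, batch_size=32):
    """Yield (x_batch, y_batch) tuples forever, as fit_generator expects."""
    indices = list(range(len(images)))
    while True:  # Keras generators must loop indefinitely
        random.shuffle(indices)
        for start in range(0, len(indices) - batch_size + 1, batch_size):
            batch_idx = indices[start:start + batch_size]
            x_batch = [images[i] for i in batch_idx]
            y_batch = [labels[i] for i in batch_idx]
            yield x_batch, y_batch

# With a compiled Keras model, training would then look like:
# model.fit_generator(
#     batch_generator(train_images, train_labels, batch_size=32),
#     steps_per_epoch=len(train_images) // 32,
#     epochs=300,
# )
```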
Sorry to disturb you; I think I can implement this by myself easily. Another small question: do we really need Horovod? Perhaps you just use the Horovod framework to build your model. Am I missing some important detail about how Horovod saves memory?
You are correct, Horovod isn't a crucial detail. I included Horovod and mixed-precision training to show how to scale to the max, but in the majority of cases it will not be needed.
Thanks for your reply, but I got some errors when trying to use _x = tf.contrib.layers.recompute_grad(_x). At first I simply added it to my _conv_block, which raised an error.
Error Message: TypeError: All variables used by a function wrapped with @custom_gradient must be ResourceVariables. Ensure that no variable_scope is created with use_resource=False.
Then I tried to fix this by adding with tf.variable_scope('backbone_denseblock_{}'.format(block_idx), use_resource=True), but I got another error.
Error Message: AttributeError: 'NoneType' object has no attribute '_inbound_nodes'.
At that time I thought it was a problem with using a raw TensorFlow function inside a Keras model, so I tried to wrap it with Lambda: recompute_grad_cp = Lambda(lambda dx: tf.contrib.layers.recompute_grad(dx)); _x = recompute_grad_cp(_x). But a Lambda layer only accepts Keras tensors as input. Could you give me some advice about using _x = tf.contrib.layers.recompute_grad(_x)?
I saw a warning about this function on tensorflow: Warning: Because the function will be called again on the backwards pass, the user should be careful to not use ops in their function that mutate state or have randomness (for example, batch normalization or dropout). If the function does have such operations, it is recommended that the function take the is_recomputing keyword argument which will be False on the forward pass and True on the backwards pass so that it can disable state changes when is_recomputing=True (for example, not updating the moving averages in batch normalization).
Maybe we shouldn't include dropout and BN layers in the wrapped function. Is that right?
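The is_recomputing pattern described in that warning can be illustrated without TensorFlow at all. In this pure-Python sketch, a toy moving average stands in for batch norm's running statistics (the class and its numbers are invented for illustration; a real version would live inside the function passed to tf.contrib.layers.recompute_grad):

```python
# Plain-Python sketch of the is_recomputing pattern from the TF warning.
# FakeBatchNorm is a made-up stand-in: its moving average mimics batch
# norm's running statistics, which must not update on the backward pass.

class FakeBatchNorm:
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.moving_mean = 0.0

    def __call__(self, batch, is_recomputing=False):
        batch_mean = sum(batch) / len(batch)
        if not is_recomputing:
            # Only update running statistics on the true forward pass.
            self.moving_mean = (self.momentum * self.moving_mean
                                + (1 - self.momentum) * batch_mean)
        return [x - batch_mean for x in batch]

bn = FakeBatchNorm()
bn([1.0, 2.0, 3.0])                       # forward pass: updates moving_mean
bn([1.0, 2.0, 3.0], is_recomputing=True)  # recompute pass: state untouched
```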
Ahhhh, you need to wrap it in a Keras Layer so Keras can store the inbound nodes.
https://keras.io/layers/writing-your-own-keras-layers/
However, there's probably a bit more to it than just that page.
And yes, you are correct on the dropout front; I just quickly pieced it together and didn't really use dropout. However, for batch norm I think it depends on your implementation of the update ops.
Hi! Due to an Internet disconnection, I am very sorry for not replying in time these days. I tried to wrap it in a Keras Layer; here is my code:
```python
class Back_Recompute(Layer):
    def __init__(self, filters, kernel_size, w_decay, **kwargs):
        self.n_filters = filters
        self.we_decay = w_decay
        self.ks = kernel_size
        super(Back_Recompute, self).__init__(**kwargs)

    def call(self, ip):
        def _x(inner_ip):
            x = Conv2D(self.n_filters, self.ks, kernel_initializer='he_normal',
                       padding='same', use_bias=False,
                       kernel_regularizer=l2(self.we_decay))(inner_ip)
            return x

        _x = tf.contrib.layers.recompute_grad(_x)
        return _x(ip)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1], input_shape[2], self.n_filters)
```
The good news is that I no longer hit the 'NoneType' object has no attribute '_inbound_nodes' problem. The bad news is that tf.variable_scope('backbone_denseblock_{}'.format(block_idx), use_resource=True) no longer seems to work, and the error message (All variables used by a function wrapped with @custom_gradient must be ResourceVariables. Ensure that no variable_scope is created with use_resource=False.) came back again! Do you have a solution? I think it is not easy for me. :(
Could you try placing the variable scope inside the call function?
I tried like this:
```python
class Back_Recompute(Layer):
    def __init__(self, filters, kernel_size, w_decay, **kwargs):
        self.n_filters = filters
        self.we_decay = w_decay
        self.ks = kernel_size
        super(Back_Recompute, self).__init__(**kwargs)

    def call(self, ip):
        global brcount
        with tf.variable_scope('denseblock_{}'.format(brcount), use_resource=True):
            def _x(inner_ip):
                x = Conv2D(self.n_filters, self.ks, kernel_initializer='he_normal',
                           padding='same', use_bias=False,
                           kernel_regularizer=l2(self.we_decay))(inner_ip)
                return x

            brcount = brcount + 1
            _x = tf.contrib.layers.recompute_grad(_x)
            return _x(ip)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1], input_shape[2], self.n_filters)
```
Fortunately, I didn't get any error messages and the model compiles. But the training program got stuck at fit_generator with no error message; it just hung.
That's unusual. Does it work with any other fit method?
I tried the fit method and it also got stuck, even with no data: fit(x=None, y=None, steps_per_epoch=1). If I don't use the Back_Recompute layer, the model starts training. This is very strange.
Can you use some print statements or tf.Print ops to diagnose where it stops?
Hi, I think maybe it is because tf.variable_scope cannot work with Keras layers, so I changed Conv2D to tf.layers.conv2d. Then I got an error message: AttributeError: 'Activation' object has no attribute 'outbound_nodes'. Any advice on this?
Unfortunately not; could you raise this as an issue in TensorFlow? I believe it may be due to some slight API inconsistency across layers, which is weird since tf.layers calls tf.keras.layers under the hood.
Thanks for the explanation of the Horovod package, since it cannot be installed on Windows.
There are always numerous other ways to do this, like using TF's distributed estimators or implementing your own parameter server. I've not used Windows in years so I can't comment. Maybe take it up with the Horovod team?
Thanks. I ran the code on one GPU (GTX 1080 Ti); when batch_size was set to 3750 it reported an out-of-memory error, so I tried the original paper's parameters (batch_size=512, init_lr=0.1, decreased at epochs 150 and 225), and the result is around 88.09%. Can you tell us the final accuracy on CIFAR-10 at batch_size 3750? Thanks in advance.
Any new insight on this issue? Is gradient checkpointing in tensorflow.keras 1.14 somehow possible?
I got training to work (see Stack Overflow question 53568202), but I cannot load the trained model. When I do, I get the following error:
```
ValueError: The variables used on recompute were different than the variables originally
used. The function wrapped with @recompute_grad likley creates its own variable
scope with a default name and has been called twice in the same enclosing scope.
To fix, ensure each call to the function happens in its own unique variable
scope.
```
I think this is because TensorFlow is executing in eager mode when using recompute_grad, so the variable scopes aren't being saved. Maybe this can be overcome using EagerVariableStore.
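One way to satisfy the "unique variable scope" requirement from that error is to generate a fresh scope name for every wrapped call. This is only a sketch of the naming side; the counter, function name, and scope prefix are invented for illustration, and the TF-specific part is shown commented out:

```python
# Sketch of generating a unique scope name per wrapped call, as the
# ValueError above asks. itertools.count stands in for whatever naming
# scheme the model actually uses; 'recompute_block' is an invented prefix.

import itertools

_scope_ids = itertools.count()

def unique_scope_name(prefix='recompute_block'):
    """Return a scope name that is unique for every call in this process."""
    return '{}_{}'.format(prefix, next(_scope_ids))

# Each recompute-wrapped call would then open its own scope, e.g.:
# with tf.variable_scope(unique_scope_name(), use_resource=True):
#     _x = tf.contrib.layers.recompute_grad(_x)
```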
Does anyone know how to load a model trained using recompute_grad?
@Sirius083 The aim of this repo isn't to get a certain result; it's just to show how a larger batch size can fit into memory with nothing but engineering. It doesn't even report a validation accuracy or have a test script.
@RayDeeA This should be possible; however, further engineering might be needed, like what they do in the RevBlock layer.
https://github.com/IndicoDataSolutions/finetune/blob/development/finetune/base_models/bert/modeling.py#L999 It seems that the recompute_grads implemented in this repo works. Has anyone else applied this code? I hope for any explanation of the code.
Great work! I am using Keras to build my model and I want to reduce the memory usage of DenseNet. Can this project be used directly with Keras? Could you show some Keras examples or a user guide for Keras?