joeyearsley / efficient_densenet_tensorflow

A memory efficient implementation of densenet.

`recompute_grad` Does Not Work #5

Open ghost opened 5 years ago

ghost commented 5 years ago

The method you propose for using recompute_grad does not work, except in the simplest case where every layer in the model except the input and output layers is recomputed. All other cases (e.g. recomputing every other layer) cause the following error:

ValueError: The variables used on recompute were different than the variables originally
used. The function wrapped with @recompute_grad likley creates its own variable
scope with a default name and has been called twice in the same enclosing scope.
To fix, ensure each call to the function happens in its own unique variable
scope.

Can you please advise how to fix this error?

My current method is to (1) create a memory-efficient layer, e.g.:

def Conv2D_mem_eff( input_tensor,
                    filters,
                    kernel_size,
                    kernel_regularizer,
                    bias_regularizer,
                    padding,
                    name ):

    with tf.variable_scope( name,
                            use_resource = True ):

        def _x( inner_input_tensor ):

            x = Conv2D( filters = filters,
                        kernel_size = kernel_size,
                        padding = padding,
                        kernel_regularizer = kernel_regularizer,
                        bias_regularizer = bias_regularizer,
                        name = name )(inner_input_tensor)

            return x

        _x = tf.contrib.layers.recompute_grad( _x )

        return _x( input_tensor )

then (2) use this within a Lambda layer when defining my model:

x = Lambda( Conv2D_mem_eff,
            arguments = {'filters' : 24,
                         'kernel_size' : (5,5),
                         'kernel_regularizer' : l2,
                         'bias_regularizer' : l2,
                         'padding' : 'same',
                         'name' : 'conv02'},
            name = 'conv02' )(x)

I give a unique name to each layer I use.
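
For reference, my reading of the error's suggested fix is that each call to the wrapped function must happen inside its own uniquely named variable scope, roughly like this (a minimal sketch, not my actual code; my_block_fn and the block_i scope names are hypothetical):

import tensorflow as tf

# Hypothetical block; any function that creates its variables when called works here.
def my_block_fn(x):
    return tf.layers.dense(x, units=64, activation=tf.nn.relu)

recomputed_fn = tf.contrib.layers.recompute_grad(my_block_fn)

x = tf.placeholder(tf.float32, shape=(None, 64))
for i in range(4):
    # Each call lands in its own uniquely named resource-variable scope,
    # as the error message asks.
    with tf.variable_scope('block_{}'.format(i), use_resource=True):
        x = recomputed_fn(x)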

joeyearsley commented 5 years ago

Could you try instantiating the class outside the function and only using the instance inside the function? I.e. move the Conv2D instantiation outside of the _x func.

And if that doesn't work, set reuse=tf.AUTO_REUSE in the variable scope.
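
Roughly what I mean, as an untested sketch (conv2d_mem_eff here mirrors the wrapper above, with the regularizer arguments dropped for brevity):

import tensorflow as tf
from tensorflow.python.keras.layers import Conv2D

def conv2d_mem_eff(input_tensor, filters, kernel_size, padding, name):
    # 1) Instantiate the layer once, outside the recomputed function.
    lyr_fn = Conv2D(filters=filters, kernel_size=kernel_size,
                    padding=padding, name=name)

    # 2) If that alone doesn't help, also allow variable reuse in the scope.
    with tf.variable_scope(name, use_resource=True, reuse=tf.AUTO_REUSE):
        def _x(inner_input_tensor):
            return lyr_fn(inner_input_tensor)

        _x = tf.contrib.layers.recompute_grad(_x)
        return _x(input_tensor)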

ghost commented 5 years ago

@joeyearsley Thanks for your help. So you mean doing this, yes:

def Conv2D_mem_eff( input_tensor,
                    filters,
                    kernel_size,
                    kernel_regularizer,
                    bias_regularizer,
                    padding,
                    name ):

    with tf.variable_scope( name,
                            use_resource = True ):

        lyr_fn = Conv2D( filters = filters,
                         kernel_size = kernel_size,
                         padding = padding,
                         kernel_regularizer = kernel_regularizer,
                         bias_regularizer = bias_regularizer,
                         name = name )

        def _x( inner_input_tensor ):

            x = lyr_fn(inner_input_tensor)

            return x

        _x = tf.contrib.layers.recompute_grad( _x )

        return _x( input_tensor )
ghost commented 5 years ago

The first option (instantiating Conv2D outside _x as shown above) gives me the same error:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2639, in get_attr
    c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'Hidden_Layers/FullyConnectedLayer_01/fc01/fc01/IdentityN' has no attr named '_XlaCompile'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 5652, in get_controller
    yield g
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 398, in _MaybeCompile
    xla_compile = op.get_attr("_XlaCompile")
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2643, in get_attr
    raise ValueError(str(e))
ValueError: Operation 'Hidden_Layers/FullyConnectedLayer_01/fc01/fc01/IdentityN' has no attr named '_XlaCompile'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_run_0006.py", line 1854, in <module>
    main()
  File "main_run_0006.py", line 1785, in main
    initial_epoch = initial_epoch)
  File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
    loss=self.total_loss)
  File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 475, in get_updates
    grads = self.get_gradients(loss, params)
  File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 89, in get_gradients
    grads = K.gradients(loss, params)
  File "C:\Program Files\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2757, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 403, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 236, in internal_grad_fn
    return tape_grad_fn(*result_grads)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 219, in tape_grad_fn
    input_grads, variable_grads = grad_fn(*result_grads, variables=variables)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 629, in grad_fn
    return _grad_fn(output_grads, kwargs["variables"])
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 622, in _grad_fn
    has_is_recompute_kwarg=has_is_recompute_kwarg)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 553, in _recomputing_grad_fn
    outputs = compute_fn(*inputs, **fn_kwargs)
  File "main_run_0006.py", line 1094, in _x
    x = lyr_fn(inner_input_tensor)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\base_layer.py", line 474, in __call__
    output_shape = self.compute_output_shape(input_shape)
  File "C:\Program Files\Python36\lib\site-packages\keras\layers\core.py", line 888, in compute_output_shape
    assert input_shape[-1]
AssertionError
ghost commented 5 years ago

Keeping everything the same and adding the argument reuse=tf.AUTO_REUSE to tf.variable_scope gives the original error from the beginning of the post:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2639, in get_attr
    c_api.TF_OperationGetAttrValueProto(self._c_op, name, buf)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Operation 'Hidden_Layers/FullyConnectedLayer_03/fc03/fc03/IdentityN' has no attr named '_XlaCompile'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 5652, in get_controller
    yield g
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 398, in _MaybeCompile
    xla_compile = op.get_attr("_XlaCompile")
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2643, in get_attr
    raise ValueError(str(e))
ValueError: Operation 'Hidden_Layers/FullyConnectedLayer_03/fc03/fc03/IdentityN' has no attr named '_XlaCompile'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_run_0006.py", line 1850, in <module>
    main()
  File "main_run_0006.py", line 1781, in main
    initial_epoch = initial_epoch)
  File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Program Files\Python36\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
    loss=self.total_loss)
  File "C:\Program Files\Python36\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 475, in get_updates
    grads = self.get_gradients(loss, params)
  File "C:\Program Files\Python36\lib\site-packages\keras\optimizers.py", line 89, in get_gradients
    grads = K.gradients(loss, params)
  File "C:\Program Files\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 2757, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 403, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 236, in internal_grad_fn
    return tape_grad_fn(*result_grads)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\python\ops\custom_gradient.py", line 219, in tape_grad_fn
    input_grads, variable_grads = grad_fn(*result_grads, variables=variables)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 629, in grad_fn
    return _grad_fn(output_grads, kwargs["variables"])
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 622, in _grad_fn
    has_is_recompute_kwarg=has_is_recompute_kwarg)
  File "C:\Program Files\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\rev_block_lib.py", line 556, in _recomputing_grad_fn
    raise ValueError(_WRONG_VARS_ERR)
ValueError: The variables used on recompute were different than the variables originally
used. The function wrapped with @recompute_grad likley creates its own variable
scope with a default name and has been called twice in the same enclosing scope.
To fix, ensure each call to the function happens in its own unique variable
scope.
ghost commented 5 years ago

To my admittedly limited knowledge, I once read that the _XlaCompile error has something to do with input and output tensor shapes not matching.

joeyearsley commented 5 years ago

Can you share your script?

ghost commented 5 years ago

Yes, of course. Can I email it to you? It's rather large, and I'd prefer not to post it directly yet as it's for a class.

ghost commented 5 years ago

@joeyearsley I haven't heard back from you, so I'll assume you want me to post things here. I think part of my error was in trying to wrap the calls in Lambda layers: Lambda layers are inherently stateless, so they have no trainable weights, and that was my mistake. I then tried wrapping tf.contrib.layers.recompute_grad in a custom Keras layer instead, but that did not work either:

class Conv2D_mem_eff(Conv2D):

    def __init__(self,
                 filters,
                 kernel_size,
                 strides=1,
                 padding='valid',
                 data_format=None,
                 dilation_rate=1,
                 activation=None,
                 use_bias=True,
                 kernel_initializer='glorot_uniform',
                 bias_initializer='zeros',
                 kernel_regularizer=None,
                 bias_regularizer=None,
                 activity_regularizer=None,
                 kernel_constraint=None,
                 bias_constraint=None,
                 **kwargs):

        super(Conv2D_mem_eff, self).__init__(
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding=padding,
            data_format=data_format,
            dilation_rate=dilation_rate,
            activation=activation,
            use_bias=use_bias,
            kernel_initializer=kernel_initializer,
            bias_initializer=bias_initializer,
            kernel_regularizer=kernel_regularizer,
            bias_regularizer=bias_regularizer,
            activity_regularizer=activity_regularizer,
            kernel_constraint=kernel_constraint,
            bias_constraint=bias_constraint,
            **kwargs)

    def call(self, inputs):

        with tf.variable_scope( super(Conv2D_mem_eff, self).name,
                                use_resource = True ):
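            # Wrap the parent Conv2D call so its forward pass is recomputed
            # during backprop instead of its activations being cached.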

            def _x( inner_input_tensor ):

                x = super(Conv2D_mem_eff, self).call( inner_input_tensor )

                return x

            _x = tf.contrib.layers.recompute_grad( _x )

            return _x( inputs )

I don't know what I'm doing wrong and how others are getting this to work. Can you help me?

ghost commented 5 years ago

@joeyearsley Do you have a minimal working example I can try to run? Also, what versions of tensorflow and keras did you use when testing your code? TF 1.9? Keras 2.0? I know I need to use the tensorflow implementation of the keras backend (i.e. from tensorflow.python.keras import backend as K) instead of the backend bundled with keras (i.e. from keras import backend as K), but beyond that I don't know.

ghost commented 5 years ago

@Sirius083 or @joeyearsley can you help me?

ghost commented 5 years ago

@Sirius083 @joeyearsley I used tensorflow.layers instead of tensorflow.keras.layers to import the core layers (e.g. Conv2D and Dense), and I get the warning:

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.

Is that correct?
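
For reference, the wrapped layers are built inside resource-variable scopes, roughly like this (minimal sketch; the fc01 placeholder shape is hypothetical):

import tensorflow as tf

# Minimal sketch: build the layer inside a resource-variable scope, which is
# what recompute_grad's custom gradient expects to see on the forward pass.
x = tf.placeholder(tf.float32, shape=(None, 25))
with tf.variable_scope('fc01', use_resource=True):
    y = tf.layers.Dense(units=100, name='fc01')(x)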

ghost commented 5 years ago

CAN SOMEONE PLEASE HELP?!!!

Sirius083 commented 5 years ago

@Sirius083 or @joeyearsley can you help me?

I use another efficient densenet implementation, https://github.com/cybertronai/gradient-checkpointing. It is easy to implement; you just add a few lines at the beginning of your code.

ghost commented 5 years ago

@Sirius083 That was the first one I tried, but it didn't work. What version of tensorflow and keras did you use?

ghost commented 5 years ago

@Sirius083 Did you see your GPU memory usage go down and your training time go up when you used yaroslav's memory_saving_gradients? Also, are you using Windows or Linux?

Sirius083 commented 5 years ago

@Sirius083 Did you see your GPU memory usage go down and your training time go up when you used yaroslav's memory_saving_gradients? Also, are you using Windows or Linux?

  1. Yes; I cannot train densenet-100-36 without yaroslav's memory_saving_gradients, since I only have one 1080 Ti GPU.
  2. It works on both Windows and Linux; I checked.

Note: I think tensorflow also has some intrinsic implementation of the memory-saving method, since it reports CUDA memory allocation failures but can still train the model.

However, if the model is too big, like densenet-bc-190-40, it cannot be trained with this method.

ghost commented 5 years ago

@Sirius083 I've tried downgrading from tf-1.8 to 1.5 and still can't get it to work. I'm on Windows 10, and my task manager doesn't show any less memory being used when I run with memory_saving_gradients.

Right now I am on tensorflow 1.5 with keras 2.1.6, using python 3.5 x64. I make sure to use the tensorflow implementation of the keras backend (from tensorflow.python.keras._impl.keras import backend as K) as well as the tensorflow keras modules for the layers.

I define my model, add gradient checkpointing for several convolutional and fully-connected layers, and then compile the model in a function called get_model.

Here is the meat of my code. I haven't included a number of my pandas functions for dataset manipulation, but if you think they'd be important, let me know and I'll post them here. I don't feel like I'm doing anything too out of the ordinary. Can you take a look?

import argparse
import datetime
import gc
import os
import time

import tensorflow as tf
from tensorflow.python.keras._impl.keras import backend as K

from tensorflow.contrib.data.python.ops.shuffle_ops import shuffle_and_repeat
from tensorflow.contrib.data.python.ops.batching import map_and_batch

import memory_saving_gradients

Dataset = tf.data.Dataset

from tensorflow.python.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.python.keras.models import Sequential, Model, load_model, model_from_yaml
from tensorflow.python.keras.callbacks import LearningRateScheduler, ModelCheckpoint, EarlyStopping, History, TensorBoard
from tensorflow.python.keras import regularizers, optimizers
from tensorflow.python.keras.layers import Conv2D, Dense, Flatten, Dropout, Input, Lambda, Activation

##################
#GLOBAL VARIABLES
##################

img_shape_raw = (3, 160, 320)

batch_size = 32

num_epochs = 1

crop_top = 70
crop_btm = 25

img_format = 'channels_first'
K.set_image_data_format(img_format)

img_shape_input = (img_shape_raw[0],
                   img_shape_raw[1] - crop_top - crop_btm,
                   img_shape_raw[2]) #(3, 65, 320)

################
#DATA GENERATOR
################

def generator_from_df( df, batch_size, shuffle = True ):

    def read( img_pth, angle ):

        im_fl = tf.read_file( img_pth )
        im = tf.image.decode_image(im_fl, channels=3)
        im = tf.transpose( im, [2, 0, 1] ) # Make image channels first

        return Dataset.from_tensors( (im, angle) )

    img_pths = tf.convert_to_tensor( df['Image_Path'].values )
    angs = tf.convert_to_tensor( df['Angle'].values )

    ds = Dataset.from_tensor_slices( (img_pths, angs) )

    ds = ds.apply( tf.contrib.data.parallel_interleave( read, cycle_length = batch_size, sloppy = True ) )

    if shuffle:
        ds = ds.apply( shuffle_and_repeat( buffer_size = 2*batch_size, count = num_epochs ) )
    else:
        ds = ds.repeat( num_epochs )

    ds = ds.apply( map_and_batch(
        lambda img_pth, ang: (img_pth,ang),
        batch_size,
        num_parallel_batches = max_procs ) )

    ds = ds.prefetch( max_procs )

    iterator = ds.make_one_shot_iterator()
    sess = K.get_session()

    next_element = iterator.get_next()
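    # Drive the tf.data iterator with the Keras session and yield numpy batches
    # to fit_generator until the dataset is exhausted.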

    while True:

        try:
          yield sess.run(next_element)
        except tf.errors.OutOfRangeError:
          break

###########
#GET MODEL
###########

def get_model( lr ):

    keep_prob = 0.5
    rate = keep_prob

    l2 = regularizers.l2(0.001)

    with tf.name_scope('Input'):
        inputs = Input( shape=img_shape_input, name='input' )

        x = Lambda(lambda x: x / 255. - 0.5,
                   input_shape=img_shape_input, name = 'norm_-0.5_to_0.5')(inputs)

    with tf.name_scope('Hidden_Layers'):

        with K.name_scope('ConvLayer_01'):

            x = Conv2D(4, (5,5),
                       kernel_regularizer=l2,
                       bias_regularizer=l2,
                       padding='same',
                       name='conv01')(x)

        with tf.name_scope('ConvLayer_02'):

            x = Conv2D(12, (5,5),
                       kernel_regularizer=l2,
                       bias_regularizer=l2,
                       padding='same',
                       name='conv02')(x)

        with tf.name_scope('ConvLayer_03'):

            x = Conv2D(24, (5,5),
                       kernel_regularizer=l2,
                       bias_regularizer=l2,
                       padding='same',
                       name='conv03')(x)

        with tf.name_scope('ConvLayer_04'):

            x = Conv2D(24, (3,3),
                       kernel_regularizer=l2,
                       bias_regularizer=l2,
                       padding='same',
                       name='conv04')(x)

        with tf.name_scope('ConvLayer_05'):

            x = Conv2D(32, (3,3),
                       kernel_regularizer=l2,
                       bias_regularizer=l2,
                       padding='same',
                       name='conv05')(x)

        with tf.name_scope('Flatten'):

            x = Flatten(name='flatten')(x)

        with tf.name_scope('FullyConnectedLayer_01'):

            x = Dense(100,
                      kernel_regularizer=l2,
                      bias_regularizer=l2,
                      name='fc01')(x)

        with tf.name_scope('FullyConnectedLayer_02'):

            x = Dense(50,
                      kernel_regularizer=l2,
                      bias_regularizer=l2,
                      name='fc02')(x)

        with tf.name_scope('FullyConnectedLayer_03'):

            x = Dense(25,
                      kernel_regularizer=l2,
                      bias_regularizer=l2,
                      name='fc03')(x)

        with tf.name_scope('FullyConnectedLayer_04'):

            x = Dense(10,
                      kernel_regularizer=l2,
                      bias_regularizer=l2,
                      name='fc04')(x)

    with tf.name_scope('Output'):

        outputs = Dense(1,
                        name='output')(x)

    # Create Model

    model = Model( inputs = inputs, outputs = outputs )

    adam = optimizers.Adam( lr = lr, decay = 0.001 ) # Learning rate and decay set in LearningRateScheduler

    # Memory Saving Gradients

    layer_names = [ 'conv02', 'conv04', 'fc01', 'fc03' ]
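    # Mark each named layer's output as a gradient checkpoint; memory_saving_gradients'
    # 'collection' mode reads from the 'checkpoints' graph collection.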

    [tf.add_to_collection('checkpoints', model.get_layer(l).get_output_at(0))
     for l in layer_names]
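    # Monkey-patch the Keras backend so K.gradients (called by the optimizer when
    # building the train function) uses the collection-based memory-saving version.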

    K.__dict__['gradients'] = memory_saving_gradients.gradients_collection

    # Compile Model

    model.compile(loss='mean_squared_error', optimizer=adam, metrics=['mse'])

    return model

class CumulativeHistory( History ):
    '''
    The stock History callback does not allow resuming history across fits, but this does.
    '''
    def on_train_begin( self, logs=None ):
        if not hasattr(self, 'epoch'):
            super(CumulativeHistory, self).on_train_begin( logs )

def main(*args, **kargs):
    """ Behavioral Cloning Project
    """

    parser = argparse.ArgumentParser(description='Behavioral Cloning Project')

    parser.add_argument('-c', '--checkpoint', type=str, help='Checkpoint (`.h5` file)')
    parser.add_argument('-e', '--epoch', type=int, help='Initial epoch')

    args = parser.parse_args()

    model_type = 'new'
    train_model = None
    initial_epoch = 0

    if args.checkpoint is not None:

        train_model = load_model( args.checkpoint )

        initial_epoch = args.epoch

        model_type = 'loaded'

    # Set Configuration

    config = tf.ConfigProto( intra_op_parallelism_threads = max_procs,
                             inter_op_parallelism_threads = 0) # set automatically to number of logical cores

    config.gpu_options.allow_growth = True

    # Get Data

    df_train, df_val, df_test, bins = get_data( keep_ptl = 60 )

    ntrain, nval, ntest = df_train.shape[0], df_val.shape[0], df_test.shape[0]

    # Training

    train_graph = tf.Graph()

    train_generator = generator_from_df( df_train, batch_size )
    val_generator   = generator_from_df( df_val,   batch_size, shuffle=False )

    nbatches_train = ntrain // batch_size
    nbatches_val   = nval // batch_size

    history = CumulativeHistory()

    early_stop = EarlyStopping( monitor='val_mean_squared_error',
                                min_delta=1e-4,
                                patience=50,
                                verbose=0,
                                mode='min')

    model_ckpt = ModelCheckpoint( fl_fmt_wt_ckpt,
                                  monitor='val_mean_squared_error',
                                  verbose=0,
                                  save_best_only=True,
                                  save_weights_only=True,
                                  period=1)

    callbacks = [history, early_stop, model_ckpt]

    for i in range(len(lr)):

        train_sess = tf.Session( config = config, graph = train_graph )
        K.set_session( train_sess )

        if model_type == 'new':

            with train_graph.as_default():

                # Print model summary
                summary_fl_pth = os.path.join( fldr_summary, 'model_summary_run_{:04d}_'.format(run[0]) + r'.txt' )

                train_model = get_model( lr[i] )

                with open(summary_fl_pth, 'w') as summary_file:
                    train_model.summary( print_fn=lambda x: summary_file.write(x + '\n') )

        with train_graph.as_default():

            with train_sess.as_default():

                if K.backend() == 'tensorflow':

                    board = TensorBoard( log_dir = fldr_log,
                                         histogram_freq = 0,
                                         write_graph = True,
                                         write_images = True )
                    callbacks.append( board )

                writer = tf.summary.FileWriter( fldr_log, train_graph )

                ts = time.time()
                ts = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d_%H-%M-%S')

                arch_yaml = train_model.to_yaml()
                arch_fl_pth = os.path.join( fldr_arch, 'arch_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.yaml' )

                with open(arch_fl_pth, 'w') as arch_file:
                    arch_file.write( arch_yaml )

                train_model.save( os.path.join( fldr_mdl,
                                                'model_init_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.h5') )

                train_model.save_weights( os.path.join( fldr_wt,
                                                        'weights_init_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts  + '.h5' ) )

                train_model.fit_generator(
                    generator = train_generator,
                    steps_per_epoch = nbatches_train,
                    epochs = num_epochs,
                    max_queue_size = max_q_size,
                    validation_data = val_generator,
                    validation_steps = nbatches_val,
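                    # workers=0 keeps the generator on the main thread, since it
                    # drives a tf.data iterator through the Keras session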
                    workers = 0,
                    callbacks = callbacks,
                    initial_epoch = initial_epoch)

                ts = time.time()
                ts = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d_%H-%M-%S')

                train_model.save( os.path.join( fldr_mdl,
                                                'model_final_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts + '.h5') )

                train_model.save_weights( os.path.join( fldr_wt,
                                                        'weights_final_' + hparam_str[0] + '_run_{:04d}_'.format(run[0]) + ts  + '.h5' ) )

        if K.backend() == 'tensorflow':
            K.clear_session()

        del train_model
        gc.collect()

if __name__ == '__main__':
    """ Entry point to the program
    """

    main()
Sirius083 commented 5 years ago

@gitrdonator Sorry, I did not use keras; I use tensorflow (1.9.0 on windows) and python 3.6. I just add these lines before the model definition and training part.

I think the problem may be that you should first import tensorflow, then overwrite the gradients function with gradients_memory as below:

import sys
import os

import numpy as np
import tensorflow as tf

import memory_saving_gradients
from tensorflow.python.ops import gradients
def gradients_memory(ys, xs, grad_ys=None, **kwargs):
    return memory_saving_gradients.gradients(ys, xs, grad_ys, checkpoints='memory', **kwargs)
gradients.__dict__["gradients"] = gradients_memory 

import argparse
import os
import math

from tensorpack import *
from tensorpack.tfutils.symbolic_functions import *
from tensorpack.tfutils.summary import *
ghost commented 5 years ago

@Sirius083 I tried that before as well, but it didn't work. However, I just tried your method with gradient checkpointing again to see if it would work, and it still didn't. I modified my code as follows:

In my imports, I ensure memory_saving_gradients comes after tensorflow (which it already did), and I add from tensorflow.python.ops import gradients after memory_saving_gradients:

...
import memory_saving_gradients
from tensorflow.python.ops import gradients
...

Then modify the end of my get_model function as follows:

...
    layer_names = [ 'conv02', 'conv04', 'fc01', 'fc03' ]

    [tf.add_to_collection('checkpoints', model.get_layer(l).get_output_at(0))
     for l in layer_names]

    def gradients_collection(ys, xs, grad_ys=None, **kwargs):
        return memory_saving_gradients.gradients(ys, xs, grad_ys, checkpoints='collection', **kwargs)

    gradients.__dict__["gradients"] = gradients_collection
...

but this still didn't work.

Using gradients_memory normally with the Keras backend (K.__dict__) instead of the tensorflow ops gradients (gradients.__dict__) does tell me it can't find a bottleneck and that I should use checkpointing, whereas it does not with gradients.__dict__.
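
For reference, the two patch targets I have tried look roughly like this (a sketch; it assumes memory_saving_gradients is importable and the 'checkpoints' collection is populated as in get_model above):

import memory_saving_gradients
from tensorflow.python.ops import gradients
from tensorflow.python.keras._impl.keras import backend as K

def gradients_collection(ys, xs, grad_ys=None, **kwargs):
    return memory_saving_gradients.gradients(ys, xs, grad_ys,
                                             checkpoints='collection', **kwargs)

# (a) patch the low-level gradients module, as suggested above
gradients.__dict__['gradients'] = gradients_collection

# (b) patch the Keras backend, which is what the Keras optimizer actually calls
K.__dict__['gradients'] = gradients_collection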

Do you by chance have any other ideas?

ghost commented 5 years ago

@joeyearsley Since this does not work, particularly with Keras as far as I can tell, can you please update your README.md to state that it does not work with keras?

ghost commented 5 years ago

@Sirius083 Can you share your tensorflow code with me? I desperately need to get memory saving to work on Windows, and I can't get it to work using keras.

Sirius083 commented 5 years ago

@gitrdonator I just add the few lines above in cifar10-densenet.py from https://github.com/YixuanLi/densenet-tensorflow. I did not add anything else in particular, which means I add gradient checkpointing to all the convolutional layers, not just a few specified ones. Maybe you can open an issue under this repository: https://github.com/cybertronai/gradient-checkpointing

ghost commented 5 years ago

@Sirius083 I've opened many. You're more than welcome to look.

ghost commented 5 years ago

@Sirius083 So, to be clear, what you're telling me is that you don't in fact have any working code?

ghost commented 5 years ago

@joeyearsley @Sirius083 Can you please help me to get memory_saving_gradients in tensorflow-gpu 1.5 working? You had said previously that you got it to work in tensorflow 1.5.

Would you mind please taking a look at Issue #42, which I created on cybertronai/gradient-checkpointing? Thank you.

Sirius083 commented 5 years ago

@gitrdonator I said it works on tensorflow 1.9 on windows; I never tried it on tensorflow 1.5.

ghost commented 5 years ago

@Sirius083 Sorry, that was for @joeyearsley. He had said he had it "successfully working" in an issue post on cybertronai/gradient-checkpointing.

ghost commented 5 years ago

@Sirius083 But while I have you here, would you mind taking a look at the issue and letting me know if you see anything that I could do?

ghost commented 5 years ago

@Sirius083 You did not even import memory_saving_gradients in your code. In fact, if you did get it to work, you should let @joeyearsley know. He opened a giant issue with many others showing that they couldn't get it to work past tensorflow 1.8 (see Issue #29).