microsoft / EdgeML

This repository provides code for machine learning algorithms for edge devices developed at Microsoft Research India.

FastTrainer : How to stack more FastCells #82

Closed praneet195 closed 3 years ago

praneet195 commented 5 years ago

Is there any way of providing more cells to the FastCellTrainer or does it only work with one cell?

adityakusupati commented 5 years ago

Sorry for the late reply. You can replace any of the stacked LSTMCells or GRUCells in your code with FastGRNN or FastRNN cells. Unless you want to induce sparsity, you don't need the FastCellTrainer. It is also easy to induce sparsity with stacked RNN cells in the same way FastCellTrainer does it right now. Feel free to post any questions here, with examples, so that I can give you a precise answer.
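For readers wondering what inducing sparsity on stacked cells might look like, here is a framework-free sketch. It assumes the sparsification step is magnitude-based hard thresholding (keep the largest-magnitude fraction of each weight matrix and zero the rest), applied to every layer in turn. The function names, the `keep_fraction` parameter, and the toy matrices are hypothetical; this is not the FastCellTrainer code.

```python
import math


def hard_threshold(matrix, keep_fraction):
    """Zero all but the largest-magnitude entries of a 2-D list of floats."""
    flat = sorted((abs(v) for row in matrix for v in row), reverse=True)
    keep = math.ceil(keep_fraction * len(flat))
    if keep <= 0:
        return [[0.0 for _ in row] for row in matrix]
    cutoff = flat[keep - 1]  # smallest magnitude that survives
    # Entries tied with the cutoff are all kept, so slightly more than
    # `keep` values can survive when magnitudes repeat.
    return [[v if abs(v) >= cutoff else 0.0 for v in row] for row in matrix]


def sparsify_stack(layer_weights, keep_fraction):
    """Apply the same thresholding to each layer's weight matrix independently."""
    return [hard_threshold(w, keep_fraction) for w in layer_weights]


layers = [
    [[0.9, -0.1], [0.05, -0.7]],  # hypothetical weights of layer 1
    [[0.2, 0.01], [-0.3, 0.02]],  # hypothetical weights of layer 2
]
sparse_layers = sparsify_stack(layers, keep_fraction=0.5)
print(sparse_layers)
```

Each layer is thresholded on its own, so stacking does not change the procedure: it simply runs once per weight matrix.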

praneet195 commented 5 years ago

Hi, I'm unable to add these cells the way I would add, say, LSTMCells. For example, in the FastGRNN cell example where the cell is defined, is there any way to pass its output to another cell and thereby stack them?

adityakusupati commented 5 years ago

Is this what you want - https://github.com/Microsoft/EdgeML/blob/master/tf/edgeml/trainer/fastTrainer.py#L73 ?

Also, if you have specific code in use, you can share the snippet as a gist and I can look at it and suggest how to change it. I stack FastCells as easily as LSTMCells.

praneet195 commented 5 years ago

Yes, thank you, this is exactly what I needed. Is there a code snippet you can provide that stacks multiple cells? I'm running into a dimensionality issue while trying to send the output of one cell to another.

adityakusupati commented 5 years ago

Hi Praneet,

I would rather encourage you to share your case so that I can help you. In my view, using https://www.tensorflow.org/api_docs/python/tf/nn/rnn_cell/MultiRNNCell is very straightforward for FastGRNN if your code works for GRU.
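To make the dimensionality contract of stacking concrete, here is a framework-free sketch: the output of layer k is the input of layer k+1, so each cell's input size must equal the hidden size of the cell below it. `ToyFastRNNCell` is a hypothetical stand-in, not the EdgeML class. Its update follows the FastRNN-style residual form (a small step toward the candidate state plus most of the previous state), and the `alpha`, `beta`, and weight values are arbitrary.

```python
import math
import random


class ToyFastRNNCell:
    """Hypothetical stand-in for a FastRNN-style cell (not the EdgeML class)."""

    def __init__(self, input_dim, hidden_dim, alpha=0.1, beta=0.9, seed=0):
        rng = random.Random(seed)
        self.input_dim, self.hidden_dim = input_dim, hidden_dim
        self.alpha, self.beta = alpha, beta
        # W maps the input, U maps the previous hidden state.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                  for _ in range(hidden_dim)]
        self.U = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                  for _ in range(hidden_dim)]
        self.b = [0.0] * hidden_dim

    def step(self, x, h):
        if len(x) != self.input_dim:
            raise ValueError(f"expected input of size {self.input_dim}, got {len(x)}")
        new_h = []
        for i in range(self.hidden_dim):
            pre = self.b[i]
            pre += sum(w * xi for w, xi in zip(self.W[i], x))
            pre += sum(u * hi for u, hi in zip(self.U[i], h))
            # Residual update: alpha * candidate + beta * previous state.
            new_h.append(self.alpha * math.tanh(pre) + self.beta * h[i])
        return new_h


def build_stack(input_dim, hidden_dims):
    """Create one cell per layer; layer k's input size is layer k-1's hidden size."""
    cells, in_dim = [], input_dim
    for k, hidden_dim in enumerate(hidden_dims):
        cells.append(ToyFastRNNCell(in_dim, hidden_dim, seed=k))
        in_dim = hidden_dim
    return cells


def run_stack(cells, sequence):
    """Feed a sequence through the stack and return the top layer's outputs."""
    states = [[0.0] * cell.hidden_dim for cell in cells]
    outputs = []
    for x in sequence:
        layer_input = x
        for k, cell in enumerate(cells):
            states[k] = cell.step(layer_input, states[k])
            layer_input = states[k]  # output of layer k feeds layer k+1
        outputs.append(layer_input)
    return outputs


cells = build_stack(input_dim=3, hidden_dims=[4, 5])
outputs = run_stack(cells, [[1.0, 2.0, 3.0]] * 6)
print(len(outputs), len(outputs[0]))
```

A cell built with the wrong input size raises the `ValueError` in `step`, which is the toy analogue of a dimensionality error when chaining cells by hand.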

praneet195 commented 5 years ago

Alright, this is my case: I've created this model in Keras using RNN and LSTMCell. The model is as follows:

design network

    cell1 = tf.nn.rnn_cell.LSTMCell(64)
    cell2 = tf.nn.rnn_cell.LSTMCell(64)
    cell3 = tf.nn.rnn_cell.LSTMCell(64)
    cell4 = tf.nn.rnn_cell.LSTMCell(64)

    model = Sequential()
    model.add(RNN(cell1, input_shape=(train_X.shape[1:]), return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    model.add(RNN(cell2, return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    model.add(RNN(cell3, return_sequences=True))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    model.add(RNN(cell4, return_sequences=False))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    model.add(Dense(128, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    model.add(Dense(1, activation='sigmoid'))

    opt = tf.keras.optimizers.Adam(lr=1e-3, decay=1e-5)

    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

    history = model.fit(train_X, train_y, epochs=10, batch_size=1024, validation_data=(test_X, test_y), verbose=1, shuffle=True)

However, if I substitute the LSTMCell in the above case with, say, FastRNNCell or FastGRNNCell, I get the following error:

error

Traceback (most recent call last):
  File "fasttrainer.py", line 72, in <module>
    model.add(RNN(cell1, input_shape=(train_X.shape[1:]),return_sequences=True))
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\training\checkpointable\base.py", line 474, in _method_wrapper
    method(self, *args, **kwargs)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\sequential.py", line 159, in add
    layer(x)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 619, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 750, in call
    input_length=timesteps)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\backend.py", line 3292, in rnn
    swap_memory=True)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3291, in while_loop
    return_same_structure)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3004, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2939, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3260, in <lambda>
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\backend.py", line 3277, in _step
    tuple(states) + tuple(constants))
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\keras\layers\recurrent.py", line 737, in step
    output, new_states = self.cell.call(inputs, states, **kwargs)
  File "C:\Users\prane\Desktop\keras-lstm-master\edgeml\graph\rnn.py", line 291, in call
    initializer=W_matrix_init)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1487, in get_variable
    aggregation=aggregation)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1237, in get_variable
    aggregation=aggregation)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 540, in get_variable
    aggregation=aggregation)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 492, in _true_getter
    aggregation=aggregation)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 861, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable FastRNN/FastRNNcell/W already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "d:\Users\prane\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)

It says that the variable FastRNN/FastRNNcell/W already exists and is disallowed, as seen above. Any reason why?

adityakusupati commented 5 years ago

Hi Praneet,

It is evident from the logs that the variable scope is the issue. Try using this in the declaration and let me know if it fails. Also, do you expect all the cells to have different weights, or do you expect them to be coupled?

cell1 = FastRNNCell(64, name="FastGRNNCell1")
cell2 = FastRNNCell(64, name="FastGRNNCell2")
cell3 = FastRNNCell(64, name="FastGRNNCell3")
cell4 = FastRNNCell(64, name="FastGRNNCell4")

praneet195 commented 5 years ago

Hi Aditya, firstly, thank you for the prompt replies. I'm getting the following error after trying your fix.

error

  ValueError: Variable FastGRNNCell1/FastRNNcell/W already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? 

Not sure why the scope issue is arising. Also, I expect them to have different weights.
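One possible explanation, not confirmed in this thread, is that the error persists because the same cell has its `call()` invoked more than once while the graph is built, and the traceback shows the weight being requested inside `call()`. Under that reading, unique names prevent collisions between different cells but not between two calls of the same cell. The toy model below illustrates the difference; `ToyVariableStore` and both cell classes are hypothetical and only mimic the "already exists" behavior, not TensorFlow's actual variable scoping.

```python
class ToyVariableStore:
    """Toy registry of named variables; not TensorFlow's implementation."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, size):
        if name in self._vars:
            raise ValueError(f"Variable {name} already exists, disallowed.")
        self._vars[name] = [0.0] * size
        return self._vars[name]


class CreateInCallCell:
    """Hypothetical cell that requests its weight on every call()."""

    def __init__(self, store, name):
        self.store, self.name = store, name

    def call(self):
        return self.store.get_variable(f"{self.name}/W", size=4)


class CreateOnceCell(CreateInCallCell):
    """Same cell, but the weight is created on first use and then cached."""

    def __init__(self, store, name):
        super().__init__(store, name)
        self._w = None

    def call(self):
        if self._w is None:
            self._w = super().call()
        return self._w


def second_call_fails(cell):
    """Call the cell twice and report whether the second call collides."""
    cell.call()
    try:
        cell.call()
    except ValueError:
        return True
    return False


store = ToyVariableStore()
print(second_call_fails(CreateInCallCell(store, "FastGRNNCell1")))  # True
print(second_call_fails(CreateOnceCell(store, "FastGRNNCell2")))    # False
```

If this reading is right, the remedy would be to create the weights once (for example in a build step) rather than to rename the cells further.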

adityakusupati commented 5 years ago

@praneet195, I have tested your LSTM code, and in TensorFlow it should fail with the same error you shared, unless you have been using a different version of tf. It seems to run fine in Keras for some reason.

cell1 = FastGRNNCell(64, name="FastGRNNCell1")
cell2 = FastGRNNCell(64, name="FastGRNNCell2")
cell3 = FastGRNNCell(64, name="FastGRNNCell3")
cell4 = FastGRNNCell(64, name="FastGRNNCell4")

should fix the issue in TensorFlow (it is the same as the earlier suggestion), unless there is something that hasn't been shared with me yet. I need to look back at the Keras code I wrote when I implemented FastGRNN for Keras and see what the difference is.

I think it is better if you could mail your entire code and any test data to me. I don't understand why it is failing even after scope resolution. I have written multiple test scripts in tf to test this case, and the error was fully resolved once the names were clearly disambiguated.

Update: Keras seems to be doing something weird in my test scripts. It isn't failing where yours is. It fails because Keras thinks an internal change of shape is happening, although it isn't. I think we had better take this offline with actual working code and data and then debug.

praneet195 commented 5 years ago

Sure, I will drop a mail on your official id and let's take this offline.

khlus commented 4 years ago

I have a similar problem to the one described above. To make it work in Keras, I made the following changes:

In the original class FastGRNNCell(RNNCell), I changed the declaration and the first line of the call() function to the following:

def call(self, inputs, states):
    state = states[0]

I also wrapped this class in a Keras layer.

class FastGRNNCellWrapper(Layer):
    def __init__(self, units, **kwargs):
        self.units = units
        self.state_size = units
        super(FastGRNNCellWrapper, self).__init__(**kwargs)
        self.lstm = FastGRNNCell(units, **kwargs)

    def call(self, inputs, states, training=None):
        return self.lstm(inputs, states)

Then it started to work, but the results are really bad. It would be great if somebody could share an implementation in Keras.

adityakusupati commented 4 years ago

@praneet195 If I remember correctly, we were able to debug this offline. Can you please share the code in that case?

khlus commented 4 years ago

Maybe my model will also be useful. To reproduce the issue, just use FastGRNNCell instead of FastGRNNCellWrapper.

    timesteps = 50
    data_dim = 128
    num_classes = 1

    model = Sequential()

    # 1D ConvNet
    model.add(Reshape((timesteps, data_dim, 1), input_shape=(timesteps, data_dim,)))
    model.add(TimeDistributed(Conv1D(filters=data_dim,  kernel_size=32, padding='same', name='conv1')))

    model.add(TimeDistributed(BatchNormalization()))
    model.add(TimeDistributed(Activation('relu')))
    model.add(TimeDistributed(MaxPooling1D(pool_size=data_dim - 32 + 1)))
    model.add(TimeDistributed(Dropout(0.3)))

    # RNN-LSTM
    model.add(Reshape((timesteps, data_dim,)))
    model.add(RNN(FastGRNNCellWrapper(data_dim, name='lstm1'), name='lstm1', return_sequences=True))
    model.add(Dropout(rate=0.3, name='drop1'))
    model.add(RNN(FastGRNNCellWrapper(data_dim, name='lstm2'), name='lstm2', return_sequences=True))
    model.add(Dropout(rate=0.3, name='drop2'))

    # Fully connected layer
    model.add(TimeDistributed(Dense(data_dim, activation='linear', kernel_initializer='VarianceScaling', name='fc3')))
    model.add(TimeDistributed(BatchNormalization()))
    model.add(TimeDistributed(Activation('relu', name='relu3')))
    model.add(TimeDistributed(Dropout(rate=0.3, name='drop3')))

    # Output layer
    model.add(TimeDistributed(Dense(num_classes, activation='sigmoid', name='output')))

shipleyxie commented 4 years ago

I have tested your code, but the weights in FastGRNN have not been added to model.trainable_weights. It seems the parameters are not trainable when FastGRNNCellWrapper is used directly in Keras. Do you have this problem? @DmitryKhlus

shipleyxie commented 4 years ago

The reason the results with FastGRNNCellWrapper are that bad may be that the FastGRNN weights have not been added to the Keras model as trainable weights. In my experiment, I can't find FastGRNN's weights in model.trainable_weights.
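The toy model below illustrates how a wrapper can end up with an empty list of trainable weights: if the outer layer only tracks weights registered through its own bookkeeping, weights that an inner object creates by itself stay invisible to it. All classes here are hypothetical, and real Keras tracking rules are richer and version-dependent, so treat this as an intuition aid rather than a description of Keras internals.

```python
class ToyLayer:
    """Toy stand-in for a framework layer; tracks only what add_weight registers."""

    def __init__(self):
        self.trainable_weights = []

    def add_weight(self, name, size):
        weight = {"name": name, "values": [0.0] * size}
        self.trainable_weights.append(weight)
        return weight


class UntrackedInner:
    """Hypothetical inner cell that builds its weight without telling any layer."""

    def __init__(self, units):
        self.W = {"name": "inner/W", "values": [0.0] * units}


class OpaqueWrapper(ToyLayer):
    """Wraps the inner cell; the layer never learns about inner.W."""

    def __init__(self, units):
        super().__init__()
        self.inner = UntrackedInner(units)


class RegisteringWrapper(ToyLayer):
    """Creates the weight through add_weight, so the layer tracks it."""

    def __init__(self, units):
        super().__init__()
        self.W = self.add_weight("cell/W", units)


print(len(OpaqueWrapper(8).trainable_weights))       # 0
print(len(RegisteringWrapper(8).trainable_weights))  # 1
```

In the toy, an optimizer that only sees `trainable_weights` would never update `inner.W`, which would be consistent with a model that runs but trains poorly.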

adityakusupati commented 4 years ago

@dileep3004, I have very little clue about the latest Keras versions and I do not have the setup to reproduce this. A simple thing to do is to check whether the same error persists with other cells like GRULRCell. The error seems weird and my Keras implementations are very old. Maybe @praneet195 or @shipleyxie can look at this, as they seem to have figured out the Keras stuff :).

@DmitryKhlus sorry, I lost track of this issue because it shows as closed. As mentioned above, I have very little clue about Keras nowadays.

adityakusupati commented 4 years ago

You need to change the cells as well. I don't know what RNN(cell) or LSTM(cell) does in Keras. At least in the current code, it seems like you are creating an LSTMLRCell and passing it to RNN().

Pointing me to what RNN() or LSTM() do might help. But when I asked you to change the cells, I meant changing cell1, cell2, cell3.

Also, please remember that we haven't written this code with TF 2.x in mind, so I don't know of any compatibility issues with it. But from what I can see, the code you just posted will not work for multiple reasons that are not even related to the internals of EdgeML.

dileep3004 commented 4 years ago

OK, I will debug it.

Can you point me to some resources on including FastCells in Keras models?

adityakusupati commented 4 years ago

This thread contains some information. FastCells are exactly like the GRUCell or LSTMCell of native TensorFlow. Incorporating FastCells follows almost the same path as the other two.

Sorry that I couldn't be of more help. You can always use TF 1.x, take the fastcell_example as a starting point, and then use tf.nn.rnn_cell.MultiRNNCell (linked earlier in this thread; I am not sure about the exact syntax) to get multiple layers of FastCells.

yunishi3 commented 4 years ago

I just adapted the FastGRNN/FastRNN cells from the original code for a Keras implementation: https://github.com/yunishi3/FastGRNN-for-Keras

adityakusupati commented 4 years ago

@yunishi3 thanks a ton. I will go through the implementation just to be safe.

I think your implementation doesn't support training to induce sparsity. Can you please add that to the README as well? The functional implementation of the cells is exactly the tf code, so there is nothing to worry about there :)

@harsha-simhadri do we want to have Keras cells in EdgeML? If so, this can be a starting point. I have implemented Keras cells in the older static graph versions and I am not sure if that makes sense now.

@yunishi3 can you also check stacking multiple of these cells? There was a scope error the last time I checked it in Keras.

yunishi3 commented 4 years ago

@adityakusupati Thank you for checking my repository. I just added a note about sparsity inducing to the README.md.

As for stacking multiple of these cells, the following code worked in fastcell_example_keras.ipynb.
(Sorry, I forgot this thread was about stacking cells...)

FastCell = FastGRNNCellKeras(hiddenDims)
FastCell_1 = FastGRNNCellKeras(hiddenDims)
FastCell_2 = FastGRNNCellKeras(hiddenDims)
~~
x = RNN(FastCell, return_sequences=True, name='rnn')(x)
x = RNN(FastCell_1, return_sequences=True, name='rnn1')(x)
x = RNN(FastCell_2, return_sequences=False, name='rnn2')(x)

adityakusupati commented 4 years ago

Thanks @yunishi3 . This is very helpful.