tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

Resnet + ConvLSTM on TPU #576

Open vijayvee opened 5 years ago

vijayvee commented 5 years ago

I am trying to build a ResNet model with a Convolutional LSTM between successive layers. I tile the input tensor timesteps times along a new time axis and reshape it to [n, t, h, w, c] so that it can be passed as the input to tf.nn.dynamic_rnn().
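For readers following along, the tiling step described above can be sketched as follows. This is a shape-only illustration in NumPy, not the original model code: the helper name tile_over_time and the batch and image sizes are hypothetical, and a TensorFlow graph would use the equivalent tensor ops instead.

```python
import numpy as np

def tile_over_time(x, timesteps):
    """Repeat a [n, h, w, c] batch along a new time axis -> [n, t, h, w, c]."""
    # Add a length-1 time axis at position 1, then repeat the frame along it.
    return np.tile(x[:, np.newaxis], (1, timesteps, 1, 1, 1))

# Hypothetical sizes: 2 images of 8x8 pixels with 3 channels, 4 timesteps.
batch = np.random.rand(2, 8, 8, 3).astype(np.float32)
seq = tile_over_time(batch, timesteps=4)
print(seq.shape)
```

Every slice seq[:, t] is a copy of the same input batch, so the recurrent layer sees a constant input at each timestep and only its state changes over time.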

I have experimented with https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/rnn/ConvLSTMCell, https://www.tensorflow.org/api_docs/python/tf/keras/layers/ConvLSTM2D, and https://github.com/carlthome/tensorflow-convlstm-cell. All three fail with errors such as tensorflow.python.framework.errors_impl.InvalidArgumentError: 2112 nodes in a cycle.

Has anyone else come across this issue? How do we solve this?

Additional info: The error is thrown only when the hidden state from one timestep is passed on to the next timestep. When I always pass [c[t], h[0]] to the next timestep instead of [c[t], h[t]], the error does not seem to be thrown. Thanks!
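To make the two variants concrete, here is a toy sketch of the recurrence in plain NumPy. It is only an illustration of the data dependency, under loud assumptions: toy_cell is a made-up stand-in rather than a real ConvLSTM, the names unroll and carry_hidden are hypothetical, and nothing here reproduces the TPU compilation error itself.

```python
import numpy as np

def toy_cell(x, c, h):
    """Made-up stand-in for one ConvLSTM step; returns the new (c, h)."""
    new_c = 0.5 * c + x + h
    new_h = np.tanh(new_c)
    return new_c, new_h

def unroll(inputs, carry_hidden=True):
    """Run toy_cell over the leading (time) axis of inputs."""
    c = np.zeros_like(inputs[0])
    h0 = np.zeros_like(inputs[0])
    h = h0
    outputs = []
    for x in inputs:
        # carry_hidden=True feeds [c[t], h[t]] into the next step;
        # carry_hidden=False always feeds [c[t], h[0]] instead.
        c, h = toy_cell(x, c, h if carry_hidden else h0)
        outputs.append(h)
    return np.stack(outputs)

inputs = np.ones((3, 2), dtype=np.float32)  # hypothetical [t, features]
full = unroll(inputs, carry_hidden=True)
frozen = unroll(inputs, carry_hidden=False)
```

In the frozen variant the hidden output of a step never feeds a later step, so the only state carried through time is c. That matches the observation that the error appears only when h[t] is carried forward.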

sailordiary commented 4 years ago

Yes, it turns out tf.nn.dynamic_rnn still doesn't really work with TPUs. I switched to tf.contrib.recurrent.functional_rnn, which is designed for TPU compatibility, and the problem went away. I don't understand why that is -- perhaps we should file a bug report?