yandexdataschool / Practical_RL

A course in reinforcement learning in the wild
The Unlicense

coursera/week6/seq2seq/basic_model_tf.py hangs with TF 1.14.0 #256

Open dniku opened 5 years ago

dniku commented 5 years ago

A Coursera forum thread (https://www.coursera.org/learn/practical-rl/discussions/all/threads/b4Bm1b6OEemlhhJkLrq7mA) reports that the honor track assignment hangs on Colab with the current TF version (1.14.0), but works with an old one (1.6.0). I have successfully reproduced the issue.

The culprit is somewhere in basic_model_tf.py, which is also present almost unmodified in master (week07_seq2seq). Most likely, we haven't noticed this because no one has attempted that week with TF instead of PyTorch. In any case, this must be fixed one way (finding the cause of the issue) or another (getting rid of TF in both master and coursera).

It seems that the issue is caused by the invocation of dynamic_rnn, which hangs: I added a couple of debug prints, and the last one that fired was the one just before the call to that function. dynamic_rnn is deprecated; this SO thread is probably relevant for migration: https://stackoverflow.com/questions/54989442/rnn-in-tensorflow-vs-keras-depreciation-of-tf-nn-dynamic-rnn
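For anyone trying to narrow the hang down further than debug prints allow, here is a minimal sketch using the standard-library faulthandler module. It dumps the tracebacks of all threads if a block runs longer than a timeout. The helper name hang_watchdog and the 60-second default are my own choices for illustration, not something from the repo, and the dump only shows Python-level frames, so it may not say much if the hang is inside native TF code.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s=60.0, file=sys.stderr):
    """Dump tracebacks of all threads if the wrapped block exceeds timeout_s.

    hang_watchdog is a hypothetical helper name; `file` must be a real file
    object with a file descriptor (faulthandler writes to it directly).
    """
    # Schedule the dump; exit=False keeps the process alive after dumping.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        yield
    finally:
        # The block finished (or raised), so cancel the pending dump.
        faulthandler.cancel_dump_traceback_later()
```

Usage would be to wrap the suspicious graph-building code, e.g. `with hang_watchdog(60): ...` around the part of basic_model_tf.py that calls dynamic_rnn.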

dniku commented 5 years ago

I have verified that it works with the current version of the justheuristic/practical_rl Docker image, which contains TensorFlow 1.13.1.

dniku commented 4 years ago

Confirmed that using `!pip install tensorflow-gpu==1.13.1` instead of `%tensorflow_version 1.x` in the Colab init cell fixes the issue in Colab.
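Since the outcome depends on which TF version the runtime actually ends up with, a small guard at the top of the notebook could make a mismatch obvious. This is only a sketch: the names REPORTED, reported_status and require_pinned are hypothetical, and the table contains only the versions and outcomes mentioned in this thread, not a full compatibility list.

```python
# Outcomes as reported in this thread only; other versions are untested here.
REPORTED = {
    "1.6.0": "works",
    "1.13.1": "works",
    "1.14.0": "hangs",
}


def reported_status(version):
    """Return 'works', 'hangs', or 'unknown' for a TF version string."""
    return REPORTED.get(version.strip(), "unknown")


def require_pinned(installed, pinned="1.13.1"):
    """Raise if the installed version is not the one we meant to pin."""
    if installed.strip() != pinned:
        raise RuntimeError(
            "expected TensorFlow %s, found %s (%s in this thread)"
            % (pinned, installed, reported_status(installed))
        )
    return installed


# In the notebook this would be called as (assumes TF is importable):
#   import tensorflow as tf
#   require_pinned(tf.__version__)
```

A check like this fails fast if the pip install did not take effect, for example because the runtime had already loaded a different TF version.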

dniku commented 4 years ago

It still doesn't work, although we're now installing an old version in Colab.