suzy0223 opened 1 year ago
Also, add `tf.compat.v1.disable_eager_execution()` at the beginning of `def placeholder(h)`.
I ran into the same problem: the loss became 0 after several epochs. Could you help me? Much appreciated!
Hi, I am trying to run train.py on METR-LA. Because of the TensorFlow version, I used `tf_upgrade_v2` to migrate model.py to TF 2.x. Specifically: (1) line 76, `tf.nn.rnn_cell.GRUCell` changed to `tf.compat.v1.nn.rnn_cell.GRUCell`; (2) lines 80 and 87, `tf.layers.dense` changed to `tf.compat.v1.layers.dense`.
Then, when I ran train.py and test.py, I hit several issues: (1) At line 47 in test.py and line 58 in train.py, `x.value` raises an error because `x` is already an `int`. After changing `x.value for x in xxx` to `x for x in xxx`, the code runs. (2) After 4-5 epochs, the training and validation losses drop to 0 and the test result becomes NaN. I have run the code several times and the issue persists. Meanwhile, test.py itself runs without errors and outputs results. Since the dataset does not include metr-la.h5, I used the file downloaded from IGNNK.
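The `x.value` error comes from a change in how static shapes are iterated: under TF 1.x each entry is a `Dimension` object whose `.value` is an `int` (or `None`), while under TF 2.x behavior each entry is already a plain `int` or `None`. TensorFlow provides `tf.compat.dimension_value` for this case. The dependency-free sketch below mirrors that behavior so the same line works under either version; `dimension_value`, `static_shape`, and `FakeDimension` are hypothetical names, and `FakeDimension` only stands in for a TF 1.x `Dimension` to exercise both branches.

```python
from typing import Any, Iterable, List, Optional


def dimension_value(dim: Any) -> Optional[int]:
    """Return a plain int (or None) for one static-shape entry.

    TF 1.x entries are Dimension objects exposing `.value`;
    TF 2.x entries are already int or None.
    """
    return dim.value if hasattr(dim, "value") else dim


def static_shape(dims: Iterable[Any]) -> List[Optional[int]]:
    # Version-agnostic replacement for `[x.value for x in shape]`.
    return [dimension_value(x) for x in dims]


class FakeDimension:
    """Hypothetical stand-in for a TF 1.x Dimension object."""

    def __init__(self, value: Optional[int]) -> None:
        self.value = value


# Illustrative shapes: TF 1.x-style and TF 2.x-style entries give the same list.
print(static_shape([FakeDimension(None), FakeDimension(4), FakeDimension(3)]))
print(static_shape([None, 4, 3]))
```

With TensorFlow installed, `tf.compat.dimension_value(x)` can replace the helper directly.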
Now I am not sure what causes these issues. I hope to hear some suggestions. Much appreciated.
I translated their code into PyTorch and encountered the same issue. I think the problem is that they didn't normalize the inputs (so that masking NaN values in the loss function would be easy). However, the unnormalized inputs cause the gradients to explode after 4 or 5 epochs.
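One possible way to normalize the inputs without losing the ability to mask missing readings is to build the mask from the raw values first, compute z-score statistics over observed entries only, and evaluate the loss on the normalized scale. This is a minimal sketch, not the repository's method: it assumes missing readings are encoded as `0.0` (`NULL_VALUE`, check how the downloaded metr-la.h5 encodes them), uses masked MAE as an illustrative loss, and all function names are hypothetical. It is written in plain Python so it runs without dependencies; the same steps carry over to PyTorch tensors. It also raises on non-finite predictions instead of replacing them with zero, so a diverged model cannot report a misleadingly small loss.

```python
import math
from typing import List, Sequence, Tuple

NULL_VALUE = 0.0  # assumption: missing sensor readings are stored as 0.0


def observed_mask(raw: Sequence[float]) -> List[bool]:
    """True where a reading is observed; computed on RAW (unnormalized) values."""
    return [not (math.isnan(v) or v == NULL_VALUE) for v in raw]


def fit_scaler(raw: Sequence[float]) -> Tuple[float, float]:
    """Z-score mean/std over observed entries only, so missing zeros don't bias them."""
    vals = [v for v, ok in zip(raw, observed_mask(raw)) if ok]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return mean, std or 1.0  # avoid dividing by zero for a constant series


def normalize(raw: Sequence[float], mean: float, std: float) -> List[float]:
    return [(v - mean) / std for v in raw]


def masked_mae(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """MAE over observed entries; pred and target must be on the same (normalized) scale."""
    if not all(math.isfinite(p) for p in pred):
        # Surface divergence instead of replacing NaN/inf with zero.
        raise FloatingPointError("non-finite predictions: training has diverged")
    errs = [abs(p - t) for p, t, ok in zip(pred, target, mask) if ok]
    if not errs:
        raise ValueError("batch has no observed entries")
    return sum(errs) / len(errs)


raw_target = [60.0, 0.0, 55.0, 65.0]  # illustrative values; 0.0 marks a missing reading
mask = observed_mask(raw_target)      # mask taken BEFORE normalizing
mean, std = fit_scaler(raw_target)
target_norm = normalize(raw_target, mean, std)
print(masked_mae([0.0, 0.0, 0.0, 0.0], target_norm, mask))
```

A model trained this way predicts on the normalized scale; predictions can be mapped back with `p * std + mean` before computing metrics in the original units.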
When I lowered the learning rate to 0.0001, training worked fine, but the results were not as good as those reported in the paper.
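Lowering the learning rate shrinks every update. A common complementary guard against exploding gradients, which nobody in this thread reports trying, is clipping by global norm: if the combined L2 norm of all gradients exceeds a threshold, every gradient is rescaled by the same factor, so the update direction is preserved. PyTorch provides this as `torch.nn.utils.clip_grad_norm_` and TensorFlow as `tf.clip_by_global_norm`. The dependency-free sketch below shows only the arithmetic; `max_norm=5.0` is an arbitrary illustrative threshold, not a value from the paper or the repository.

```python
import math
from typing import List

Tensor = List[float]  # a flattened parameter or gradient tensor


def clip_by_global_norm(grads: List[Tensor], max_norm: float) -> List[Tensor]:
    """Scale every gradient by one common factor so the global L2 norm <= max_norm."""
    global_norm = math.sqrt(sum(g * g for t in grads for g in t))
    if global_norm <= max_norm:
        return [list(t) for t in grads]  # small enough: return an unchanged copy
    scale = max_norm / global_norm
    return [[g * scale for g in t] for t in grads]


def sgd_step(params: List[Tensor], grads: List[Tensor], lr: float,
             max_norm: float = 5.0) -> List[Tensor]:
    """Plain SGD update applied to the clipped gradients."""
    clipped = clip_by_global_norm(grads, max_norm)
    return [[p - lr * g for p, g in zip(pt, gt)] for pt, gt in zip(params, clipped)]


grads = [[30.0, 40.0], [0.0]]  # global norm 50, well above max_norm
print(clip_by_global_norm(grads, 5.0))
```

Because clipping only caps the occasional huge step, it may allow a larger learning rate than an unclipped run tolerates, though whether that recovers the paper's numbers here is untested.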
Hello, have you solved this problem yet? Were you able to reproduce the results reported in the paper?