NiranthS closed this issue 3 years ago
Hi NiranthS,
I am not sure. Can you print debugging messages around line 644 to accurately locate this? Line 644 seems normal to me and shouldn't cause problems.
Sorry, it was line 635, the one that calls sess.run. Also, the problem was on my side: the datasets were not converted into tfrecords properly. It is running now.
But the loss keeps increasing and eventually goes to nan. Any ideas what might be causing this?
This might be because the learning rate is too large. Try reducing the learning rate to 1/10 of the original value.
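To illustrate why an overly large learning rate can make the loss grow instead of shrink, here is a minimal sketch on a toy problem. It does not use the training script from this repository: the quadratic loss, the function name `gradient_descent`, and the two learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize the toy loss f(w) = 5 * w**2 with plain gradient descent.

    Returns the loss after each update. The loss function and all
    parameters are hypothetical and unrelated to trainer.py.
    """
    w = w0
    losses = []
    for _ in range(steps):
        grad = 10.0 * w          # derivative of 5 * w**2
        w = w - lr * grad        # each step multiplies w by (1 - 10 * lr)
        losses.append(5.0 * w * w)
    return losses


# With lr = 0.25 each step multiplies w by -1.5, so the loss grows.
diverging = gradient_descent(lr=0.25)

# With 1/10 of that learning rate the factor is 0.75, so the loss shrinks.
converging = gradient_descent(lr=0.025)

print("lr=0.25  first/last loss:", diverging[0], diverging[-1])
print("lr=0.025 first/last loss:", converging[0], converging[-1])
```

In a real network the stable range depends on the model and data, so a tenfold reduction is a reasonable first test rather than a guaranteed fix.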
I tried it, but changing the learning rate made no difference. I will try changing other parameters. Thank you.
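When trying other parameters, it may help to find the step at which the loss first stops being finite, so that each run can be compared on the same footing. The helper below is a minimal sketch; `first_non_finite` and the example loss values are hypothetical and not part of the training script.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical loss history, made up for illustration only.
history = [0.9, 1.4, 7.2, float("inf"), float("nan")]
bad_step = first_non_finite(history)
print("first non-finite loss at step:", bad_step)
```

The same check could be applied to the loss values returned by each training step, stopping the run as soon as it reports a step index.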
I tried to run the training script. The terminal shows "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8694 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, compute capability: 7.5)" and then gets stuck at line 644 in trainer.py (found using pdb).