Closed: Rysess closed this issue 2 years ago
Hi, how did you generate those graphs?
Hi, I generated those graphs by parsing the output of the training. For the gradient norm I followed this topic: https://discuss.pytorch.org/t/check-the-norm-of-gradients/27961/5 and simply added it to the debug print. I tried without it in case it caused instability, but the same problem appeared.
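For reference, the linked forum recipe computes the total L2 norm as the square root of the sum of the squared per-parameter gradient norms. Here is a minimal dependency-free sketch of that computation; with PyTorch you would iterate over model.parameters() and read p.grad instead of plain lists (the function name and input format here are illustrative, not from the repo):

```python
import math

def total_grad_norm(grads):
    """Total L2 norm over all parameter gradients.

    `grads` is an iterable of flat lists of gradient values.
    In PyTorch this would be:
        sum(p.grad.norm(2) ** 2 for p in model.parameters()
            if p.grad is not None) ** 0.5
    """
    total = 0.0
    for g in grads:
        # accumulate the squared L2 norm of each parameter's gradient
        total += sum(x * x for x in g)
    return math.sqrt(total)

# example: two "parameters" with gradients [3.0] and [4.0]
# sqrt(9 + 16) = 5.0
print(total_grad_norm([[3.0], [4.0]]))
```

Logging this value at each debug print is enough to reproduce the spikes visible in the graphs.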
Hi bgaro, the pretrained weights are cached locally after they are downloaded for the first time. The fact that you don't see a download bar does not mean the weights are not applied at the beginning of training. They should load as long as VisionEncoderDecoderModel.from_pretrained(paths.trocr_repo) in util.py is executed.
Now regarding your training issue: I would suggest reducing the learning rate to 5e-6 first, going even further down if needed. The setting is currently in scripts.py line 53; I might commit an update later that moves the constant to the constants.py file. Let me know if that helps!
Hi, I managed to solve the issue thanks to your help. To answer your questions:
TL;DR: reducing the LR to 5e-6 did solve the issue.
Hi, I have a problem with the training of the model. The gradient seems to explode frequently, but not in every training run. Here is a graph that illustrates the problem.
I've tried to print the model's predictions at each validation step, but when the gradient explodes the model keeps predicting empty labels. I'm using a portion of the IAM dataset, and my labels are structured this way: file-name.png,¤label¤. I'm using the character '¤' as the quote character since it does not appear in the dataset, which lets labels contain double quotes (I've modified the CSV reader to use this character to mark out the label). I've tried to force the download of the pretrained weights at the beginning of each training run, without effect. I've also tried to increase the word length, without any effect either. I'm surely missing something but can't see what.
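For anyone reading along, swapping the quote character in Python's standard csv reader is a one-argument change, so a label containing double quotes parses cleanly. A minimal sketch (the file name and label are made up for illustration):

```python
import csv
import io

# A label containing double quotes, delimited by '¤' instead of '"'.
raw = 'img-001.png,¤he said "stop"¤\n'

# quotechar='¤' tells the reader that fields are quoted with '¤',
# leaving '"' free to appear inside the label text.
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar="¤")
rows = list(reader)

print(rows[0])  # ['img-001.png', 'he said "stop"']
```

The same quotechar argument works for csv.writer when generating the label files.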
Do you have any idea what could cause the model to behave this way? Thanks.