kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
http://kdexd.xyz/virtex
MIT License

No loss when pretraining on token classification #28

Closed: alexkern1997 closed this issue 2 years ago

alexkern1997 commented 2 years ago

I am trying to pretrain with the token classification method. I cloned this repo and was simply trying to reproduce the results from the paper, but I am running into problems when pretraining with token classification: it seems the loss values are missing from the output_dict variable.

When I use pretrain_virtex.py and log every 20 iterations, I get the following output:

```
2021-11-16T12:20:04.960052+0000: Iter 20 | Time: 0.764 sec | ETA: 54h 39m [Loss nan] [GPU 8774 MB]
```

Do you have any idea what could be wrong in the code?

kdexd commented 2 years ago

Hi @alexkern1997, thank you for trying the code! The dictionary returned by the TokenClassification.forward() method should always have keys named loss and loss_components; this line is not guarded by any conditional blocks, so the code would raise an error if the keys were missing rather than printing nan. Are you using the COCO dataset? I would recommend sanity-checking the input tensors to the model. Let me know if I missed anything.
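A minimal sketch of the kind of sanity check I mean (the batch keys "image" and "caption_tokens" below are assumptions based on the repo's dataloaders; adjust them to your setup):

```python
import torch

def sanity_check_batch(batch, vocab_size):
    # Hypothetical helper: key names follow the VirTex dataloaders,
    # but swap them for whatever your own batch dict uses.
    image = batch["image"]
    tokens = batch["caption_tokens"]

    # NaNs/Infs in the image tensor propagate straight into the loss.
    assert torch.isfinite(image).all(), "image tensor contains NaN/Inf"

    # Token ids outside [0, vocab_size) corrupt the embedding lookup,
    # so check the range explicitly before the forward pass.
    assert tokens.min() >= 0 and tokens.max() < vocab_size, (
        f"token ids out of range: [{tokens.min()}, {tokens.max()}], "
        f"vocab_size={vocab_size}"
    )
```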

alexkern1997 commented 2 years ago

Hey @kdexd! Thanks for the response!

Indeed, I am not using the COCO dataset, and that turned out to be the issue. My dataset contains a single caption per image rather than the five captions per image that COCO provides. The dataset objects in the repo expect a list of captions instead of a single caption string, so the model selected a random character instead of a random caption (due to this line). If that character was not in the vocabulary, the model was only presented the special boundary and unknown tokens, which I believe resulted in the nan being printed. Changing the line above to caption=captions fixed this issue.
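To illustrate the pitfall with a minimal standalone sketch (not the repo's actual dataset code; random.choice stands in for however the dataset samples a caption):

```python
import random

# A single caption stored as a plain string, as in my dataset:
captions = "a cat sitting on a mat"

# random.choice over a string yields a single character, not the caption:
print(random.choice(captions))   # e.g. 't'

# Wrapping the caption in a list restores the intended behaviour:
captions = ["a cat sitting on a mat"]
print(random.choice(captions))   # 'a cat sitting on a mat'
```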

kdexd commented 2 years ago

Glad that it works! I am closing this issue; feel free to reopen if you have further questions!