raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Loss is nan when training my own dataset #6

Closed InfinityBox closed 2 years ago

InfinityBox commented 2 years ago

The loss becomes nan at a random epoch when training on my own dataset. It still appears after lowering the learning rate or turning off AMP.

I traced it to the 'PredictorLG' module: if 'policy' is an all-zero matrix, the division produces nan in global_x.

Is it OK to add a very small value to the denominator of global_x, e.g. 1e-6?

raoyongming commented 2 years ago

I think it is ok to add a small value in this line. Since the following layers are fully learnable, I think it may also be ok to add any positive constant to the denominator of global_x (e.g., 1) to avoid nan values.
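To make the fix concrete, here is a minimal sketch of the masked mean pooling that produces global_x, with the epsilon suggested in this thread. The function name `masked_global_pool` and the exact tensor shapes are assumptions for illustration, not the repo's actual code; the key point is the `+ eps` in the denominator.

```python
import torch

def masked_global_pool(x, policy, eps=1e-6):
    """Mean over kept tokens (hypothetical helper, shapes assumed).

    x:      (B, N, C) token features
    policy: (B, N, 1) keep-mask (1 = kept, 0 = pruned)

    If `policy` is all zeros, the plain division is 0/0 = nan;
    adding a small `eps` to the denominator keeps the result
    finite (zero), which is the fix discussed in this issue.
    """
    return (x * policy).sum(dim=1, keepdim=True) / (
        policy.sum(dim=1, keepdim=True) + eps
    )

# All tokens pruned: without eps this would be 0/0 = nan.
x = torch.randn(2, 4, 8)
policy = torch.zeros(2, 4, 1)
out = masked_global_pool(x, policy)
print(torch.isnan(out).any().item())  # False: no nan with the eps fix
```

As the reply notes, because the layers after this pooling are learnable, any positive constant (even 1) in the denominator should also avoid the nan; it only rescales the pooled features.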

InfinityBox commented 2 years ago

Thank you so much for your quick response!