hustvl / Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0
2.56k stars 160 forks

Loss is nan, stopping training #30

Open JunLiangZ opened 4 months ago

JunLiangZ commented 4 months ago

During training, the loss becomes NaN. Why does this happen?
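
For context, the message in the issue title most likely comes from the DeiT-style training engine that Vim builds on, which aborts as soon as the loss stops being finite. A minimal sketch of that guard (assuming the standard DeiT `engine.py` pattern):

```python
import math
import sys

import torch

def check_loss_finite(loss: torch.Tensor) -> None:
    """DeiT-style guard: abort training as soon as the loss is NaN/Inf."""
    loss_value = loss.item()
    if not math.isfinite(loss_value):
        print("Loss is {}, stopping training".format(loss_value))
        sys.exit(1)
```

Running with `torch.autograd.set_detect_anomaly(True)` can help locate the first operation that produces the NaN, at the cost of much slower training.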

jasscia18 commented 4 months ago

> During training, the loss becomes NaN. Why does this happen?

I also ran into this problem. Have you solved it?

radarFudan commented 4 months ago

Maybe try float32 and reduce the learning rate; BF16 can suffer from stability issues.
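
A minimal sketch of that suggestion, assuming a standard PyTorch setup (the learning rate, weight decay, and gradient clipping values are illustrative, not the repo's defaults):

```python
import torch

def make_fp32_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    """Keep weights in float32 and use a reduced learning rate."""
    model.float()  # avoid BF16 weights entirely
    return torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.05)

def train_step(model, criterion, optimizer, images, targets):
    """One full-precision step: no autocast context, so every op runs in float32."""
    loss = criterion(model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping is an extra safeguard against loss spikes.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss
```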

zhenyuZ-HUST commented 3 months ago

Making sure `--if_amp` is set to False seems to solve this problem.
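
For reference, in timm/DeiT-style training code an AMP flag usually gates the autocast context roughly as below; this is a sketch following the `--if_amp` flag mentioned above, not necessarily the repo's exact wiring:

```python
from contextlib import suppress

import torch

def forward_with_optional_amp(model, criterion, images, targets, if_amp: bool):
    """Run the forward pass under autocast only when AMP is enabled."""
    # With if_amp=False, suppress() is a no-op context manager, so every
    # op runs in full float32 instead of autocast mixed precision.
    amp_autocast = torch.cuda.amp.autocast if if_amp else suppress
    with amp_autocast():
        outputs = model(images)
        return criterion(outputs, targets)
```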

sailor-z commented 2 months ago

Hi, `if_amp = False` doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

BranStarkkk commented 1 month ago

> Hi, `if_amp = False` doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?

I also have this problem. Have you ever solved it?

sailor-z commented 1 month ago

> > Hi, `if_amp = False` doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
>
> I also have this problem. Have you ever solved it?

Not really. It seems all vision mambas have the same problem.

CacatuaAlan commented 1 month ago

Setting AMP=False may work, or just use a lower lr.

BranStarkkk commented 1 month ago

> > > Hi, `if_amp = False` doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
> >
> > I also have this problem. Have you ever solved it?
>
> Not really. It seems all vision mambas have the same problem.

I just changed the backbone from Vim to another vision Mamba model, and it works... Its name is VMamba.

sailor-z commented 1 month ago

> > > > Hi, `if_amp = False` doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
> > >
> > > I also have this problem. Have you ever solved it?
> >
> > Not really. It seems all vision mambas have the same problem.
>
> I just changed the backbone from Vim to another vision Mamba model, and it works... Its name is VMamba.

Thanks for the information! I'll look into it.

chuc92man commented 1 month ago

Got the same problem; fixed it by dividing the sum of the forward/backward hidden states by 2, so that the hidden states/residuals of all layers have a similar magnitude. Details: https://github.com/hustvl/Vim/pull/90
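
A sketch of that change (tensor names and the flip are illustrative assumptions about the bidirectional scan; see PR #90 for the actual diff in the Vim block):

```python
import torch

def combine_bidirectional(out_fwd: torch.Tensor, out_bwd: torch.Tensor) -> torch.Tensor:
    """Merge forward/backward scan outputs of shape (batch, length, dim).

    Averaging instead of summing keeps the residual stream at a magnitude
    comparable to a single scan direction, which is the stabilization
    described in PR #90.
    """
    # The backward scan runs over the reversed sequence, so flip it back
    # along the length dimension before combining token-wise.
    return (out_fwd + out_bwd.flip(dims=[1])) / 2
```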