JunLiangZ opened this issue 4 months ago
During training, the loss becomes NaN. Why does this happen?
I ran into this problem too. Have you solved it?
Maybe try Float32 and reduce the learning rate; BF16 can suffer from stability issues.
Making sure --if_amp is set to False seems to solve this problem.
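For reference, here is what disabling the flag amounts to in a plain PyTorch loop. This is a minimal sketch with a stand-in model and data (not the actual Vim training code); only the `enabled=use_amp` pattern matters, and it assumes a CUDA GPU:

```python
import torch
import torch.nn as nn

# Stand-in model and optimizer; the real script builds Vim and reads --if_amp.
model = nn.Linear(256, 10).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

use_amp = False  # equivalent to --if_amp False: train fully in Float32
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(10):
    images = torch.randn(8, 256, device="cuda")
    targets = torch.randint(0, 10, (8,), device="cuda")
    optimizer.zero_grad()
    # With enabled=False, autocast is a no-op and the forward pass stays
    # in full Float32, which is more stable than BF16/FP16.
    with torch.autocast(device_type="cuda", enabled=use_amp):
        loss = criterion(model(images), targets)
    # With enabled=False, GradScaler passes gradients through unscaled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```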
Hi,
if_amp = False
doesn't work for me. I also tried using a small learning rate, but the problem still exists. Does anyone know how to handle it?
I also have this problem. Have you managed to solve it?
Not really. It seems all vision mambas have the same problem.
Setting AMP=False may work, or just use a lower learning rate.
I just changed the backbone from Vim to another vision mamba model, and it works... Its name is VMamba.
Thanks for the information! I'll look into it.
Got the same problem; fixed it by dividing the sum of the forward/backward hidden states by 2 so that the hidden states/residuals of all layers have similar magnitudes. See the details: https://github.com/hustvl/Vim/pull/90
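In sketch form, the idea looks like this (function and variable names are illustrative, not the repo's; see the PR for the actual diff):

```python
import torch

def combine_bidirectional(hidden_f: torch.Tensor, hidden_b: torch.Tensor) -> torch.Tensor:
    """Combine forward and backward scan outputs of a bidirectional block.

    Averaging instead of summing keeps the combined hidden states at a
    magnitude comparable to each single direction, so the residual stream
    does not grow layer by layer and overflow in low precision.
    """
    # hidden_b comes from the backward scan; flip it back to forward token
    # order before combining (assumed layout: [batch, seq_len, dim]).
    return (hidden_f + hidden_b.flip(dims=[1])) / 2  # was: hidden_f + hidden_b.flip(dims=[1])
```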