state-spaces / mamba

Mamba SSM architecture

mamba-ssm NO GRADIENT during training #569

Open xiaxiaoguang opened 1 month ago

xiaxiaoguang commented 1 month ago

I tried training Mamba models on an NVIDIA GeForce GTX TITAN X with CUDA version 12.4, but the parameters failed to update.

My torch version is '1.13.1+cu117'.

Because of a network problem (the server cannot connect to GitHub), I cannot use 'pip install / conda install' to install mamba-ssm.

So I cloned the repository and downloaded the corresponding .whl files on another computer, then uploaded them to my server.

Then I only modified setup.py so it uses a local URL instead of visiting GitHub; specifically, I only changed the value of 'BASE_WHEEL_URL', roughly as in the sketch below.
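For context, a minimal sketch of the kind of change described, assuming the prebuilt wheels are hosted on an internal file server; the variable name BASE_WHEEL_URL comes from the project's setup.py, but the local address and the exact original URL template here are illustrative, not verbatim:

```python
# setup.py (excerpt) -- illustrative sketch, not the verbatim file.
# Original: wheels are fetched from the GitHub releases page, e.g.
# BASE_WHEEL_URL = "https://github.com/state-spaces/mamba/releases/download/{tag_name}/{wheel_name}"

# Modified: point at a reachable internal location instead.
# "http://10.0.0.5:8000" is a hypothetical internal file server.
BASE_WHEEL_URL = "http://10.0.0.5:8000/wheels/{wheel_name}"
```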

After that it installed successfully, and I tried training it. I found the loss mysteriously increasing during training.

So I printed the gradients after optim.step(), and got the following:

[screenshot: per-parameter gradients printed after optim.step()]

As you can see, the gradients from the Mamba layers are all zero, but those from the other layers (norm_f, the output Linear) look correct.

Does anyone know what I can do now?
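For reference, a minimal sketch of the kind of per-parameter gradient check described above, using the standalone Mamba block from the package README on random data; the model dimensions, optimizer, and loss here are illustrative and assumed, not the poster's actual training setup:

```python
import torch
from mamba_ssm import Mamba  # assumes mamba-ssm installed as described above

# Hypothetical toy setup: a single Mamba block trained on random data.
model = Mamba(d_model=64, d_state=16, d_conv=4, expand=2).cuda()
optim = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(2, 128, 64, device="cuda")  # (batch, length, d_model)
loss = model(x).mean()                      # placeholder loss

optim.zero_grad()
loss.backward()
optim.step()

# Print each parameter's gradient norm after the step; in a healthy run
# the Mamba-layer parameters should show nonzero norms too.
for name, p in model.named_parameters():
    g = p.grad
    print(name, "grad_norm =", None if g is None else g.norm().item())
```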

LogSSim commented 1 month ago

[WeChat screenshot] Why do my parameters have no gradients?

zixianwang2022 commented 1 week ago

Any clues on how to solve this? I am having a similar issue.