Open xiaxiaoguang opened 1 month ago
I tried training Mamba models on an NVIDIA GeForce GTX TITAN X with CUDA version 12.4, but the parameters failed to update.
My torch version is '1.13.1+cu117'.
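For completeness, this is a minimal check of how those versions can be read off (the GPU-side 12.4 is the driver's CUDA version from nvidia-smi, not the toolkit torch was built against):

```python
import torch

print(torch.__version__)              # 1.13.1+cu117 (toolkit torch was built with)
print(torch.cuda.is_available())      # True
print(torch.cuda.get_device_name(0))  # NVIDIA GeForce GTX TITAN X
```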
Because of a network restriction (the server cannot connect to GitHub), I cannot use 'pip install' / 'conda install' to install mamba-ssm.
So I cloned the repositories, downloaded the corresponding .whl files on another computer, and uploaded them to my server.
Then I only modified setup.py so that it uses a local URL instead of visiting GitHub; specifically, I only changed the value of 'BASE_WHEEL_URL', roughly as sketched below.
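(The original URL is paraphrased from memory and the local path is just a placeholder, not my real one:)

```python
# setup.py -- the only line I edited.
# Original value pointed at the GitHub releases page, something like:
#   BASE_WHEEL_URL = "https://github.com/state-spaces/mamba/releases/download/..."
# I replaced it with a local URL serving the wheel I copied onto the server:
BASE_WHEEL_URL = "file:///data/local_wheels/"
```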
After that it installed successfully, and then I tried training. I found the loss mysteriously increasing during training.
So I printed the gradients after optim.step() and got the following results:
As you can see, the gradients from the Mamba layers are zero, but those from the other layers (norm_f, outLinear) look correct.
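For reference, the check was roughly this (the helper name is my own; it is called right after loss.backward() / optim.step() in my training loop):

```python
import torch

def report_gradients(model: torch.nn.Module) -> None:
    """Print the max absolute gradient of every parameter (None if no grad)."""
    for name, param in model.named_parameters():
        g = param.grad
        print(name, None if g is None else g.abs().max().item())
```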
Does anyone know what I can do now?
Why don't my parameters have gradients?
Any clues on how to solve this? I am having a similar issue.