Closed tyterry closed 3 years ago
Hmm, that's unusual. I ran your code on 1080Ti and 2080Ti GPUs and I get non-zero gradients. Can you share which GPU, CUDA version, and PyTorch version you're using?
I ran the code above in a Colab notebook and wasn't able to reproduce there either (K80 GPU, PyTorch 1.8, CUDA 10.1).
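For anyone gathering the details requested above, here is a minimal sketch that prints them in one go. The helper name `collect_env` is made up for illustration; it only relies on standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`), and it falls back gracefully if PyTorch is not importable.

```python
import platform


def collect_env() -> dict:
    """Gather the Python, PyTorch, CUDA and GPU details usually asked for in bug reports."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this environment; report that and stop here.
        info["pytorch"] = "not installed"
        return info

    info["pytorch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


env = collect_env()
for key, value in env.items():
    print(f"{key}: {value}")
```

PyTorch also ships a more thorough report via `python -m torch.utils.collect_env`, which may be the easier thing to paste into an issue.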
Thanks for your prompt reply! I am currently using a GTX 1060 with CUDA 10.1 and PyTorch 1.8. In that case, there may be something wrong with my setup. I will try reinstalling the libraries and CUDA to see if that helps, and will update you with the result once finished.
Any update on this issue, @tyterry?
Hi, I have been trying out haste_pytorch (the training speed of haste is phenomenal!), but I found that the gradients for kernel/recurrent_kernel become zero when the model is trained on the GPU. Below is a simple code snippet I used to test this:
Print out:
kernel tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
recurrent_kernel tensor([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
bias tensor([-1.8202e-10, 3.7714e-09, 2.8942e-09, ..., 1.0455e-08, 2.6969e-09, 1.6647e-08], device='cuda:0')
The gradients for kernel/recurrent_kernel become non-zero once "cuda()" is replaced by "cpu()".
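A CPU-versus-GPU comparison like this can be sketched as below. This is not the snippet from the original report: it uses `torch.nn.LSTM` as a stand-in layer, and the helper names `grad_summary` and `run_once` are made up for illustration. To test haste, the stand-in would be swapped for the haste_pytorch layer, assuming its forward pass also returns an `(output, state)` pair.

```python
import torch
from torch import nn


def grad_summary(module: nn.Module) -> dict:
    """Map each parameter name to its largest absolute gradient (None if no gradient)."""
    summary = {}
    for name, param in module.named_parameters():
        summary[name] = None if param.grad is None else param.grad.abs().max().item()
    return summary


def run_once(module: nn.Module, device: str) -> dict:
    """Run one forward/backward pass on `device` and summarize the gradients."""
    module = module.to(device)
    module.zero_grad()
    x = torch.randn(5, 3, 8, device=device)  # (time, batch, input_size)
    out, _ = module(x)  # assumes an (output, state) return value
    out.sum().backward()
    return grad_summary(module)


# Stand-in layer (assumption): replace with the haste_pytorch layer under test.
layer = nn.LSTM(input_size=8, hidden_size=16)

cpu_grads = run_once(layer, "cpu")
print("cpu ", cpu_grads)

# Only exercise the GPU path when one is actually present.
if torch.cuda.is_available():
    print("cuda", run_once(layer, "cuda"))
```

If a weight matrix reports a maximum of exactly 0.0 on `cuda` but not on `cpu`, that points to the GPU code path or the local install rather than the model definition.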
I would be most grateful if you could provide some insight into this.
Many thanks for your help.