toinsson / pysdtw

Torch implementation of Soft-DTW, supports CUDA.
MIT License

Gradients are gone when moving the code to CUDA #2

Closed: mariaalfaroc closed this issue 1 year ago

mariaalfaroc commented 1 year ago

Hi,

First of all, thank you so much for this amazing implementation! I am trying to use your example code, but the gradients are gone when I move everything to CUDA.

import torch
import pysdtw

device = torch.device("cuda")

# the input data includes a batch dimension
X = torch.rand((10, 5, 7), requires_grad=True).to(device)
Y = torch.rand((10, 9, 7)).to(device)

# optionally choose a pairwise distance function
fun = pysdtw.distance.pairwise_l2_squared

# create the SoftDTW distance function
sdtw = pysdtw.SoftDTW(gamma=1.0, dist_func=fun, use_cuda=True)

# soft-DTW discrepancy, approaches DTW as gamma -> 0
res = sdtw(X, Y)

# define a loss whose gradient can be backpropagated
loss = res.sum()
loss.backward()

# X.grad now contains the gradient with respect to the loss

If I print X.grad, the result is None and I get the following warning:

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at  aten/src/ATen/core/TensorBody.h:477.)
  return self._grad

I'm running the code using Google Colab. Any idea why this is happening? Again thank you so much!

toinsson commented 1 year ago

Dear @mariaalfaroc, I'm glad you found pysdtw useful!

The issue you encounter comes from the way you initialise your tensors. In the examples, the tensors are created on the CPU before being sent to the GPU. In your code, you do this in one step, so X ends up being the output of .to(), i.e. a non-leaf tensor, and its .grad attribute is never populated (which is exactly what the warning says). I think this is also the topic of this question on the PyTorch forum: https://discuss.pytorch.org/t/no-gradient-on-cuda/144807
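
For illustration, here is a minimal sketch (not pysdtw-specific; the X_cpu / X_moved names are just for the example) of the leaf/non-leaf difference the warning is talking about:

import torch

device = torch.device("cuda")

# created on the CPU: X_cpu is a leaf, backward() will populate X_cpu.grad
X_cpu = torch.rand((10, 5, 7), requires_grad=True)

# created on the CPU and moved in one step: the result is the output of .to(),
# i.e. a non-leaf tensor, so its .grad stays None after backward()
X_moved = torch.rand((10, 5, 7), requires_grad=True).to(device)

print(X_cpu.is_leaf)    # True
print(X_moved.is_leaf)  # False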

You can fix this in a couple of ways. As in the code in the examples folder, first create X on the CPU and only then send it to the GPU. Or create X directly on the GPU with the device keyword.

In other words, this works:

X = torch.rand((10, 5, 7), requires_grad=True)
Y = torch.rand((10, 9, 7))
...
res = sdtw(X.to(device), Y.to(device))
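
Note that with this first option X stays a leaf tensor on the CPU, so after loss.backward() the gradient is accumulated in X.grad on the CPU.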

and this works too:

X = torch.rand((10, 5, 7), device=device, requires_grad=True)
Y = torch.rand((10, 9, 7), device=device)
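
A third possibility, suggested by the warning message itself, is to keep your one-step initialisation and call .retain_grad() on the resulting non-leaf tensor. A quick sketch, using the same shapes as your example:

X = torch.rand((10, 5, 7), requires_grad=True).to(device)
X.retain_grad()  # ask autograd to populate .grad on this non-leaf tensor
Y = torch.rand((10, 9, 7)).to(device)

res = sdtw(X, Y)
res.sum().backward()
# X.grad is now populated (on the GPU)
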
toinsson commented 1 year ago

I think we can close this issue; feel free to re-open it if you have more questions.