teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch
https://pypi.org/project/torchsort/
Apache License 2.0

kl regularization returns nan for gradient. #63

Closed pumplerod closed 8 months ago

pumplerod commented 1 year ago

Perhaps I am missing something obvious; however, I found kl regularization to work a bit better for my data, so I switched from l2 to kl and began to get nan values in the gradient.

Here is the test code I used to verify (Python 3.8.10, torch==1.13.0, torchsort==0.1.9).

Use of l2 works as expected.

import torch
import torchsort

# This works...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='l2', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[-0.1140, -0.1389, -0.1270,  0.1145, -0.1347, -0.1235, -0.1235, -0.1347,-0.1235]]),)

Use of kl returns nan.

# This returns nan in the gradient...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan]]),)
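
For what it's worth, one way to narrow down where the nan first appears is to wrap the same snippet in PyTorch's anomaly detection. This is only a debugging sketch (it assumes torch.autograd.detect_anomaly, which is available in torch 1.13) and was not part of the original report:

import torch
import torchsort

# Debugging sketch: anomaly detection re-runs the backward pass with extra checks
# and raises an error naming the op whose backward first produced nan.
with torch.autograd.detect_anomaly():
    X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
    Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-4).view(-1)
    Y = (Y - Y.min()) / (Y.max() - Y.min())
    X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
    torch.autograd.grad(X_Loss.mean(), X)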
teddykoker commented 1 year ago

Hi @pumplerod, thank you for your patience. I was able to successfully reproduce the nan gradients with your code above. It does seem that increasing the regularization strength fixes the issue, but I will look into this more closely, as ideally the gradients should always be defined, even if they are just 0.
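
For reference, a sketch of that workaround on the same snippet is below; the strength 1e-1 is just an illustrative choice, not a value confirmed in this thread, and with a sufficiently large strength the gradient is expected to come back finite:

import torch
import torchsort

# Sketch of the workaround above; 1e-1 is an illustrative strength, not a tuned value.
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-1).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
grad, = torch.autograd.grad(X_Loss.mean(), X)
print(torch.isnan(grad).any())  # expected: tensor(False) once the strength is large enough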