teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch
https://pypi.org/project/torchsort/
Apache License 2.0

kl regularization returns nan for gradient. #63

Closed pumplerod closed 8 months ago

pumplerod commented 1 year ago

Perhaps I am missing something obvious; however, I found kl regularization to work a bit better for my data, so I switched from l2 to kl and began to get nan values in the gradient.

Here is the test code I used to verify (Python 3.8.10, torch==1.13.0, torchsort==0.1.9).

Use of l2 works as expected.

import torch
import torchsort

# This works...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='l2', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[-0.1140, -0.1389, -0.1270,  0.1145, -0.1347, -0.1235, -0.1235, -0.1347,-0.1235]]),)

Use of kl returns nan.

# This returns nan in the gradient...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan]]),)
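
For what it's worth, one way to narrow down where the nan first appears is to wrap the same snippet in PyTorch's anomaly detection. This is only a debugging sketch (it assumes torch.autograd.detect_anomaly, which is available in torch 1.13) and was not part of the original report:

import torch
import torchsort

# Debugging sketch: anomaly detection re-runs the backward pass with extra checks
# and raises an error naming the op whose backward first produced nan.
with torch.autograd.detect_anomaly():
    X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
    Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-4).view(-1)
    Y = (Y - Y.min()) / (Y.max() - Y.min())
    X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
    torch.autograd.grad(X_Loss.mean(), X)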
teddykoker commented 1 year ago

Hi @pumplerod, thank you for your patience. I was able to successfully reproduce the nan gradients with your code above. It does seem that increasing the regularization strength fixes the issue, but I will look into this more closely, as ideally the gradients should always be defined, even if they are just 0.
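
For reference, a sketch of that workaround on the same snippet is below; the strength 1e-1 is just an illustrative choice, not a value confirmed in this thread, and with a sufficiently large strength the gradient is expected to come back finite:

import torch
import torchsort

# Sketch of the workaround above; 1e-1 is an illustrative strength, not a tuned value.
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-1).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
grad, = torch.autograd.grad(X_Loss.mean(), X)
print(torch.isnan(grad).any())  # expected: tensor(False) once the strength is large enough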