teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch
https://pypi.org/project/torchsort/
Apache License 2.0

Can I sort by specific column? #34

Open zimonitrome opened 2 years ago

zimonitrome commented 2 years ago

Is there any way to sort a tensor by a given column?

For example, sorting by the first column:

input_tensor = torch.tensor([
        [1, 5], 
        [30, 30], 
        [6, 9], 
        [80, -2]
])

target_tensor = torch.tensor([
        [80, -2],
        [30, 30], 
        [6, 9], 
        [1, 5], 
])
teddykoker commented 2 years ago

Hi @zimonitrome, this is not possible with this library.

zimonitrome commented 2 years ago

@teddykoker I made a version of soft_sort that sorts batches of 2D tensors by a given column.

https://github.com/zimonitrome/AbstractionNet/blob/main/src/SoftSortByColumn.py

Would any of this functionality be relevant for a pull request?
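
Roughly, the idea is to soft-rank the key column, build a soft permutation matrix from those ranks, and apply it to every column so the whole reordering stays differentiable. A simplified sketch of that idea (not the exact code in the linked file; the function name and the temperature tau are just for illustration):

import torch
import torchsort

def soft_sort_rows_by_column(x, column=0, regularization_strength=1.0, tau=0.1):
    # x: (n, d). Softly reorder rows by ascending values in `column`
    # (the example in this issue is descending; flip `slots` for that).
    key = x[:, column].unsqueeze(0)                                   # (1, n) key column
    ranks = torchsort.soft_rank(key, regularization_strength=regularization_strength)
    n = x.shape[0]
    slots = torch.arange(1, n + 1, dtype=x.dtype, device=x.device)    # target rank of each output row
    # P[i, j] is large when row j's soft rank is close to slot i
    logits = -(ranks.squeeze(0).unsqueeze(0) - slots.unsqueeze(1)) ** 2 / tau
    P = torch.softmax(logits, dim=1)                                  # (n, n), each output row is a convex mix of input rows
    return P @ x                                                      # differentiable w.r.t. x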

teddykoker commented 2 years ago

Nice work! What would you think about having the current soft_sort function return indices in the same way torch.sort does? That way you could have the differentiable sorted values of a column, and then use the indices to sort the remaining columns. The API would also be more consistent with PyTorch's.
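
A rough sketch of that workaround with today's API (torchsort's soft_sort does not return indices yet, so this borrows hard indices from torch.argsort):

import torch
import torchsort

x = torch.tensor([[1., 5.], [30., 30.], [6., 9.], [80., -2.]], requires_grad=True)

key = x[:, 0].unsqueeze(0)                                        # (1, n): the column to sort by
soft_key = torchsort.soft_sort(key, regularization_strength=1.0)  # differentiable sorted key values
idx = torch.argsort(key.squeeze(0))                               # hard indices, not differentiable
sorted_rows = x[idx]                                              # rows reordered ascending by the key
# gradients flow through soft_key and through the gathered row values,
# but the choice of ordering itself carries no gradient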

zimonitrome commented 2 years ago

That could be a great way of doing it, but it would not extend the "softness" to the remaining columns.

I remember trying that method previously and it seemed to restrict the gradients for that specific training script. Even now, the linked way of performing column sorting seems to produce nice gradients, but only for multiple samples: in the repo I linked, a network seems able to backprop through the column sort to order a set of shapes for rendering, but the same information does not seem to be used when running an optimizer (without a NN) on a single sample.

Maybe I should come up with a simpler toy problem to better test how useful the gradients are in the two proposed methods.
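
Something along these lines might do as a toy problem (my own sketch here, with made-up values): learn per-row scores so that ranking by them minimizes a loss on another column, and check whether the scores actually move.

import torch
import torchsort

torch.manual_seed(0)
values = torch.tensor([[3., 1., 4., 2.]])          # column we ultimately care about
scores = torch.nn.Parameter(0.1 * torch.randn(1, 4))   # learned key column
opt = torch.optim.SGD([scores], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    ranks = torchsort.soft_rank(scores, regularization_strength=1.0)
    loss = (ranks * values).sum()                  # minimized when high values receive low ranks
    loss.backward()                                # if gradients flow, the scores move; if not, they stay put
    opt.step()

print(torch.argsort(scores.detach().squeeze(0)))   # the learned scores should now order `values` largest-first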

teddykoker commented 2 years ago

That's a great point! Having a toy example would certainly be beneficial in determining the best way to proceed.

lcrmorin commented 2 years ago

@zimonitrome I was toying with your implementation to build a custom loss, but my model doesn't seem to learn (while it does with 'trivial' losses). Are you sure the gradient is computed correctly?

zimonitrome commented 2 years ago

@lcrmorin Thanks for trying it out. I thought it was working pretty well, but for my odd use case it only seemed to work with multi-batch training and not single-sample training. I thought this mostly had to do with operations that occur after the column sorting, but perhaps the implementation itself has a bug. When making it I thought it was pretty airtight and worked in the same way as the original torchsort.

Please try to see if you can improve it in any way. If you have trouble, I can send some of my notes that better demonstrate what the code does.

baiyimeng commented 1 year ago

@zimonitrome Hello, I used your code and ran into a weird problem. Specifically, in a simple model I want to sort a 2-D tensor by the value of a specific column, then take the top-k sum of another column of the sorted tensor and compute the gradient of that sum with respect to the parameters of the model. My code is below:

import torch
import torch.nn as nn
# soft_sort_by_column comes from the SoftSortByColumn.py file linked above
from SoftSortByColumn import soft_sort_by_column

model = nn.Linear(1, 1).cuda()
x = torch.rand([3, 2]).cuda()
# append the model's output as an extra column, then add a batch dimension
x = torch.cat((x, model(x[:, 0].unsqueeze(-1))), dim=1).unsqueeze(0)
# sort rows by the last column (the model output)
y = soft_sort_by_column(values=x, column=-1, regularization_strength=1)
print(y)
z = y[0][:, 1]  # second column of the sorted rows
metric = torch.sum(z)
result = torch.autograd.grad(metric, model.parameters())
print(result)

The value of the result is "(tensor([[0.]], device='cuda:0'), tensor([0.], device='cuda:0'))". In fact, I think the value should not be zero, because the gradient should be able to propagate along the path: the column used for sorting --> sort indices --> the top-k sum of the other column. Could you answer this question for me?
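
For what it's worth, if the other column were reordered with plain (hard) torch.argsort indices, no gradient would reach the model at all, because the indices themselves are not differentiable. A minimal illustration of that index step in isolation (not the soft_sort_by_column code itself):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
x = torch.rand(3, 2, requires_grad=True)
key = model(x[:, 0:1]).squeeze(-1)       # column the rows are sorted by
idx = torch.argsort(key)                 # hard indices: the sort order carries no gradient
z = x[idx, 1]                            # other column reordered by those indices
metric = z.sum()
grads = torch.autograd.grad(metric, model.parameters(), allow_unused=True)
print(grads)                             # (None, None): nothing reaches the model through the indices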

ivanlen commented 7 months ago

Any updates on this??

MatthijsdeJ commented 1 month ago

I strongly support this feature. I have a use case where I obtain an arbitrary number of input vectors, and I would like to learn an automatic sorting of these vectors. This could be achieved by predicting a value per vector and differentiably sorting the vectors according to that single value.
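
A minimal sketch of what I mean (the scorer, shapes, and supervision here are just placeholders): a small network predicts one score per vector, and torchsort.soft_rank keeps the resulting ordering differentiable so the scorer can be trained end to end.

import torch
import torch.nn as nn
import torchsort

scorer = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
vectors = torch.randn(8, 16)                        # an arbitrary number of input vectors
target_scores = torch.randn(1, 8)                   # placeholder supervision for the desired order

pred = scorer(vectors).squeeze(-1).unsqueeze(0)     # (1, 8): one learned score per vector
pred_ranks = torchsort.soft_rank(pred, regularization_strength=0.1)
target_ranks = torchsort.soft_rank(target_scores, regularization_strength=0.1)
loss = torch.nn.functional.mse_loss(pred_ranks, target_ranks)
loss.backward()                                     # gradients reach the scorer through the soft ranks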