teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch
https://pypi.org/project/torchsort/
Apache License 2.0

Understanding regularization strength #66

Closed simpsus closed 8 months ago

simpsus commented 1 year ago

I use torchsort in my loss function. My issue is that it sometimes returns NaN, depending on the regularization strength. My batches are between 1k and 5k samples and there are ~1k features.

Is there some documentation on regularization strength? Scrolling through the code, I cannot find anything.

Is there a way to estimate a good regularization strength value depending on your data?

I understand that 1 is the default value and reducing regularization strength brings the result closer to the true ordering. So, is the following a good heuristic?

teddykoker commented 8 months ago

Hey @simpsus, apologies for the very late reply I must have missed this. For more information surrounding the regularization strength, it would be best to address the original paper, which denotes the regularization strength as parameter $\varepsilon$. It essentially controls how "soft" the sort/ranking is. As $\varepsilon \to \infty$ the values all collapse to a constant, as $\varepsilon \to 0$ the values converge to the hard soft/ranking values. This also effects how smooth the function is. I don't think there's a great rule to setting the value, in the paper they perform a hyperparameter search using log-spaced values from $10^{-3}$ to $10^{4}$; your best bet is probably to do the same with some cross-validation dataset.