kiudee / cs-ranking

Context-sensitive ranking and choice in Python with PyTorch
https://cs-ranking.readthedocs.io
Apache License 2.0

"mean of empty slice" in spearman correlation calculation #73

Open timokau opened 4 years ago

timokau commented 4 years ago

During the tests, NumPy complains about a "mean of empty slice". That happens because the calculation of the Spearman correlation filters the labels it is applied to as follows:

https://github.com/kiudee/cs-ranking/blob/ba03234fb61a4e645b393d2d9ac81c0b85399024/csrank/metrics_np.py#L24

And then averages its results:

https://github.com/kiudee/cs-ranking/blob/ba03234fb61a4e645b393d2d9ac81c0b85399024/csrank/metrics_np.py#L29

The results being averaged may be empty (or consist only of NaNs) because of the preceding filter. What is the intention behind that filter?
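
A minimal sketch of how the warning can arise, assuming that instances rejected by the filter are recorded as NaN and the results are then averaged with `np.nanmean`. The function name `mean_spearman_without_ties` and the data are hypothetical; this is not the code from `csrank/metrics_np.py`:

```python
import warnings

import numpy as np


def mean_spearman_without_ties(y_true, y_pred):
    """Hypothetical stand-in: average Spearman's rho over tie-free instances."""
    n_objects = y_true.shape[1]
    denominator = n_objects * (n_objects ** 2 - 1)
    rhos = []
    for r_true, r_pred in zip(y_true, y_pred):
        if len(np.unique(r_pred)) == len(r_pred):
            # No ties: classic formula based on squared rank differences.
            rhos.append(1 - 6 * np.sum((r_true - r_pred) ** 2) / denominator)
        else:
            # Ties in the prediction: the instance is filtered out.
            rhos.append(np.nan)
    return np.nanmean(np.array(rhos))


y_true = np.array([[0, 1, 2], [2, 1, 0]])
# Every predicted ranking contains a tie, so nothing survives the filter.
y_pred_tied = np.array([[0, 0, 1], [1, 1, 0]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = mean_spearman_without_ties(y_true, y_pred_tied)

messages = [str(w.message) for w in caught]
print(result, messages)
```

When every instance is filtered out, `np.nanmean` has nothing left to average, returns NaN, and emits the `RuntimeWarning`.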

CC @prithagupta

kiudee commented 4 years ago

The filter removes instances whose predictions contain ties. Ties are problematic in the calculation of Spearman correlation and can cause a non-negligible bias. But I also think the current state of the code could be improved. At the very least, the user should get a warning.
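
One way such a warning could look, as a rough sketch only. The helper `nanmean_with_tie_warning` and its messages are hypothetical, and it assumes skipped instances are marked as NaN in the list of per-instance correlations:

```python
import warnings

import numpy as np


def nanmean_with_tie_warning(rhos):
    """Hypothetical helper: average per-instance correlations.

    NaN entries are assumed to mark instances skipped because of ties.
    """
    rhos = np.asarray(rhos, dtype=float)
    valid = rhos[~np.isnan(rhos)]
    n_skipped = len(rhos) - len(valid)
    if len(valid) == 0:
        # Explicit message instead of NumPy's generic "Mean of empty slice".
        warnings.warn(
            "No tie-free instances left to average; returning NaN.",
            RuntimeWarning,
        )
        return float("nan")
    if n_skipped > 0:
        warnings.warn(
            f"Skipped {n_skipped} of {len(rhos)} instances with tied predictions.",
            RuntimeWarning,
        )
    return float(np.mean(valid))
```

This keeps the filtering behaviour unchanged and only makes the number of dropped instances visible to the caller.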

Here is a paper discussing several methods for dealing with ties: https://www.tandfonline.com/doi/full/10.1080/02664763.2015.1043870
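
For illustration, one common way to handle ties is to assign tied values their average rank and compute the Pearson correlation of the resulting ranks. The sketch below uses only NumPy. The function names are hypothetical, and this is not necessarily one of the methods the paper recommends:

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Average the ranks within each group of equal values.
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman_with_ties(x, y):
    """Pearson correlation of average ranks (NaN if either input is constant)."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


true_ranking = [0, 1, 2, 3]
tied_scores = [0.1, 0.5, 0.5, 0.9]  # the two middle objects are tied
rho = spearman_with_ties(true_ranking, tied_scores)
print(average_ranks(tied_scores), rho)
```

With this approach no instance has to be dropped, although an instance whose predictions are all equal still yields NaN.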

prithagupta commented 4 years ago

@kiudee @timokau Even the script version takes ties into consideration, but we need to check how that implementation handles them. As far as I remember, we removed it because it was not a correct or efficient way of evaluating Spearman correlation.