Closed by YanisLalou 7 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (c50867e) 88.83% compared to head (2ed1c2c) 88.95%.
Actually there was another mistake. Now we compute directly the mean of the entropy of each sample. However, there is still something bothering me: the score output is bounded term-wise by $-x\log x$ with $x$ a probability. Thus $\mathrm{score} \in [0, -\frac{1}{e}\log\frac{1}{e}]$ and not $\in [0, 1]$ as we could expect for a score.
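For reference, here is a minimal numpy sketch of the mean-of-per-sample-entropies computation being discussed; the function name is hypothetical, not the skada implementation:

```python
import numpy as np

def mean_prediction_entropy(proba):
    # Shannon entropy (natural log) of each row of predicted
    # probabilities, then the mean over samples.
    proba = np.clip(proba, 1e-12, 1.0)  # avoid log(0)
    per_sample = -np.sum(proba * np.log(proba), axis=1)
    return per_sample.mean()

# Near-certain predictions give entropy close to 0 ...
print(mean_prediction_entropy(np.array([[0.99, 0.01]])))
# ... while a uniform 2-class prediction gives log(2) ~= 0.693.
print(mean_prediction_entropy(np.array([[0.5, 0.5]])))
```

Note that the mean still depends on the number of classes (the uniform case gives $\log K$), which is part of why the bound question comes up.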
I reread the paper. For the entropy scorer, we want to minimize the entropy, so `greater_is_better` equals `False` here. And the entropy is not a score, so that's why it is not between [0, 1].
Maybe we can change the name
Yes, my bad, you're right: `greater_is_better` equals `False` here.
So yes we should change the name
OR we could scale the entropy to be between [0, 1] and call it a score (we kind of already do that by computing a mean instead of a sum).
Tbh I do prefer the latter choice, since it'll be easier to plot the results next to other scores bounded between [0, 1], like accuracy, recall, precision...
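One possible way to do that scaling (a sketch only, the name is hypothetical): divide the mean entropy by its maximum $\log K$ for $K$ classes, so the result always lands in [0, 1]:

```python
import numpy as np

def normalized_entropy_score(proba):
    # Mean per-sample entropy divided by its maximum log(K),
    # so the result is in [0, 1] regardless of the class count.
    n_classes = proba.shape[1]
    proba = np.clip(proba, 1e-12, 1.0)
    per_sample = -np.sum(proba * np.log(proba), axis=1)
    return per_sample.mean() / np.log(n_classes)

# Uniform predictions hit the upper bound of 1.
print(normalized_entropy_score(np.full((5, 3), 1 / 3)))
```

This would make the value directly plottable next to accuracy/recall/precision, at the cost of no longer matching the paper's raw quantity.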
I think the best thing is to put a sum instead of the mean and implement what is done in the paper. I think the mean was a mistake. We can plot it like a loss.
Not a fan of removing the mean. If we do that, the output will be proportional to the number of samples, so it will be hard to compare methods' effectiveness across different datasets. (Plus, if we're considering this class as a loss: in PyTorch, losses compute mean(L) by default instead of sum(L).)
Would love to have the thoughts of @kachayev @rflamary
Yeah, maybe a `reduction` parameter, to be able to fit the paper if wanted, with option `None` or `mean`.
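A rough sketch of what that `reduction` option could look like (class and method names are hypothetical, not the actual skada API):

```python
import numpy as np

class EntropyScorerSketch:
    # reduction=None reproduces the plain sum from the paper;
    # reduction="mean" averages over samples (current behaviour).
    def __init__(self, reduction="mean"):
        if reduction not in (None, "mean"):
            raise ValueError(f"unknown reduction: {reduction!r}")
        self.reduction = reduction

    def score(self, proba):
        proba = np.clip(proba, 1e-12, 1.0)
        per_sample = -np.sum(proba * np.log(proba), axis=1)
        if self.reduction is None:
            return per_sample.sum()
        return per_sample.mean()
```

With this, the sum variant equals the mean variant times the number of samples, so both views stay consistent.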
Good idea to handle both (state in the doc that one of them corresponds to the paper).
That's an interesting question...
Usually, I'm a 'theory absolutist' when it comes to the questions like this one. If we call it 'entropy', it should perform 'sum'. Otherwise it's no longer entropy.
In PyTorch, the 'reduction' parameter for losses is typically (as it should be) a 'batch-related' setting: it only tells the system what to do with the fact that the loss is typically a per-item thing, which means that for a batch we have an array of those, while gradient descent requires a scalar. Thus it provides 'sum' or 'mean' options for how to collapse the array of values into a scalar. This doesn't change the nature of the loss itself, just the mini-batch GD. Also, from the gradient perspective it's just a matter of a multiplicative scalar.
You are right that with a larger number of samples the max entropy of the system increases. As it should... The hypothesis that 'entropy divided by the number of samples' makes for a better comparison between different methods is interesting, though by no means obvious or intuitive.
Also,
$\in [0, 1]$ as we could expect for a score
Where is this stated for sklearn scores to be from a particular range? I'm not sure I've seen this requirement.
It's not a requirement, but to the best of my recollection, I believe that every class ending with the term 'score'/'scorer' is upper-bounded by 1. Otherwise, they are referred to as 'loss', 'error', and so on. I'm not 100% sure of that, but I quickly looked at https://scikit-learn.org/stable/modules/model_evaluation.html and it seems to correlate with what I'm saying.
Issue: `PredictionEntropyScorer` outputs negative scores.
Solution: In `PredictionEntropyScorer` we compute the following equation (page 3 of the paper: https://arxiv.org/pdf/1711.10288.pdf):
$$E(X_T) = - \sum_{x_t \in T} \langle f(x_t; \theta), \log f(x_t; \theta) \rangle$$
In the code we forgot to use the `greater_is_better` flag, and we used a double minus sign in the formula. Also add a test case checking that all computed scores are > 0.
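A minimal sketch of the fixed computation and the added positivity check (the function name is hypothetical, not the actual skada code):

```python
import numpy as np

def prediction_entropy(proba):
    # E(X_T) with a single minus sign; since -p*log(p) >= 0 for
    # every probability p in (0, 1], the total is always >= 0.
    proba = np.clip(proba, 1e-12, 1.0)
    return -np.sum(proba * np.log(proba))

# Sketch of the added test: the entropy of any batch of random
# probability vectors is strictly positive.
rng = np.random.default_rng(0)
proba = rng.dirichlet(np.ones(3), size=10)
assert prediction_entropy(proba) > 0
```

With `greater_is_better=False`, any sign flip for the scorer convention then happens in one place rather than via a second minus inside the formula.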