mrT23 closed this issue 5 years ago.
I agree it's a clunky way to calculate `p_out`, but it's still correct. If you only took the exp, the values would be un-normalized and would not be actual probabilities. I believe it was done this way to avoid overflow/underflow issues with softmax: the `log_softmax` function takes care of this, so passing its output through softmax again recovers the probabilities without numerical issues.
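For illustration, a minimal sketch (the logits tensor here is hypothetical) of why going through `log_softmax` is the numerically safe route, compared with a hand-rolled exp-then-normalize softmax:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits with large magnitudes, where a hand-rolled
# softmax (exp, then normalize) overflows to inf and yields nan.
logits = torch.tensor([[1000.0, 1001.0, 1002.0]])

naive = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
print(naive)  # tensor([[nan, nan, nan]]) -- exp(1000) overflows float32

# log_softmax uses the log-sum-exp trick internally, so exponentiating
# its output recovers well-behaved probabilities.
stable = torch.exp(F.log_softmax(logits, dim=1))
print(stable)  # tensor([[0.0900, 0.2447, 0.6652]])
```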
This line:

```python
p_out_abstain = torch.exp(F.log_softmax(input_batch, dim=1)[:, -1])
```

gives identical results to:

```python
p_out = F.softmax(F.log_softmax(input_batch, dim=1), dim=1)
p_out_abstain = p_out[:, -1]
```

(I checked!) but it does a single exp on the abstain column instead of a softmax over all K classes.
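For reference, a quick self-contained check along these lines (`input_batch` is just a random stand-in here):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
input_batch = torch.randn(4, 10)  # hypothetical batch of 4, K = 10 classes

# Softmax route: re-normalize the log-probabilities, then slice.
p_out = F.softmax(F.log_softmax(input_batch, dim=1), dim=1)
abstain_via_softmax = p_out[:, -1]

# Exp route: exponentiate only the last (abstain) log-probability.
abstain_via_exp = torch.exp(F.log_softmax(input_batch, dim=1)[:, -1])

assert torch.allclose(abstain_via_softmax, abstain_via_exp)
```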
You're right, it's identical (the denominator of the outer softmax sums to 1, since its inputs are already log-probabilities), so `exp` alone is sufficient.
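Spelled out, writing $p = \mathrm{softmax}(x)$ so that $\log p$ is the `log_softmax` output:

```latex
\mathrm{softmax}(\log p)_i
  = \frac{e^{\log p_i}}{\sum_j e^{\log p_j}}
  = \frac{p_i}{\sum_j p_j}
  = p_i
  \qquad \text{since } \textstyle\sum_j p_j = 1
```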
Is this the right way to calculate `p_out_abstain`? Shouldn't it be a simple `exp` instead of the `softmax`?