The function cross_entropy_with_probs calculates cross entropy from logits (input) and a probability vector (target).
Ignoring the weight argument and focusing only on input and target, the per-example formula is
-sum(target * log(softmax(input)))
So a natural thought is to calculate it directly:
(-target * F.log_softmax(input, dim=1)).sum(1)
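For concreteness, a minimal self-contained version of that direct calculation (the function name is just mine for illustration) would be something like:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(input, target):
    """Cross entropy between logits `input` (N, C) and a probability
    vector `target` (N, C): -sum_c target_c * log_softmax(input)_c."""
    return (-target * F.log_softmax(input, dim=1)).sum(dim=1)

logits = torch.randn(4, 3)                            # batch of 4, 3 classes
probs = torch.softmax(torch.randn(4, 3), dim=1)       # soft targets summing to 1
per_example_loss = soft_cross_entropy(logits, probs)  # shape (4,)
```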
But from the current implementation (Docs and Source Code), it seems to call F.cross_entropy once per class to compute the log_softmax term and then sum up the results, which seems pretty weird to me. (I know the result is still correct.)
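To make the question concrete, here is a simplified sketch of what I understand that per-class approach to be doing (ignoring the weight and reduction handling; names are mine, not copied from the actual Snorkel source):

```python
import torch
import torch.nn.functional as F

def per_class_cross_entropy(input, target):
    """Call F.cross_entropy once per class with a hard label, weight each
    per-example loss by that class's target probability, and accumulate."""
    num_points, num_classes = input.shape
    cum_losses = input.new_zeros(num_points)
    for y in range(num_classes):
        hard_target = input.new_full((num_points,), y, dtype=torch.long)
        y_loss = F.cross_entropy(input, hard_target, reduction="none")
        cum_losses += target[:, y] * y_loss
    return cum_losses  # same per-example values as the direct version
```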
Can anyone tell me the advantage of doing it this way? I think it might be explained by some advantage of PyTorch's F.cross_entropy over a DIY function, where the latter refers to the direct calculation above.