https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
$$ \text{xent}(Y, P) = -\sum_{k=1}^{T} Y(k) \log(P(k)) $$
$Y$ is the one-hot ground-truth distribution over the $T$ classes, and $P$ is the vector of predicted class probabilities (the softmax of the logits).
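A minimal PyTorch sketch (the tensor values here are made up for illustration) checking that the formula above matches `torch.nn.CrossEntropyLoss`, which takes raw logits and applies the softmax internally:

```python
import torch
import torch.nn.functional as F

# Hypothetical example inputs: raw model outputs of shape (1, T) and a class index.
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])

# P: softmax of the logits; Y: one-hot truth, as in the formula above.
P = F.softmax(logits, dim=1)
Y = F.one_hot(target, num_classes=logits.shape[1]).float()

# xent(Y, P) = -sum_k Y(k) log(P(k)), computed directly from the formula.
manual = -(Y * torch.log(P)).sum(dim=1)

# Built-in loss, per-sample (no mean reduction), fed the raw logits.
builtin = torch.nn.CrossEntropyLoss(reduction="none")(logits, target)

torch.testing.assert_close(manual, builtin)  # both give the same value
```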
https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/