yrmo / cudagrad

CUDA C++ strided float tensor automatic differentiation engine with Python bindings
MIT License

Implement `nn.CrossEntropyLoss` #62

Open yrmo opened 2 months ago

yrmo commented 2 months ago

https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

yrmo commented 1 month ago

$$ \text{xent}(Y, P) = -\sum_{k=1}^{T} Y(k) \log(P(k)) $$

$Y$ is the one-hot ground truth, $P$ is the predicted probabilities (softmax of the logits), and $T$ is the number of classes.
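A minimal pure-Python sketch of the formula above, for checking values by hand. It is not cudagrad code: the names `softmax`, `xent`, and `cross_entropy_from_logits` are hypothetical. It assumes a single sample, whereas the linked `nn.CrossEntropyLoss` takes raw logits for a batch and averages by default. Because $Y$ is one-hot, the sum reduces to $-\log P(t)$ for the true class $t$, which can be computed directly from the logits with a log-sum-exp.

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def xent(y, p):
    # Direct transcription of -sum_k Y(k) * log(P(k)).
    # Terms with Y(k) == 0 are skipped so log is never taken where it is not needed.
    return -sum(y_k * math.log(p_k) for y_k, p_k in zip(y, p) if y_k != 0)

def cross_entropy_from_logits(logits, target):
    # Same loss for a one-hot target at index `target`, computed as
    # logsumexp(logits) - logits[target] without forming P explicitly.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 1.0, 0.1]   # illustrative values, not from the issue
y = [1.0, 0.0, 0.0]        # one-hot truth for class 0

loss_a = xent(y, softmax(logits))
loss_b = cross_entropy_from_logits(logits, 0)
print(loss_a, loss_b)      # the two values should agree up to rounding
```

With all logits equal, $P$ is uniform and the loss is $\log T$, which makes a convenient sanity check for an implementation.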

https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
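For the backward pass, the softmax and cross-entropy derivatives combine into a simple expression. This is a sketch under two assumptions: $z$ (a symbol introduced here) denotes the logits, with $P = \text{softmax}(z)$, and $\sum_{k} Y(k) = 1$, which holds for one-hot $Y$. Using $\partial \log P(k) / \partial z_i = \delta_{ki} - P(i)$:

$$ \frac{\partial\, \text{xent}(Y, P)}{\partial z_i} = -\sum_{k=1}^{T} Y(k) \left( \delta_{ki} - P(i) \right) = P(i) - Y(i) $$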