yrmo / cudagrad

CUDA C++ strided float tensor automatic differentiation engine with Python bindings
MIT License

Implement `nn.softmax` #61

Closed: yrmo closed this issue 2 months ago

yrmo commented 2 months ago

https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html

yrmo commented 2 months ago

https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
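
For reference, the softmax Jacobian derived in that article (which a dedicated SoftmaxBackward would need; a softmax built from exp/sum/div gets it for free from those ops' backwards) is

$$ \frac{\partial S_i}{\partial a_j} = S_i (\delta_{ij} - S_j) $$

where $\delta_{ij}$ is the Kronecker delta.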

yrmo commented 2 months ago
>>> torch.nn.functional.softmax(torch.tensor([0.0, 1.0, 2.0, 4.0]))
tensor([0.0152, 0.0414, 0.1125, 0.8310])
>>> torch.nn.functional.softmax(torch.tensor([0.0, 1000.0, 2000.0, 4000.0]))
tensor([0., 0., 0., 1.])

$$ S_j = \frac{e^{a_j}}{\sum\limits_{k=1}^N e^{a_k}} = \frac{C e^{a_j}}{\sum\limits_{k=1}^N C e^{a_k}} $$

$$ S_j = \frac{e^{a_j + \log(C)}}{\sum\limits_{k=1}^N e^{a_k + \log(C)}} $$

$$ S_j = \frac{e^{a_j + D}}{\sum\limits_{k=1}^N e^{a_k + D}} $$

$$ D = -\max(a_1, a_2, \cdots, a_N) $$
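
Quick sketch of that max-subtraction trick in plain PyTorch (illustration only, not cudagrad code); without the shift, exp overflows to inf on the large inputs and the division produces nan:

import torch

def softmax_stable(a: torch.Tensor) -> torch.Tensor:
    shifted = a - a.max()  # i.e. add D = -max(a_1, ..., a_N) to every element
    e = shifted.exp()
    return e / e.sum()

print(softmax_stable(torch.tensor([0.0, 1.0, 2.0, 4.0])))
# tensor([0.0152, 0.0414, 0.1125, 0.8310])
print(softmax_stable(torch.tensor([0.0, 1000.0, 2000.0, 4000.0])))
# tensor([0., 0., 0., 1.])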

yrmo commented 2 months ago
>>> torch.exp(torch.tensor([1, 2, 3.], requires_grad=True))
tensor([ 2.7183,  7.3891, 20.0855], grad_fn=<ExpBackward0>)
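
Quick check (illustration only) that the gradient exp has to propagate is just its own output, which is what an ExpBackward node can reuse:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x.exp().sum().backward()
print(torch.allclose(x.grad, x.exp()))  # True: d/dx e^x = e^x
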
yrmo commented 2 months ago

Pow would be good to have now: https://github.com/yrmo/cudagrad/issues/26

yrmo commented 2 months ago
>>> import torch
>>> torch.nn.functional.softmax
<function softmax at 0x7efceb96cd60>
>>> t = torch.tensor([0.0, 1.0, 2.0, 4.0])
>>> t
tensor([0., 1., 2., 4.])
>>> e = t.exp()
>>> e / e.sum()
tensor([0.0152, 0.0414, 0.1125, 0.8310])
>>> torch.nn.functional.softmax(t)
<stdin>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
tensor([0.0152, 0.0414, 0.1125, 0.8310])
>>> torch.nn.functional.softmax(t, dim=0)
tensor([0.0152, 0.0414, 0.1125, 0.8310])
>>> torch.__version__
'2.4.0+cu121'
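
Same recipe in terms of the cudagrad ops; a sketch only, the actual cudagrad.nn.softmax may differ (e.g. it could also subtract the max for stability):

# Hedged sketch, not the actual cudagrad/nn.py source.
from cudagrad import Tensor

def softmax(t: Tensor) -> Tensor:
    e = t.exp()
    return e / e.sum()  # needs the [4] / [1] broadcast in division
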
yrmo commented 2 months ago
>>> from cudagrad import Tensor
>>> t = Tensor([4], [0.0, 1.0, 2.0, 4.0])
>>> t
<Tensor([4, ], [0, 1, 2, ...]) object at 0x55bf32942a90 DefaultBackward>
>>> t.exp()
<Tensor([4, ], [1, 2.71828, 7.38906, ...]) object at 0x55bf327e0680 ExpBackward>
>>> e = t.exp()
>>> e.sum()
<Tensor([1, ], [65.7055, ]) object at 0x55bf329443b0 SumBackward>
>>> e / e.sum()
python: /home/ryan/cudagrad/src/tensor.hpp:429: std::shared_ptr<cg::Tensor> cg::binaryElementwiseOperator(std::shared_ptr<cg::Tensor>, std::shared_ptr<cg::Tensor>, std::function<float(float, float)>, std::string, std::shared_ptr<_Tp>) [with T = cg::DivBackward; std::string = std::__cxx11::basic_string<char>]: Assertion `lhs.get()->data_.size() == rhs.get()->data_.size()' failed.
Aborted (core dumped)

Need broadcasting in division... Need to clean up the messy job from before.
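
A minimal Python sketch of that broadcast (illustration only, not the C++ in tensor.hpp): a size-1 operand gets repeated to match the other operand instead of tripping the size-equality assertion:

def broadcast_divide(lhs, rhs):
    if len(rhs) == 1 and len(lhs) > 1:
        rhs = rhs * len(lhs)      # expand [1] -> [len(lhs)]
    assert len(lhs) == len(rhs)   # the assertion that fired above
    return [a / b for a, b in zip(lhs, rhs)]

print(broadcast_divide([1.0, 2.71828, 7.38906, 54.5982], [65.7055]))
# ~[0.0152, 0.0414, 0.1125, 0.8310]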

yrmo commented 2 months ago
>>> from cudagrad import Tensor
>>> t = Tensor([4], [0.0, 1.0, 2.0, 4.0])
>>> from cudagrad import nn
>>> nn
<module 'cudagrad.nn' from '/home/ryan/.pyenv/versions/3.11.7/lib/python3.11/site-packages/cudagrad/nn.py'>
>>> nn.softmax
<function softmax at 0x7fcac4887ce0>
>>> nn.softmax(t)
python: /home/ryan/cudagrad/src/tensor.hpp:429: std::shared_ptr<cg::Tensor> cg::binaryElementwiseOperator(std::shared_ptr<cg::Tensor>, std::shared_ptr<cg::Tensor>, std::function<float(float, float)>, std::string, std::shared_ptr<_Tp>) [with T = cg::DivBackward; std::string = std::__cxx11::basic_string<char>]: Assertion `lhs.get()->data_.size() == rhs.get()->data_.size()' failed.
Aborted (core dumped)
yrmo commented 2 months ago

This is in binaryElementwiseOperator, not DivBackward!

yrmo commented 2 months ago
>>> from cudagrad import Tensor
>>> t = Tensor([4], [0.0, 1.0, 2.0, 4.0])
>>> t.exp()
<Tensor([4, ], [1, 2.71828, 7.38906, ...]) object at 0x55702f34c700 ExpBackward>
>>> e = t.exp()
>>> e.sum()
<Tensor([1, ], [65.7055, ]) object at 0x55702f4b19c0 SumBackward>
>>> e / e.sum()
<Tensor([4, ], [0.0152194, 0.0413707, 0.112457, ...]) object at 0x55702f4b1be0 DivBackward>
>>> from cudagrad import nn
>>> 
>>> nn.softmax(t)
<Tensor([4, ], [0.0152194, 0.0413707, 0.112457, ...]) object at 0x55702f4b20e0 DivBackward>
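
Matches the torch.nn.functional.softmax output above: 0.0152, 0.0414, 0.1125, 0.8310.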