Does ScaledDotProductAttention support backward pass?

liuliu / s4nnc

Swift for NNC

https://libnnc.org

BSD 3-Clause "New" or "Revised" License


Does ScaledDotProductAttention support backward pass? #20

Closed: ghost closed this issue 10 months ago.

ghost commented 10 months ago

liuliu commented 10 months ago

It is supported on Metal, but not on CUDA yet. I mostly enabled the CUDA side of SDP to see the performance difference and to help implement some other LLMs.
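The sketch below shows what a backward pass for scaled dot-product attention has to compute. It uses the standard textbook gradients for a single attention head, written here for illustration. It is not a description of how s4nnc's Metal kernel (or any future CUDA kernel) implements it. Masking, dropout and multi-head batching are omitted. The symbols $Q, K, V, O, \alpha, d_k$ and the upstream gradient $dO$ are assumptions of this sketch, not names from the library.

```latex
% Forward pass (single head), with scale \alpha = 1/\sqrt{d_k}:
S = \alpha\, Q K^\top, \qquad P = \operatorname{softmax}_{\text{row}}(S), \qquad O = P V

% Backward pass, given the upstream gradient dO = \partial L / \partial O:
dV = P^\top dO
dP = dO\, V^\top
D_i = \sum_c dO_{ic}\, O_{ic}                 % equals \sum_k P_{ik}\, dP_{ik}
dS_{ij} = P_{ij}\,\bigl(dP_{ij} - D_i\bigr)   % row-wise softmax Jacobian
dQ = \alpha\, dS\, K
dK = \alpha\, dS^\top Q
```

The $D_i$ shortcut works because $\sum_k P_{ik}\, dP_{ik} = \sum_c dO_{ic} \sum_k P_{ik} V_{kc} = \sum_c dO_{ic}\, O_{ic}$. A backward kernel can therefore reuse the saved forward output $O$ instead of reducing over the full $P \odot dP$ matrix.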