tenstorrent / tt-metal

TT-NN operator library and TT-Metalium low-level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html
Apache License 2.0

[Feature Request] Implement gelu_bw fused op #14160

Open dmakoviichuk-tt opened 1 month ago

dmakoviichuk-tt commented 1 month ago

Is your feature request related to a problem? Please describe.
The composite version makes many calls to the slow binary ops:

[Image: Screenshot 2024-10-23 at 9 39 19 AM]

Describe the solution you'd like
A single fused op call for gelu_bw.

Describe alternatives you've considered
Currently using the composite version, which is slow.

Additional context
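For reference, here is a minimal sketch of the difference between a composite and a fused backward pass. It assumes the exact (erf-based) GELU, whose derivative is Phi(x) + x * phi(x). It is plain Python over lists and not TT-NN code; the function names `gelu_bw_composite` and `gelu_bw_fused` are hypothetical, and each list comprehension in the composite version only stands in for one separate elementwise op.

```python
import math

SQRT2 = math.sqrt(2.0)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / SQRT2))


def gelu_bw_composite(grad, x):
    """GELU backward built from separate elementwise passes.

    Each comprehension is one full pass over the data and stands in for
    one op call in a composite implementation (illustrative only).
    """
    scaled = [v / SQRT2 for v in x]                        # pass 1: scale
    erf_v = [math.erf(v) for v in scaled]                  # pass 2: erf
    cdf = [0.5 * (1.0 + e) for e in erf_v]                 # pass 3: Phi(x)
    sq = [v * v for v in x]                                # pass 4: x^2
    pdf = [INV_SQRT_2PI * math.exp(-0.5 * s) for s in sq]  # pass 5: phi(x)
    x_pdf = [v * p for v, p in zip(x, pdf)]                # pass 6: x * phi(x)
    local = [c + xp for c, xp in zip(cdf, x_pdf)]          # pass 7: add
    return [g * l for g, l in zip(grad, local)]            # pass 8: chain rule


def gelu_bw_fused(grad, x):
    """Same math in a single pass: grad * (Phi(x) + x * phi(x))."""
    out = []
    for g, v in zip(grad, x):
        cdf = 0.5 * (1.0 + math.erf(v / SQRT2))
        pdf = INV_SQRT_2PI * math.exp(-0.5 * v * v)
        out.append(g * (cdf + v * pdf))
    return out
```

Both functions return the same values; the fused form reads each input element once and writes no intermediate buffers, which is the kind of saving this request is after.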

dmakoviichuk-tt commented 1 week ago

@davorchap a fused gelu_bw can save up to 30 ms in nanogpt.

davorchap commented 3 days ago

@yan-zaretskiy would be good to take this one on soon!