bofen97 opened 8 months ago
Triton can be used to generate PTX code, and candle already uses PTX for its current kernels (except that those are generated with the nvcc compiler rather than Triton); see for example this file. It should be fairly straightforward to hook Triton-generated PTX in at the same place. What would help most is someone writing tutorial material that shows users how to do this.
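When hooking a PTX module into a CUDA loader, you usually need the names of its kernel entry points. For example, cudarc's `load_ptx` (in the versions candle has used) takes the function names up front. The sketch below is a hypothetical, stdlib-only helper (`ptx_entry_names` is not part of candle or cudarc) that scans PTX text for `.entry` declarations. It assumes kernels are declared as `.visible .entry name(` or `.entry name(`, which is the form nvcc and Triton emit. The PTX snippet is illustrative, not real Triton output.

```rust
/// Hypothetical helper: collect the names of all kernel entry points
/// declared in a PTX module. Device functions (`.func`) are skipped.
fn ptx_entry_names(ptx: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in ptx.lines() {
        let line = line.trim_start();
        // Drop an optional `.visible` linkage directive before `.entry`.
        let rest = line.strip_prefix(".visible ").unwrap_or(line).trim_start();
        if let Some(after) = rest.strip_prefix(".entry") {
            // The kernel name runs until the opening '(' or whitespace.
            let name: String = after
                .trim_start()
                .chars()
                .take_while(|c| *c != '(' && !c.is_whitespace())
                .collect();
            if !name.is_empty() {
                names.push(name);
            }
        }
    }
    names
}

fn main() {
    // Trimmed-down PTX in the style Triton emits (illustrative only).
    let ptx = r#"
.version 8.0
.target sm_80
.address_size 64

.visible .entry add_kernel(
    .param .u64 add_kernel_param_0,
    .param .u32 add_kernel_param_3
)
{
    ret;
}
"#;
    let names = ptx_entry_names(ptx);
    println!("{:?}", names);
}
```

Triton typically names the entry after the Python `@triton.jit` function. However, some Triton versions append a signature suffix, so reading the name from the PTX is safer than hard-coding it.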
Yes, I will try. Candle's documentation currently needs improvement.
@LaurentMazare @bofen97
I created a minimal example of loading a Triton kernel in Rust.
There is a lot left to do to make interfacing with Triton-generated kernels more ergonomic; see the notes in the aforementioned repo.
Let me know if this is of interest. I'm happy to explain or add more detailed examples.
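One ergonomic gap when launching Triton PTX from Rust is reproducing the launch configuration Triton's own Python launcher would use. That launcher runs each program instance as one CTA with `32 * num_warps` threads, and the grid is whatever the caller supplies. For a 1-D elementwise kernel like Triton's vector-add tutorial, that grid is `triton.cdiv(n, BLOCK_SIZE)`. The sketch below assumes that 1-D layout. `LaunchDims` and `triton_launch_dims_1d` are hypothetical names, not candle or cudarc APIs, and `num_warps` must match the value the kernel was compiled with (Triton's default is 4).

```rust
/// Hypothetical grid/block dimensions for launching a Triton kernel.
#[derive(Debug, PartialEq)]
struct LaunchDims {
    grid: (u32, u32, u32),
    block: (u32, u32, u32),
}

/// Hypothetical helper for a 1-D Triton kernel.
/// `block_size` is the kernel's BLOCK_SIZE constexpr (must be > 0) and
/// `num_warps` is the value used when the kernel was compiled.
fn triton_launch_dims_1d(n: u32, block_size: u32, num_warps: u32) -> LaunchDims {
    // Equivalent of triton.cdiv(n, BLOCK_SIZE): enough programs to cover n elements.
    let programs = n.div_ceil(block_size);
    LaunchDims {
        grid: (programs, 1, 1),
        // Triton runs each program instance with 32 threads per warp.
        block: (32 * num_warps, 1, 1),
    }
}

fn main() {
    let dims = triton_launch_dims_1d(98_432, 1024, 4);
    println!("{:?}", dims);
}
```

These dimensions would then be passed to the CUDA launch call together with the kernel arguments. Triton also records the dynamic shared-memory size a kernel needs in its compilation metadata, and that value has to be forwarded at launch as well.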
minimal example
Thanks, I will study your code.
Handwritten CUDA kernels are very complicated. How can we use OpenAI Triton in candle to simplify this process? :)