Do you have any details on how you fuse kernels together?
If I am not mistaken, Nvidia's project does it by hand.
Do you do it automatically? Are there any limitations?
I also wrote the kernels by hand. The main difference is:
tiny-cuda-nn: Kernels are instantiated at compile time (switch statements dispatch to template instantiations) --> full control and slightly faster
quick-mlp: Kernels are assembled at runtime by setting preprocessor macros and template parameters based on the network config, then compiled with NVRTC --> more flexible
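To make the first point concrete, here is a minimal sketch of the switch-to-template dispatch pattern described above. The names (`forward_pass`, `dispatch_forward`) are illustrative, not the actual tiny-cuda-nn API, and the kernel body is stubbed out; the point is how a runtime config value is mapped onto a fixed set of compile-time instantiations.

```cpp
#include <stdexcept>

// Hypothetical illustration of compile-time kernel specialization.
// In a real fused kernel, WIDTH would size shared-memory tiles and
// let the compiler fully unroll the per-layer loops.
template <int WIDTH>
int forward_pass(int batch) {
    return batch * WIDTH;  // stand-in for launching the specialized kernel
}

int dispatch_forward(int width, int batch) {
    // The switch maps a runtime network width onto one of the
    // pre-instantiated templates; unsupported widths fail loudly.
    switch (width) {
        case 32:  return forward_pass<32>(batch);
        case 64:  return forward_pass<64>(batch);
        case 128: return forward_pass<128>(batch);
        default:  throw std::runtime_error("unsupported network width");
    }
}
```

The trade-off mentioned above follows directly: every supported configuration must be enumerated in the switch at compile time, which gives the compiler full knowledge but limits flexibility.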
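The runtime approach can be sketched the same way. This is a hypothetical illustration, not quick-mlp's actual code: the kernel source is assembled as a string, with preprocessor macros playing the role that template parameters play in the compile-time variant. The real pipeline would hand this string to NVRTC (`nvrtcCreateProgram` / `nvrtcCompileProgram`); that step is omitted here so the sketch stays self-contained.

```cpp
#include <string>

// Hypothetical sketch: build specialized kernel source from the network
// config by prepending #define lines. Macro and function names are made up.
std::string assemble_kernel_source(int hidden_width, const std::string& activation) {
    std::string src;
    src += "#define HIDDEN_WIDTH " + std::to_string(hidden_width) + "\n";
    src += "#define ACTIVATION " + activation + "\n";
    // Fused kernel body; the macros specialize it at NVRTC compile time,
    // much as template parameters would in the static approach.
    src += "extern \"C\" __global__ void fused_mlp(const float* in, float* out) {\n"
           "    /* layers unrolled for HIDDEN_WIDTH, using ACTIVATION */\n"
           "}\n";
    return src;
}
```

Because the source is generated on the fly, any configuration can be supported without enumerating it in advance, at the cost of a runtime compilation step.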