shamanDevel / quick-mlp

Fused Multi-layer-perceptrons in CUDA
MIT License

Q: Kernel fusion details #3

Open ib00 opened 11 months ago

ib00 commented 11 months ago

Do you have any details on how you fuse kernels together? If I am not mistaken, Nvidia's project does it by hand. Do you do it automatically? Are there any limitations?

shamanDevel commented 10 months ago

I also wrote the kernels by hand. The main difference is:

Does this answer your question?