Closed: manishucsd closed this pull request 2 months ago
Name | Link
---|---
Latest commit | 38e72ffd4c684ab755861c8d3c61bf632f6e1466
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66ccb4d75419ee000818598e
Deploy Preview | https://deploy-preview-2932--pytorch-fbgemm-docs.netlify.app
This pull request was exported from Phabricator. Differential Revision: D60171966
This pull request has been merged in pytorch/FBGEMM@de845bfbf242acc2e026d5fb6450c5f1ac1e00c4.
Summary: This diff allows cutlass_extension to use configuration-based auto-instance generation. The diff aims to achieve the following:

(a) Many kernels need to be instantiated with varying template arguments, and it is impractical to instantiate them all by hand; a minimal sketch of what such generation can look like is shown after this list.
(b) Reuse and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API, which lets us sweep all of the template parameters that CUTLASS allows.
(d) Items (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, so we can upstream our kernels to the NVIDIA/CUTLASS repo quickly.
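As a rough illustration of the approach (not the actual FBGEMM or NVIDIA/CUTLASS generator scripts), the following minimal Python sketch sweeps a small grid of tile shapes and pipeline stages and emits one CUTLASS device-side GEMM instantiation per configuration. The names `TileConfig` and `emit_gemm_instance` are invented for this example, and the emitted template-argument list is abbreviated.

```python
# Hypothetical sketch only: TileConfig and emit_gemm_instance are invented
# names, not part of FBGEMM or the NVIDIA/CUTLASS generator scripts.
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """One point in the template-parameter sweep."""
    tile_m: int
    tile_n: int
    tile_k: int
    stages: int


def emit_gemm_instance(cfg: TileConfig, dtype: str = "cutlass::bfloat16_t") -> str:
    """Render a C++ type alias for one GEMM configuration.

    The emitted template-argument list is abbreviated and illustrative;
    a real generator would fill in the full CUTLASS device-side API.
    """
    name = f"Gemm_{cfg.tile_m}x{cfg.tile_n}x{cfg.tile_k}_{cfg.stages}stage"
    return (
        f"using {name} = cutlass::gemm::device::GemmUniversal<\n"
        f"    {dtype}, cutlass::layout::RowMajor,\n"
        f"    {dtype}, cutlass::layout::ColumnMajor,\n"
        f"    {dtype}, cutlass::layout::RowMajor\n"
        f"    /* ...tile shape {cfg.tile_m}x{cfg.tile_n}x{cfg.tile_k}, "
        f"{cfg.stages} stages, remaining arguments elided... */>;\n"
    )


if __name__ == "__main__":
    # Sweep a small grid of tile shapes and stage counts, emitting one
    # instantiation per configuration instead of writing each by hand.
    tile_shapes = [(128, 128, 64), (128, 256, 64), (256, 128, 64)]
    for (m, n, k), stages in itertools.product(tile_shapes, [3, 4]):
        print(emit_gemm_instance(TileConfig(m, n, k, stages)))
```

The point is only the shape of the approach: configurations live in data, and a generator in the style of the OSS NVIDIA scripts expands them into device-side CUTLASS instantiations, so adding a new configuration is a data change rather than hand-written kernel code.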
Differential Revision: D60171966