Closed: manishucsd closed this pull request 2 months ago
Name | Link
---|---
Latest commit | 38e72ffd4c684ab755861c8d3c61bf632f6e1466
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66ccb4d75419ee000818598e
Deploy Preview | https://deploy-preview-2932--pytorch-fbgemm-docs.netlify.app
This pull request was exported from Phabricator. Differential Revision: D60171966
This pull request has been merged in pytorch/FBGEMM@de845bfbf242acc2e026d5fb6450c5f1ac1e00c4.
Summary: This diff allows cutlass_extension to use configuration-based auto-instance generation. The diff aims to achieve the following:

(a) Many kernels need to be instantiated with varying template arguments, and it is impractical to instantiate them all by hand; a minimal sketch of what such generation can look like is shown after this list.
(b) Reuse and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API, which lets us sweep all of the template parameters that CUTLASS allows.
(d) Items (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, so we can upstream our kernels to the NVIDIA/CUTLASS repo quickly.
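As a rough illustration of the approach (not the actual FBGEMM or NVIDIA/CUTLASS generator scripts), the following minimal Python sketch sweeps a small grid of tile shapes and pipeline stages and emits one CUTLASS device-side GEMM instantiation per configuration. The names `TileConfig` and `emit_gemm_instance` are invented for this example, and the emitted template-argument list is abbreviated.

```python
# Hypothetical sketch only: TileConfig and emit_gemm_instance are invented
# names, not part of FBGEMM or the NVIDIA/CUTLASS generator scripts.
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """One point in the template-parameter sweep."""
    tile_m: int
    tile_n: int
    tile_k: int
    stages: int


def emit_gemm_instance(cfg: TileConfig, dtype: str = "cutlass::bfloat16_t") -> str:
    """Render a C++ type alias for one GEMM configuration.

    The emitted template-argument list is abbreviated and illustrative;
    a real generator would fill in the full CUTLASS device-side API.
    """
    name = f"Gemm_{cfg.tile_m}x{cfg.tile_n}x{cfg.tile_k}_{cfg.stages}stage"
    return (
        f"using {name} = cutlass::gemm::device::GemmUniversal<\n"
        f"    {dtype}, cutlass::layout::RowMajor,\n"
        f"    {dtype}, cutlass::layout::ColumnMajor,\n"
        f"    {dtype}, cutlass::layout::RowMajor\n"
        f"    /* ...tile shape {cfg.tile_m}x{cfg.tile_n}x{cfg.tile_k}, "
        f"{cfg.stages} stages, remaining arguments elided... */>;\n"
    )


if __name__ == "__main__":
    # Sweep a small grid of tile shapes and stage counts, emitting one
    # instantiation per configuration instead of writing each by hand.
    tile_shapes = [(128, 128, 64), (128, 256, 64), (256, 128, 64)]
    for (m, n, k), stages in itertools.product(tile_shapes, [3, 4]):
        print(emit_gemm_instance(TileConfig(m, n, k, stages)))
```

The point is only the shape of the approach: configurations live in data, and a generator in the style of the OSS NVIDIA scripts expands them into device-side CUTLASS instantiations, so adding a new configuration is a data change rather than hand-written kernel code.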
Differential Revision: D60171966