pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Additional Tuning for Cutlass FP8 Rowwise Kernel #2762

Open jwfromm opened 3 weeks ago

jwfromm commented 3 weeks ago

Summary: This diff implements additional tuning for the cutlass rowwise kernel on top of the recent output layout change. Our configurations now conform much more closely to the recommendations made by the cutlass tuner. To maintain performance across all shapes, I had to add one more kernel mode, which uses Cooperative kernels for medium shapes and PingPong kernels for large shapes.
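For readers unfamiliar with shape-based dispatch, here is a rough Python sketch of what picking a kernel mode from the GEMM shape could look like. The mode names, the thresholds, and `select_kernel_mode` are all hypothetical and are not taken from this diff. The real selection logic lives in the kernel's C++/CUDA source and may use different criteria.

```python
from enum import Enum


class KernelMode(Enum):
    """Hypothetical labels for the tuned kernel configurations."""
    SMALL = "small"    # assumed: default config for skinny problems
    MEDIUM = "medium"  # assumed: Cooperative cutlass schedule
    LARGE = "large"    # assumed: PingPong cutlass schedule


# Illustrative thresholds only; NOT the cutoffs used in FBGEMM.
SMALL_MAX_DIM = 128
MEDIUM_MAX_MN = 2048 * 2048


def select_kernel_mode(m: int, n: int, k: int) -> KernelMode:
    """Pick a mode for an (M x K) @ (K x N) GEMM.

    K is only validated here; this sketch dispatches on M and N alone.
    """
    if m <= 0 or n <= 0 or k <= 0:
        raise ValueError("GEMM dimensions must be positive")
    # Skinny problems: one small output dimension.
    if m <= SMALL_MAX_DIM or n <= SMALL_MAX_DIM:
        return KernelMode.SMALL
    # Medium output tiles: Cooperative schedule (per the summary).
    if m * n <= MEDIUM_MAX_MN:
        return KernelMode.MEDIUM
    # Everything larger: PingPong schedule (per the summary).
    return KernelMode.LARGE


if __name__ == "__main__":
    for shape in [(64, 4096, 4096), (1024, 1024, 4096), (8192, 8192, 8192)]:
        print(shape, select_kernel_mode(*shape).value)
```

The point of the extra mode is that a single schedule choice does not perform best across all shapes, so the dispatcher branches on problem size instead of using one configuration everywhere.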

Benchmarking results can be found here, in the results tab. The names of the tuning configurations I tried are somewhat vague; the final column corresponds to the configuration in this diff.

Differential Revision: D58848687

netlify[bot] commented 3 weeks ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 833557ff34c5ae7900d3c8ca8c1163c57cc2b92d
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6674c4b64fdaf50008e4f808
Deploy Preview: https://deploy-preview-2762--pytorch-fbgemm-docs.netlify.app


facebook-github-bot commented 3 weeks ago

This pull request was exported from Phabricator. Differential Revision: D58848687