Closed: jwfromm closed this pull request 3 weeks ago
Name | Link
---|---
Latest commit | 0991757f1c31f5caec7bbf0cabb749da8cb9663c
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6674c6963475380008e6b190
Deploy Preview | https://deploy-preview-2758--pytorch-fbgemm-docs.netlify.app
This pull request was exported from Phabricator. Differential Revision: D58755676
This pull request has been merged in pytorch/FBGEMM@73b07519e4705705b84fa02f439924f420a4580b.
Summary: This diff adds FP8 blockwise GEMM support for AMD. I also included a small refactor that breaks up the various GEMM implementations. Not only does this make our custom kernel libraries easier to read, it also reduces compile time quite a bit. We should probably do something similar for the CUTLASS kernels.
I still need to do more extensive performance analysis, but initial results suggest it is not much worse than the rowwise implementation.
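For context, blockwise scaling quantizes each tile of the operands with its own scale factor, rather than one scale per row as in the rowwise kernels. Below is a minimal sketch of those semantics emulated in plain PyTorch; the function name, block size, and tensor layouts are illustrative assumptions, not FBGEMM's actual API:

```python
import torch

def blockwise_fp8_gemm_ref(xq, wq, x_scale, w_scale, block=128):
    """Reference semantics for a blockwise-scaled FP8 GEMM.

    Illustrative only, not FBGEMM's actual kernel or API.
      xq:      [M, K] FP8 activations
      wq:      [N, K] FP8 weights
      x_scale: [M // block, K // block] per-tile scales for xq
      w_scale: [N // block, K // block] per-tile scales for wq
    Computes C = dequant(xq) @ dequant(wq).T, accumulating in fp32.
    """
    # Expand each per-tile scale back out to element granularity.
    xs = x_scale.repeat_interleave(block, 0).repeat_interleave(block, 1)
    ws = w_scale.repeat_interleave(block, 0).repeat_interleave(block, 1)
    # Dequantize, multiply in fp32, and cast the output down to bf16.
    x = xq.to(torch.float32) * xs
    w = wq.to(torch.float32) * ws
    return (x @ w.T).to(torch.bfloat16)
```

Compared with rowwise scaling (one scale per row of each operand), the blockwise variant carries more scale metadata but can track local value ranges more tightly, which is the usual accuracy/bookkeeping trade-off between the two schemes.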
Reviewed By: jianyuh
Differential Revision: D58755676