pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

FBGEMM CK Blockwise FP8 Kernel #2758

Closed jwfromm closed 3 weeks ago

jwfromm commented 3 weeks ago

Summary: This diff adds FP8 blockwise GEMM support for AMD. I also included a small refactor that splits the various GEMM implementations into separate pieces. This makes our custom kernel libraries easier to read and also noticeably reduces compile time. We should probably do something similar for the CUTLASS kernels.

I still need to do a more extensive performance analysis, but initially it doesn't look much worse than rowwise.
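For readers unfamiliar with the two scaling schemes mentioned here, the following is a minimal pure-Python sketch of how rowwise and blockwise scale factors differ. It is not the CK kernel or any FBGEMM API: the helper names (`fake_fp8`, `quantize_blockwise`, `dequantize`), the tiny block sizes, and the choice of 448 as the FP8 maximum (the OCP E4M3 value; AMD's `fnuz` variant tops out at 240) are all assumptions made for illustration, and the FP8 rounding is only crudely emulated.

```python
import math

FP8_MAX = 448.0  # assumption: OCP E4M3 maximum; AMD's fnuz variant uses 240


def fake_fp8(x):
    """Crudely emulate E4M3 rounding (3 mantissa bits, min exponent -6)."""
    x = max(-FP8_MAX, min(FP8_MAX, x))  # saturate instead of overflowing
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))           # abs(x) lies in [2**(e-1), 2**e)
    exp = max(e - 1, -6)                # below 2**-6 the spacing stops shrinking
    step = math.ldexp(1.0, exp - 3)     # spacing between representable values
    return round(x / step) * step


def quantize_blockwise(mat, block_rows, block_cols):
    """Return (quantized matrix, {block index: scale}), one scale per block."""
    rows, cols = len(mat), len(mat[0])
    q = [[0.0] * cols for _ in range(rows)]
    scales = {}
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block_rows, rows))
                     for c in range(c0, min(c0 + block_cols, cols))]
            amax = max(abs(mat[r][c]) for r, c in cells)
            scale = amax / FP8_MAX if amax > 0 else 1.0
            scales[(r0 // block_rows, c0 // block_cols)] = scale
            for r, c in cells:
                q[r][c] = fake_fp8(mat[r][c] / scale)
    return q, scales


def dequantize(q, scales, block_rows, block_cols):
    """Multiply every element by the scale of the block it belongs to."""
    return [[q[r][c] * scales[(r // block_rows, c // block_cols)]
             for c in range(len(q[0]))]
            for r in range(len(q))]


# One row with a large outlier next to small values.
mat = [[1000.0, 0.001, 0.002, 0.003]]

# Rowwise scaling is the special case of a block spanning one full row.
row_q, row_scales = quantize_blockwise(mat, 1, 4)
row_deq = dequantize(row_q, row_scales, 1, 4)

# Blockwise scaling with hypothetical 1x2 blocks: two scales for this row.
blk_q, blk_scales = quantize_blockwise(mat, 1, 2)
blk_deq = dequantize(blk_q, blk_scales, 1, 2)

print("rowwise  :", row_deq)
print("blockwise:", blk_deq)
```

In this toy input the outlier forces a large rowwise scale, so the small entries sharing that scale are rounded away or badly distorted, while the block that does not contain the outlier keeps its own smaller scale. The price is more scale factors to store and apply inside the GEMM, which is why the performance comparison against rowwise matters.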

Reviewed By: jianyuh

Differential Revision: D58755676

netlify[bot] commented 3 weeks ago

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
Latest commit 0991757f1c31f5caec7bbf0cabb749da8cb9663c
Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6674c6963475380008e6b190
Deploy Preview https://deploy-preview-2758--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 3 weeks ago

This pull request was exported from Phabricator. Differential Revision: D58755676

facebook-github-bot commented 3 weeks ago

This pull request has been merged in pytorch/FBGEMM@73b07519e4705705b84fa02f439924f420a4580b.