Closed: jwfromm closed this pull request 3 weeks ago
Name | Link
---|---
Latest commit | 0991757f1c31f5caec7bbf0cabb749da8cb9663c
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6674c6963475380008e6b190
Deploy Preview | https://deploy-preview-2758--pytorch-fbgemm-docs.netlify.app
This pull request was exported from Phabricator. Differential Revision: D58755676
This pull request has been merged in pytorch/FBGEMM@73b07519e4705705b84fa02f439924f420a4580b.
Summary: This diff adds FP8 blockwise GEMM support for AMD. I also included a small refactor that breaks up the various GEMM implementations. Not only does this make our custom kernel libraries easier to read, it also reduces compile time quite a bit. We should probably do something similar for the CUTLASS kernels.
I still need to do more extensive performance analysis, but initial results suggest it is not much worse than the rowwise implementation.
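For context, blockwise scaling quantizes each tile of the operands with its own scale factor, rather than one scale per row as in the rowwise kernels. Below is a minimal sketch of those semantics emulated in plain PyTorch; the function name, block size, and tensor layouts are illustrative assumptions, not FBGEMM's actual API:

```python
import torch

def blockwise_fp8_gemm_ref(xq, wq, x_scale, w_scale, block=128):
    """Reference semantics for a blockwise-scaled FP8 GEMM.

    Illustrative only, not FBGEMM's actual kernel or API.
      xq:      [M, K] FP8 activations
      wq:      [N, K] FP8 weights
      x_scale: [M // block, K // block] per-tile scales for xq
      w_scale: [N // block, K // block] per-tile scales for wq
    Computes C = dequant(xq) @ dequant(wq).T, accumulating in fp32.
    """
    # Expand each per-tile scale back out to element granularity.
    xs = x_scale.repeat_interleave(block, 0).repeat_interleave(block, 1)
    ws = w_scale.repeat_interleave(block, 0).repeat_interleave(block, 1)
    # Dequantize, multiply in fp32, and cast the output down to bf16.
    x = xq.to(torch.float32) * xs
    w = wq.to(torch.float32) * ws
    return (x @ w.T).to(torch.bfloat16)
```

Compared with rowwise scaling (one scale per row of each operand), the blockwise variant carries more scale metadata but can track local value ranges more tightly, which is the usual accuracy/bookkeeping trade-off between the two schemes.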
Reviewed By: jianyuh
Differential Revision: D58755676