Closed lw closed 3 months ago
Name | Link |
---|---|
Latest commit | f0a1c2b9cda3bb60addfe92c3f70aed4f1a835c6 |
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/667d533784ea9b0008e8d225 |
Deploy Preview | https://deploy-preview-2780--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D57965065
This pull request has been merged in pytorch/FBGEMM@5a5b0e693bbcc34525802454d48a3f8312069be8.
Summary: Introduce a CUTLASS-based matmul for block-scaled fp8 tensors.
This is based on the regular ("slow"-accumulation) fp8 matmul in CUTLASS, with its fp8 accumulator class changed to perform a fused multiply-and-add (applying the block scale) instead of a plain add into the global accumulator. This required changes throughout the stack, which is why I ended up copying sizeable chunks of CUTLASS into this diff.
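The numerics of that change can be sketched outside of CUTLASS: instead of promoting each fp8 partial product and plainly adding it into the global accumulator, each K-block's partial result is scaled and accumulated in one step (`acc += scale * partial`). The following NumPy sketch illustrates those block-scaled accumulation semantics only; the function name, scale layout, and blocking scheme are hypothetical, not the actual CUTLASS kernel interface.

```python
import numpy as np

def blockscaled_matmul_sketch(a, b, a_scales, b_scales, block_k):
    """Illustrative semantics of a block-scaled matmul.

    a: (M, K) and b: (K, N) stand in for the fp8 operands;
    a_scales / b_scales hold one scale per K-block (hypothetical layout).
    """
    M, K = a.shape
    N = b.shape[1]
    acc = np.zeros((M, N), dtype=np.float32)
    for i, k0 in enumerate(range(0, K, block_k)):
        # Partial product for this K-block, promoted to fp32.
        partial = (a[:, k0:k0 + block_k].astype(np.float32)
                   @ b[k0:k0 + block_k, :].astype(np.float32))
        # Fused multiply-and-add into the global accumulator:
        # acc += scale * partial, rather than a plain acc += partial.
        acc += (a_scales[i] * b_scales[i]) * partial
    return acc
```

With per-block scales folded into the accumulation like this, each block's contribution is dequantized on the fly rather than requiring a separate rescaling pass over the output.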
Differential Revision: D57965065