pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Block-wise FP8 matmul #2780

Closed — lw closed this 3 months ago

lw commented 3 months ago

Summary: Introduce a CUTLASS-based matmul for block-scaled fp8 tensors.

This is based on the regular ("slow" accum) fp8 matmul in CUTLASS, with its fp8 accumulator class changed to do a fused multiply-and-add instead of a regular add into the global accumulator. This required changes throughout the stack, which is why I ended up copying sizeable chunks of CUTLASS into this diff.
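To illustrate the accumulation pattern being described, here is a minimal NumPy sketch of block-scaled matmul semantics. It is not the CUTLASS kernel from this diff; it only shows how per-K-block scales can be fused into the accumulation (a multiply-and-add into the global accumulator instead of a plain add). The function name, the per-block scale layout, and the use of float32 in place of real fp8 inputs are all illustrative assumptions.

```python
import numpy as np

def blockwise_scaled_matmul(a, b, scale_a, scale_b, block_k):
    """Reference semantics for a block-scaled matmul (illustrative only).

    The K dimension is split into blocks of size block_k; each block has
    its own scale for A and for B. Each block's partial product is scaled
    as it is added into the global accumulator -- a fused multiply-and-add
    rather than a regular add.

    a: (M, K), b: (K, N) -- float32 stand-ins for fp8 tensors.
    scale_a, scale_b: (K // block_k,) per-block scales (layout assumed).
    """
    m, k = a.shape
    _, n = b.shape
    acc = np.zeros((m, n), dtype=np.float32)
    for i in range(k // block_k):
        s = slice(i * block_k, (i + 1) * block_k)
        # Fused multiply-and-add: the block's partial result is scaled
        # on its way into the accumulator, not added unscaled.
        acc += scale_a[i] * scale_b[i] * (a[:, s] @ b[s, :])
    return acc
```

With all scales equal to 1 this reduces to a plain matmul, which is a convenient sanity check for the accumulation loop.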

Differential Revision: D57965065

netlify[bot] commented 3 months ago

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| Latest commit | f0a1c2b9cda3bb60addfe92c3f70aed4f1a835c6 |
| Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/667d533784ea9b0008e8d225 |
| Deploy Preview | https://deploy-preview-2780--pytorch-fbgemm-docs.netlify.app |

facebook-github-bot commented 3 months ago

This pull request was exported from Phabricator. Differential Revision: D57965065

facebook-github-bot commented 3 months ago

This pull request has been merged in pytorch/FBGEMM@5a5b0e693bbcc34525802454d48a3f8312069be8.