Closed lw closed 3 months ago
Name | Link |
---|---|
Latest commit | f0a1c2b9cda3bb60addfe92c3f70aed4f1a835c6 |
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/667d533784ea9b0008e8d225 |
Deploy Preview | https://deploy-preview-2780--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D57965065
This pull request has been merged in pytorch/FBGEMM@5a5b0e693bbcc34525802454d48a3f8312069be8.
Summary: Introduce a CUTLASS-based matmul for block-scaled fp8 tensors.
This is based on the regular ("slow"-accumulation) fp8 matmul in CUTLASS, with its fp8 accumulator class changed to perform a fused multiply-and-add (applying the block scale) instead of a plain add into the global accumulator. This required changes throughout the stack, which is why I ended up copying sizeable chunks of CUTLASS into this diff.
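The numerics of that change can be sketched outside of CUTLASS: instead of promoting each fp8 partial product and plainly adding it into the global accumulator, each K-block's partial result is scaled and accumulated in one step (`acc += scale * partial`). The following NumPy sketch illustrates those block-scaled accumulation semantics only; the function name, scale layout, and blocking scheme are hypothetical, not the actual CUTLASS kernel interface.

```python
import numpy as np

def blockscaled_matmul_sketch(a, b, a_scales, b_scales, block_k):
    """Illustrative semantics of a block-scaled matmul.

    a: (M, K) and b: (K, N) stand in for the fp8 operands;
    a_scales / b_scales hold one scale per K-block (hypothetical layout).
    """
    M, K = a.shape
    N = b.shape[1]
    acc = np.zeros((M, N), dtype=np.float32)
    for i, k0 in enumerate(range(0, K, block_k)):
        # Partial product for this K-block, promoted to fp32.
        partial = (a[:, k0:k0 + block_k].astype(np.float32)
                   @ b[k0:k0 + block_k, :].astype(np.float32))
        # Fused multiply-and-add into the global accumulator:
        # acc += scale * partial, rather than a plain acc += partial.
        acc += (a_scales[i] * b_scales[i]) * partial
    return acc
```

With per-block scales folded into the accumulation like this, each block's contribution is dequantized on the fly rather than requiring a separate rescaling pass over the output.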
Differential Revision: D57965065