CK FP8 Gemm Optimization

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/


CK FP8 Gemm Optimization #2893

Closed: jwfromm closed this pull request 3 months ago

jwfromm commented 3 months ago

Summary: Full rework of the CK FP8 rowwise kernels. We add many new optimized kernels covering a wide range of shapes and specific workloads. To accommodate so many new kernels, we refactor the existing implementation into many separate files so that they can be compiled in parallel.

We also introduce a direct shape-to-optimal-kernel matching table so that the workloads we care about most don't have to rely on heuristics. Other shapes still fall back to an improved heuristic dispatch.
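To illustrate the dispatch scheme described above, here is a minimal Python sketch of an exact-shape lookup with a heuristic fallback. It is not the code from this PR: the kernel names, the tuned shapes, and the thresholds in `heuristic_kernel` are all hypothetical placeholders, and the real implementation lives in the C++ CK kernel sources.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # (M, N, K) of the GEMM

# Hypothetical table: hand-tuned shapes mapped to made-up kernel names.
TUNED_KERNELS: Dict[Shape, str] = {
    (16, 8192, 1024): "fp8_rowwise_64x16x16",
    (4096, 4096, 4096): "fp8_rowwise_256x256x128",
}


def heuristic_kernel(m: int, n: int, k: int) -> str:
    """Fallback: pick a tile configuration from M alone (illustrative thresholds)."""
    if m <= 16:
        return "fp8_rowwise_64x16x16"
    if m <= 128:
        return "fp8_rowwise_128x64x64"
    return "fp8_rowwise_256x128x64"


def select_kernel(m: int, n: int, k: int) -> str:
    """Use the tuned table when the exact shape is known, else the heuristic."""
    tuned = TUNED_KERNELS.get((m, n, k))
    return tuned if tuned is not None else heuristic_kernel(m, n, k)


if __name__ == "__main__":
    print(select_kernel(4096, 4096, 4096))  # exact match in the tuned table
    print(select_kernel(100, 512, 512))     # not in the table, heuristic path
```

Keeping the table separate from the heuristic means a newly tuned shape can be added as a single entry without touching the fallback logic.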

Reviewed By: mxz297

Differential Revision: D59946161

netlify[bot] commented 3 months ago

Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
Latest commit	23cb9242b2543ee76b8ecb59c12d856abb480aae
Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66a2762ccc00fe00080217b8
Deploy Preview	https://deploy-preview-2893--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 3 months ago

This pull request was exported from Phabricator. Differential Revision: D59946161

facebook-github-bot commented 3 months ago

This pull request has been merged in pytorch/FBGEMM@cf6c9dc2252e4ce7b2cc431c4e3c5079323011b2.
