pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

CK FP8 Gemm Optimization #2893

Closed jwfromm closed 3 months ago

jwfromm commented 3 months ago

Summary: Full rework of the CK FP8 rowwise kernels. We add many new optimized kernels covering a wide range of shapes and specific workloads. To accommodate so many new kernels, we refactor the existing implementation into many separate files so that they compile in parallel.

We also introduce a direct shape-to-optimal-kernel matching table so that the workloads we care about most don't have to rely on heuristics. Other shapes still fall back to an improved heuristic dispatch.
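
The two-level dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the kernel names, the table entries, the heuristic thresholds, and the helper names `tuned_kernels`, `heuristic_dispatch`, and `select_kernel` are all hypothetical, and kernels are represented by strings rather than real CK kernel instances. It only shows the idea of checking an exact (M, N, K) table first and falling back to a heuristic on a miss.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for a kernel handle; a real dispatcher would
// return a function pointer to a compiled kernel instead of a name.
using KernelName = std::string;
using Shape = std::tuple<int64_t, int64_t, int64_t>; // (M, N, K)

// Illustrative table of shapes that were tuned offline.
// The entries are made up for this sketch.
const std::map<Shape, KernelName>& tuned_kernels() {
  static const std::map<Shape, KernelName> table = {
      {Shape{16, 1280, 8192}, "tuned_kernel_a"},
      {Shape{1024, 8192, 1024}, "tuned_kernel_b"},
  };
  return table;
}

// Hypothetical fallback heuristic for shapes not in the table.
// The thresholds are arbitrary and chosen only for illustration.
KernelName heuristic_dispatch(int64_t M, int64_t N, int64_t K) {
  if (M <= 64) {
    return "heuristic_small_m";
  }
  if (N >= 4096 || K >= 4096) {
    return "heuristic_large_tile";
  }
  return "heuristic_default";
}

// Exact-shape lookup first; heuristic only on a table miss.
KernelName select_kernel(int64_t M, int64_t N, int64_t K) {
  assert(M >= 0 && N >= 0 && K >= 0);
  const auto& table = tuned_kernels();
  const auto it = table.find(Shape{M, N, K});
  if (it != table.end()) {
    return it->second;
  }
  return heuristic_dispatch(M, N, K);
}
```

With this layout, adding a newly tuned shape is a one-line table entry, and untuned shapes keep working through the heuristic path.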

Reviewed By: mxz297

Differential Revision: D59946161

netlify[bot] commented 3 months ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: 23cb9242b2543ee76b8ecb59c12d856abb480aae
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66a2762ccc00fe00080217b8
Deploy Preview: https://deploy-preview-2893--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 3 months ago

This pull request was exported from Phabricator. Differential Revision: D59946161

facebook-github-bot commented 3 months ago

This pull request has been merged in pytorch/FBGEMM@cf6c9dc2252e4ce7b2cc431c4e3c5079323011b2.