jwfromm closed this pull request 3 months ago.
Name | Link |
---|---|
Latest commit | 23cb9242b2543ee76b8ecb59c12d856abb480aae |
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66a2762ccc00fe00080217b8 |
Deploy Preview | https://deploy-preview-2893--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D59946161
This pull request has been merged in pytorch/FBGEMM@cf6c9dc2252e4ce7b2cc431c4e3c5079323011b2.
Summary: Full rework of the CK FP8 rowwise kernels. We add a large number of new kernels, each optimized for particular shapes and specific workloads. To accommodate so many new kernels, we refactor the existing implementation into many separate files so they can compile in parallel.
We also introduce a direct lookup table that maps specific shapes to their optimal kernels, so the workloads we care about most don't have to rely on heuristics. All other shapes still fall back to an improved heuristic dispatch.
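The lookup-then-fallback dispatch can be pictured with a small sketch. It assumes the key is the exact GEMM shape (M, N, K) and models kernels as strings for illustration. In the real code, kernels are CK template instantiations. Every kernel name, table entry, and threshold here, along with the functions `select_kernel` and `heuristic_kernel`, is a hypothetical placeholder and not the actual FBGEMM table or API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// GEMM problem shape used as the lookup key (hypothetical key layout).
struct Shape {
  int64_t m, n, k;
  bool operator<(const Shape& o) const {
    // Lexicographic ordering so Shape can key a std::map.
    return std::tie(m, n, k) < std::tie(o.m, o.n, o.k);
  }
};

// Stand-in for a kernel handle; real CK kernels are template instances.
using KernelName = std::string;

// Hypothetical tuned table: exact shapes mapped to a hand-picked kernel.
const std::map<Shape, KernelName>& tuned_kernels() {
  static const std::map<Shape, KernelName> table = {
      {{1, 8192, 8192}, "kernel_small_m_8192x8192"},
      {{128, 8192, 8192}, "kernel_m128_8192x8192"},
      {{4096, 8192, 8192}, "kernel_large_8192x8192"},
  };
  return table;
}

// Hypothetical fallback heuristic: bucket by M only, ignoring N and K.
KernelName heuristic_kernel(int64_t m, int64_t /*n*/, int64_t /*k*/) {
  if (m <= 16) return "fallback_tile_16";
  if (m <= 128) return "fallback_tile_128";
  return "fallback_tile_256";
}

// Exact-shape match first; otherwise defer to the heuristic.
KernelName select_kernel(int64_t m, int64_t n, int64_t k) {
  const auto& table = tuned_kernels();
  auto it = table.find(Shape{m, n, k});
  if (it != table.end()) return it->second;
  return heuristic_kernel(m, n, k);
}
```

For example, `select_kernel(1, 8192, 8192)` hits the table directly, while `select_kernel(64, 8192, 8192)` misses it and lands in the M ≤ 128 heuristic bucket. Keeping the table keyed on exact shapes means the tuned workloads never depend on heuristic thresholds.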
Reviewed By: mxz297
Differential Revision: D59946161