Closed jwfromm closed 3 weeks ago
Name | Link |
---|---|
Latest commit | d5db96230b0db325d72e7a6f89ef22c1055bc159 |
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66c3d09495815f0008d169be |
Deploy Preview | https://deploy-preview-3008--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D61408771
This pull request has been merged in pytorch/FBGEMM@162cc69b133797b213664e41c9923d96593d1fc3.
Summary: This diff does quite a bit of facelifting to our Marlin BF16 x I4 kernels. These improvements include:

- the `torch.ops.marlin.marlin_gemm` op
- convenient helpers for quantizing to the marlin format, such as `marlin_quantize`

One downside of this work is that we have diverged a bit from VLLM, so it may be harder to stay in sync going forward. However, I think the benefits of the improvements in this diff outweigh the potential sync costs.
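For context on what a helper like `marlin_quantize` conceptually does, here is a minimal sketch of symmetric per-group int4 quantization, the general scheme that BF16 x I4 kernels consume. This is purely illustrative: the function names, group size, and layout here are assumptions, not the actual marlin format or FBGEMM API.

```python
# Illustrative sketch only -- NOT the real marlin_quantize.
# Symmetric per-group int4 quantization: each group of values shares
# one scale, and each value is rounded to a signed int4 in [-8, 7].

def quantize_int4(values, group_size=8):
    """Quantize a list of floats to int4 codes plus one scale per group."""
    qvals, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # Scale maps the largest magnitude in the group to 7 (max int4).
        scale = max(abs(v) for v in group) / 7.0 or 1.0
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(v / scale))) for v in group)
    return qvals, scales

def dequantize_int4(qvals, scales, group_size=8):
    """Recover approximate floats by rescaling each int4 code."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]
```

A BF16 x I4 GEMM kernel fuses the dequantization step into the matrix multiply, reading the packed int4 weights and per-group scales directly; the round-trip error of the scheme above is bounded by half a quantization step per value.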
Differential Revision: D61408771