pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Enable padding in MX4 quantization kernels #2917

Open jwfromm opened 1 month ago

jwfromm commented 1 month ago

Summary: This diff adds convenient padding for inputs that are not divisible by group size when using the triton MX4 quantization kernels. We do this in a copy-free way by reading in 0s when out of bounds on the input and simply allocating more output space. Thus, each padded value is treated as a 0, which MX4 can always faithfully represent without affecting the group scale of the other values.
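For illustration, a minimal Triton-style sketch of the masked-load idea described above (this is not the actual FBGEMM kernel; `pad_copy_kernel`, `pad_to_group_size`, and the block size are hypothetical names and values chosen for the example):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def pad_copy_kernel(in_ptr, out_ptr, n_in, n_out, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    # Out-of-bounds reads return 0.0, so the padded tail is filled with zeros
    # without ever copying or reallocating the input.
    vals = tl.load(in_ptr + offs, mask=offs < n_in, other=0.0)
    tl.store(out_ptr + offs, vals, mask=offs < n_out)

def pad_to_group_size(x: torch.Tensor, group_size: int = 32) -> torch.Tensor:
    n_in = x.numel()
    # Allocate output rounded up to a whole number of groups.
    n_out = triton.cdiv(n_in, group_size) * group_size
    out = torch.empty(n_out, device=x.device, dtype=x.dtype)
    BLOCK = 1024
    grid = (triton.cdiv(n_out, BLOCK),)
    pad_copy_kernel[grid](x, out, n_in, n_out, BLOCK=BLOCK)
    return out
```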

This should have no impact on performance, but it does make the output shape a multiple of group_size, so users must handle that padding carefully.
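A sketch of the user-side bookkeeping this implies, assuming the caller tracks the original element count and slices it back out after dequantization (the helpers `padded_numel` and `strip_padding` are hypothetical, not part of the FBGEMM API):

```python
import math
import torch

def padded_numel(n: int, group_size: int = 32) -> int:
    # The padded output covers ceil(n / group_size) whole groups.
    return math.ceil(n / group_size) * group_size

def strip_padding(dequantized: torch.Tensor, original_shape: torch.Size) -> torch.Tensor:
    # The trailing padded values are exact zeros; drop them to recover
    # the caller's original shape.
    n = math.prod(original_shape)
    return dequantized.flatten()[:n].reshape(original_shape)

# Example: a 100-element input with group_size=32 dequantizes to 128
# elements; slice back to 100 before using the result.
x = torch.randn(10, 10)
assert padded_numel(x.numel(), 32) == 128
y_padded = torch.cat([x.flatten(), torch.zeros(28)])  # stand-in for a dequantized, padded output
y = strip_padding(y_padded, x.shape)
assert torch.equal(y, x)
```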

Differential Revision: D60484989

netlify[bot] commented 1 month ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: cdc973d8634e601c6fafda54a8870e2882e3e949
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66a97c9f296a280008c04db9
Deploy Preview: https://deploy-preview-2917--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 1 month ago

This pull request was exported from Phabricator. Differential Revision: D60484989