pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Enable padding in MX4 quantization kernels #2917

Open jwfromm opened 1 month ago

jwfromm commented 1 month ago

Summary: This diff adds convenient padding for inputs that are not divisible by group size when using the triton MX4 quantization kernels. We do this in a copy-free way by reading in 0s when out of bounds on the input and simply allocating more output space. Thus, each padded value is treated as a 0, which MX4 can always faithfully represent without affecting the group scale of the other values.
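For illustration, a minimal Triton-style sketch of the masked-load idea described above (this is not the actual FBGEMM kernel; `pad_copy_kernel`, `pad_to_group_size`, and the block size are hypothetical names and values chosen for the example):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def pad_copy_kernel(in_ptr, out_ptr, n_in, n_out, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    # Out-of-bounds reads return 0.0, so the padded tail is filled with zeros
    # without ever copying or reallocating the input.
    vals = tl.load(in_ptr + offs, mask=offs < n_in, other=0.0)
    tl.store(out_ptr + offs, vals, mask=offs < n_out)

def pad_to_group_size(x: torch.Tensor, group_size: int = 32) -> torch.Tensor:
    n_in = x.numel()
    # Allocate output rounded up to a whole number of groups.
    n_out = triton.cdiv(n_in, group_size) * group_size
    out = torch.empty(n_out, device=x.device, dtype=x.dtype)
    BLOCK = 1024
    grid = (triton.cdiv(n_out, BLOCK),)
    pad_copy_kernel[grid](x, out, n_in, n_out, BLOCK=BLOCK)
    return out
```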

This should have no impact on performance, but it does make the output shape a multiple of group_size, so users must handle that padding carefully.
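A sketch of the user-side bookkeeping this implies, assuming the caller tracks the original element count and slices it back out after dequantization (the helpers `padded_numel` and `strip_padding` are hypothetical, not part of the FBGEMM API):

```python
import math
import torch

def padded_numel(n: int, group_size: int = 32) -> int:
    # The padded output covers ceil(n / group_size) whole groups.
    return math.ceil(n / group_size) * group_size

def strip_padding(dequantized: torch.Tensor, original_shape: torch.Size) -> torch.Tensor:
    # The trailing padded values are exact zeros; drop them to recover
    # the caller's original shape.
    n = math.prod(original_shape)
    return dequantized.flatten()[:n].reshape(original_shape)

# Example: a 100-element input with group_size=32 dequantizes to 128
# elements; slice back to 100 before using the result.
x = torch.randn(10, 10)
assert padded_numel(x.numel(), 32) == 128
y_padded = torch.cat([x.flatten(), torch.zeros(28)])  # stand-in for a dequantized, padded output
y = strip_padding(y_padded, x.shape)
assert torch.equal(y, x)
```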

Differential Revision: D60484989

netlify[bot] commented 1 month ago

Deploy Preview for pytorch-fbgemm-docs ready!

Latest commit: cdc973d8634e601c6fafda54a8870e2882e3e949
Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66a97c9f296a280008c04db9
Deploy Preview: https://deploy-preview-2917--pytorch-fbgemm-docs.netlify.app

facebook-github-bot commented 1 month ago

This pull request was exported from Phabricator. Differential Revision: D60484989