Open jwfromm opened 3 weeks ago
Name | Link |
---|---|
Latest commit | 833557ff34c5ae7900d3c8ca8c1163c57cc2b92d |
Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6674c4b64fdaf50008e4f808 |
Deploy Preview | https://deploy-preview-2762--pytorch-fbgemm-docs.netlify.app |
This pull request was exported from Phabricator. Differential Revision: D58848687
Summary: This diff adds further tuning for the CUTLASS rowwise kernel on top of the recent output layout change. The configurations now conform much more closely to the recommendations of the CUTLASS tuner. To maintain performance across all shapes, I had to add one more kernel mode, which selects Cooperative kernels for medium shapes and PingPong kernels for large shapes.
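For readers unfamiliar with shape-based dispatch, the mode selection described above can be sketched roughly as follows. This is only an illustration, not the code in this diff: the names `KernelMode`, `select_kernel_mode`, and `select_schedule`, the existence of a separate small mode, and both size thresholds are assumptions made up for the example.

```python
from enum import Enum
from typing import Optional


class KernelMode(Enum):
    # Hypothetical mode names; the diff may use different ones.
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Hypothetical cut-offs chosen only for illustration.
# The tuned values live in the diff itself and are not reproduced here.
SMALL_MAX_DIM = 128
MEDIUM_MAX_DIM = 4096


def select_kernel_mode(m: int, n: int, k: int) -> KernelMode:
    """Classify a GEMM problem of shape (M, N, K) into a kernel mode."""
    if m <= SMALL_MAX_DIM or n <= SMALL_MAX_DIM:
        return KernelMode.SMALL
    if max(m, n, k) <= MEDIUM_MAX_DIM:
        return KernelMode.MEDIUM
    return KernelMode.LARGE


def select_schedule(m: int, n: int, k: int) -> Optional[str]:
    """Return the schedule family for a shape.

    Medium shapes map to Cooperative and large shapes to PingPong, as the
    summary states. The summary does not say what small shapes use, so this
    sketch returns None for them.
    """
    schedules = {
        KernelMode.MEDIUM: "Cooperative",
        KernelMode.LARGE: "PingPong",
    }
    return schedules.get(select_kernel_mode(m, n, k))


if __name__ == "__main__":
    for shape in [(64, 4096, 4096), (1024, 1024, 1024), (8192, 8192, 8192)]:
        print(shape, select_kernel_mode(*shape).value, select_schedule(*shape))
```

Keeping the shape classification separate from the schedule lookup means that retuning the thresholds does not touch the mapping from mode to schedule.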
Benchmarking results can be found here in the results tab. The names of the tuning configurations I tried are somewhat vague, but the final column corresponds to these changes.
Differential Revision: D58848687