jananisriram closed this pull request 2 weeks ago
This pull request was exported from Phabricator. Differential Revision: D58313958
This pull request has been merged in pytorch/benchmark@f4cbf782eccc759c13a013beacc8f79534dbd642.
Summary: Extend support for reducing across individual dimensions on 2-dimensional matrices by allowing for varying block sizes on both the `M` (first) and `N` (second) dimensions.

The existing kernel performed a simplified reduction, assuming that the entire reduction dimension fit within one thread block. The new kernel implementation removes this assumption, allowing both the reduction and the non-reduction dimensions to span multiple thread blocks. This implementation also enables autotuning on block sizes for both the `M` and `N` dimensions.

For 1D results, add a `sum_then_buffer` configuration which decides which kernel implementation to run. `sum_then_buffer` sums individual blocks of input and adds these sums into a buffer. `buffer_then_sum` adds blocks of raw input into a buffer, then reduces the buffer.

Reviewed By: davidberard98
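The two accumulation orders described above can be sketched in plain Python (this is an illustrative sketch of the strategy, not the actual Triton kernel from the revision; the function name `blockwise_sum` and the list-of-rows representation are assumptions for the example):

```python
def blockwise_sum(x, block_m=2, sum_then_buffer=True):
    """Reduce a 2D matrix (list of rows) over its rows in blocks of block_m rows.

    sum_then_buffer=True : reduce each block to a partial sum, then add the
                           partial sum into the output buffer.
    sum_then_buffer=False: accumulate raw rows into a (block_m, n) buffer,
                           then reduce the buffer once at the end.
    Both return the same column sums; they differ in when the reduction happens.
    """
    m, n = len(x), len(x[0])
    if sum_then_buffer:
        out = [0.0] * n
        for i in range(0, m, block_m):
            block = x[i:i + block_m]
            partial = [sum(col) for col in zip(*block)]   # reduce this block
            out = [o + p for o, p in zip(out, partial)]   # add partial sum to buffer
        return out
    # buffer_then_sum: add raw rows into a fixed-size buffer, reduce at the end
    buf = [[0.0] * n for _ in range(block_m)]
    for i in range(0, m, block_m):
        for j, row in enumerate(x[i:i + block_m]):
            buf[j] = [b + v for b, v in zip(buf[j], row)]
    return [sum(col) for col in zip(*buf)]
```

In the real kernel the choice matters for register pressure and numerical accumulation order, which is why it is exposed as a tunable configuration rather than fixed.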
Differential Revision: D58313958