The goal of this issue is to create the necessary infrastructure and modify the implementations so that tiny tiles (i.e. tiles smaller than 32x32) are supported by the AllGather, Reduce-Scatter, and Barrier OPs. The tasks are listed below:
[x] Modify the testing infrastructure to support tiny tiles and add tiny tile tests to all three OPs
[x] Perform the necessary changes to the tensor manipulation libraries required by CCL to support tiny tiles
[x] Modify the host and kernel code to take tiling information from the Tensor itself
[x] Fix the AllGather code and the libraries it calls to make tiny tiles work
[ ] Fix the Reduce-Scatter code and the libraries it calls to make tiny tiles work
[ ] Validate that the results are correct regardless of the tiling pattern used
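
As a rough illustration of what "taking tiling information from the Tensor itself" means in practice, the sketch below derives the tile count from a tile shape carried alongside the tensor dimensions instead of assuming 32x32. `TileShape` and `tiles_per_tensor` are hypothetical names used only for this example; they are not part of the CCL code or any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileShape:
    """Hypothetical tile descriptor; defaults to the standard 32x32 tile."""
    height: int = 32
    width: int = 32


def tiles_per_tensor(rows: int, cols: int, tile: TileShape) -> int:
    """Count the tiles needed to cover a rows x cols tensor.

    The tile shape is passed in rather than hardcoded, so a tiny tile
    such as 16x32 is handled the same way as a full 32x32 tile.
    """
    if rows % tile.height != 0 or cols % tile.width != 0:
        raise ValueError(
            f"tensor {rows}x{cols} is not divisible by tile "
            f"{tile.height}x{tile.width}"
        )
    return (rows // tile.height) * (cols // tile.width)


# The same 64x64 tensor needs a different number of tiles per tiling.
full_tiles = tiles_per_tensor(64, 64, TileShape())        # 32x32 tiles
tiny_tiles = tiles_per_tensor(64, 64, TileShape(16, 32))  # 16x32 tiles
```

Code that sizes buffers or loops over tiles from a value like this, rather than from a fixed 32x32 constant, should behave the same for any tiling pattern, which is what the final validation task checks.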