Open yan-zaretskiy opened 3 months ago
Hey @yan-zaretskiy, I'm going to list out some asks from my end. I can put them in a central location if there is one.
Basically, my asks cover everything outside the tensor itself, with the tensor used as a common base layer to build that other functionality on.
From the CCL perspective, the main need is support for various indexing operations on sharded and interleaved tensors. This is to support some of the resharding/TM operations that happen in flight during some CCL ops. The data producer and consumers are on different chips and depend on each other, which is why we take a dynamic indexing/address generation approach.
Today, CCL uses some ad hoc, custom indexers/address generators, but ideally these would move out of CCL into a common tensor slicing/indexing library based on a (set of?) standard tensor layouts.
These operations are along the lines of: given an input coordinate 4-tuple (or a 3- or 2-tuple, depending on the shape), a global tensor shape, and a memory config, return the location of that element.
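To make that ask a bit more concrete, here is a minimal sketch of the kind of lookup I mean, for the interleaved case only. Everything here is hypothetical and for illustration: the names (`InterleavedConfig`, `PageLocation`, `locate`) are not an existing API, coordinates are assumed to be in units of pages, and pages are assumed to be distributed round-robin across banks.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration only; not an existing API.
struct InterleavedConfig {
    std::uint32_t num_banks;  // banks that pages are round-robined across (assumption)
    std::uint32_t page_size;  // bytes per page
};

struct PageLocation {
    std::uint32_t bank;         // bank that holds the page
    std::uint32_t byte_offset;  // offset of the page within that bank
};

// Coordinate and shape are both expressed in pages, last dimension fastest.
using Coord4 = std::array<std::uint32_t, 4>;

// Row-major flattening of a 4D page coordinate into a global page id.
inline std::uint32_t flatten(const Coord4& coord, const Coord4& shape) {
    std::uint32_t page_id = 0;
    for (std::size_t d = 0; d < 4; ++d) {
        assert(coord[d] < shape[d]);  // coordinate must be inside the tensor
        page_id = page_id * shape[d] + coord[d];
    }
    return page_id;
}

// Coordinate + global shape + memory config -> location of the page.
inline PageLocation locate(const Coord4& coord, const Coord4& shape,
                           const InterleavedConfig& cfg) {
    assert(cfg.num_banks > 0);
    const std::uint32_t page_id = flatten(coord, shape);
    return PageLocation{page_id % cfg.num_banks,
                        (page_id / cfg.num_banks) * cfg.page_size};
}
```

For example, with shape `{1, 2, 3, 4}`, coordinate `{0, 1, 2, 3}`, 8 banks and 2048-byte pages, the page id is 23, which under these assumptions lands in bank 7 at byte offset 4096. A 3- or 2-tuple can be handled by padding the leading dimensions with 1 in the shape and 0 in the coordinate.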
Additionally, for sharded tensors we take the same information and query for the longest contiguous sequence of pages starting at a given location, so we can fire off contiguous reads and feed them into a packetizer that sends them over ethernet.
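A rough sketch of the contiguous-run query, under simplifying assumptions that I'm making up for illustration: a 2D tensor measured in pages, rectangular shards, and each shard stored row-major and unpadded in its own core's memory. `ShardSpec2D` and `contiguous_pages_from` are hypothetical names, not existing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of a 2D sharded tensor; all sizes are in pages.
struct ShardSpec2D {
    std::uint32_t tensor_h;
    std::uint32_t tensor_w;
    std::uint32_t shard_h;
    std::uint32_t shard_w;
};

// Number of pages, starting at (y, x) and walking the tensor in row-major
// order, that are also adjacent in the owning shard's memory.
inline std::uint32_t contiguous_pages_from(std::uint32_t y, std::uint32_t x,
                                           const ShardSpec2D& s) {
    assert(s.shard_h > 0 && s.shard_w > 0);
    assert(y < s.tensor_h && x < s.tensor_w);

    if (s.shard_w == s.tensor_w) {
        // Shard spans full tensor rows (height sharding): consecutive rows
        // are adjacent in memory, so the run extends to the end of the shard.
        const std::uint32_t shard_row_end =
            std::min(s.tensor_h, (y / s.shard_h + 1) * s.shard_h);
        return (shard_row_end - y) * s.tensor_w - x;
    }

    // Otherwise the next page in row-major order past the shard's right edge
    // lives in a different shard, so the run stops at that edge.
    const std::uint32_t shard_col_end =
        std::min(s.tensor_w, (x / s.shard_w + 1) * s.shard_w);
    return shard_col_end - x;
}
```

With an 8x8 tensor in 4x4 shards, starting at (1, 1) gives a run of 3 pages; with an 8x4 tensor in 4x4 shards (height sharded), the same start gives 11. The caller would issue one read of that length and then query again from the next page.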
That is probably a bit vague as described; I can get more specific if needed.
TL;DR: I want to build a dynamic indexing/address generator library on top of the tensor that takes global shape and offset information.
Also, today, CCL has moved these indexers to a common location that doesn't include any host or device specific headers so that we can test all our indexer logic in gtests. This makes debug and bringup vastly simpler. The new tensor layout should be written such that we can write gtests of these indexers as well.
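As a sketch of what "no host or device specific headers" buys us: an indexer that depends only on `<cstdint>` can be checked at compile time and from any host test runner. The names below (`Shape2D`, `page_index`, `run_host_checks`) are hypothetical, and `run_host_checks` is just a plain function standing in for the body of a gtest `TEST`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical indexer with no host- or device-specific dependencies.
struct Shape2D {
    std::uint32_t rows;
    std::uint32_t cols;
};

// Row-major page index of (row, col) inside a 2D tensor of pages.
constexpr std::uint32_t page_index(std::uint32_t row, std::uint32_t col,
                                   Shape2D shape) {
    return row * shape.cols + col;
}

// Compile-time checks: these hold for any compiler that builds the header.
static_assert(page_index(0, 0, Shape2D{4, 8}) == 0, "origin maps to page 0");
static_assert(page_index(2, 3, Shape2D{4, 8}) == 19, "row-major flattening");

// Stand-in for a gtest TEST body; in gtest each assert would be an
// EXPECT_EQ(page_index(...), expected) instead.
inline void run_host_checks() {
    assert(page_index(3, 7, Shape2D{4, 8}) == 31);
    assert(page_index(1, 0, Shape2D{4, 8}) == 8);
}
```

The same header could then be included from kernel code unchanged, which is the property I'd like the new tensor layout to preserve.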