Metal uses:
using CoreCoord = tt_xy_pair;
and exposes CoreCoord through the api as the way to specify which kernels run on which cores. This makes sense as we don't have to do any conversions as we walk down the stack.
Internally, we map kernels to cores, cores to kernels, etc extensively creating long lists of CoreCoords. On grayskull, we may have lists of 120 CoreCoords for a kernel.
I just bumped into the fact that these are std::size, typically 8 bytes for X,Y. However, our HW is 12x10 max. It seems these should be uint8_t or (maybe?) uint16_t. This would be way more cache efficient (8x reduction in size).
Comment that seemed useful
Is this to change tt_xy_pair to use different underlying types or to change CoreCoord to no longer type alilas to tt_xy_pair? If the former, since BBE also uses tt_xy_pair, and possibly for representing coordinates in other datastructures (e.g. tile x/y, not just core xy), it may be worthwhile to template the tt_xy_pair and keep the default as std::size_t. Perhaps over time, we start being more explicit about if we are using tt_xy_pair for a core location
Copy of the issue from old
open-umd
repo - https://yyz-gitlab.local.tenstorrent.com/tenstorrent/open-umd/-/issues/15Metal uses:
using CoreCoord = tt_xy_pair;
and exposes CoreCoord through the api as the way to specify which kernels run on which cores. This makes sense as we don't have to do any conversions as we walk down the stack. Internally, we map kernels to cores, cores to kernels, etc extensively creating long lists of CoreCoords. On grayskull, we may have lists of 120 CoreCoords for a kernel. I just bumped into the fact that these are std::size, typically 8 bytes for X,Y. However, our HW is 12x10 max. It seems these should be uint8_t or (maybe?) uint16_t. This would be way more cache efficient (8x reduction in size).Comment that seemed useful
Is this to change
tt_xy_pair
to use different underlying types or to changeCoreCoord
to no longer type alilas to tt_xy_pair? If the former, since BBE also uses tt_xy_pair, and possibly for representing coordinates in other datastructures (e.g. tile x/y, not just core xy), it may be worthwhile to template thett_xy_pair
and keep the default asstd::size_t
. Perhaps over time, we start being more explicit about if we are using tt_xy_pair for a core location