Closed lezcano closed 4 hours ago
@Jokeren This one's ready for review. The formulas are now particularly clean, and there's no special-casing for Hopper or Ampere.
Having clean formulas shows that the edge case `M=8` for `opIdx=0` was wrong, but well, all this will be fixed by LLs.
I am hitting some issues with mma and dot Hopper layouts for a different PR. I'm going to merge this PR and I'll add a note to use this logic in all DistributedLayouts at a later stage.
We simplify the implementation of `getElemsPerThread` and strengthen the preconditions of `getRepForOperand`. More generally, we should try to minimise the calls to `isAmpere` and `isHopper` throughout the codebase. I'll do a pass fixing many of these once we land LLs for `ldmatrix` and Hopper.