workingjubilee opened this issue 4 days ago.
AFAIK, PowerPC MMA doesn't change the ABI: https://github.com/rust-lang/rust/issues/131800#issuecomment-2418749961
Then it's good that this issue is about handling ABIs rather than merely describing them. Specifically, if we want to keep these types out of our ABIs, we need to adopt the same bans.
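If "the same bans" means rejecting these types as by-value function parameters and return values (an assumption here; the linked comment is the authority on what GCC/clang actually forbid), a compiler-side check could look roughly like the sketch below. `Ty`, `AbiError`, and `check_signature` are hypothetical names for illustration, not rustc internals.

```rust
/// Hypothetical, heavily simplified model of types appearing in a signature.
/// These are NOT rustc's real type representations.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    I32,
    F32,
    /// An architecture-specific matrix type, e.g. PowerPC's `__vector_quad`.
    Matrix(&'static str),
    /// A reference/pointer to another type.
    Ref(Box<Ty>),
}

#[derive(Debug, PartialEq)]
pub enum AbiError {
    MatrixParam { index: usize, name: &'static str },
    MatrixReturn { name: &'static str },
}

/// Rejects matrix types passed or returned by value.
/// Matrix types behind a reference are allowed (the "by-pointer" case).
pub fn check_signature(params: &[Ty], ret: &Ty) -> Result<(), AbiError> {
    for (index, ty) in params.iter().enumerate() {
        if let Ty::Matrix(name) = ty {
            return Err(AbiError::MatrixParam { index, name: *name });
        }
    }
    if let Ty::Matrix(name) = ret {
        return Err(AbiError::MatrixReturn { name: *name });
    }
    Ok(())
}
```

This check is deliberately shallow: a real one would also have to look inside structs, tuples, and arrays, since wrapping a banned type in an aggregate should not smuggle it past the ban.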
How does LLVM even represent these types in function signatures?
Sounds to me like this will require `repr(matrix)` and corresponding dedicated logic everywhere?
I'm not sure there's much in common that would justify `repr(matrix)`. Each ISA might just require boutique handling here. But I am still trying to understand how Power ISA's MMA, Arm's Scalable Matrix Extensions, and x86's AMX tiles work, and how we will want to represent them.
My current understanding is as follows.

PowerPC MMA:
- `__vector_pair` and `__vector_quad` are the relevant types.
- `__vector_quad` represents the accumulator register.
- The `__vector_quad` type should never be passed anywhere?
- The `__vector_quad` type is always handled by-pointer.
- The `__vector_pair` type seems to be defined as opaque(?), yet is sometimes passed by-value to intrinsics.

Arm SME:
The matrix state is almost more like a dedicated thread-local allocation... the "ZArray"... that gets reinterpreted or examined along various dimensions. Then you set the CPU into Matrix Math... sorry, "Arm Streaming SVE" state... and Big Array Math happens, accumulating into the ZArray. The Big Array Math, however, is expressible as vector operations that just might use a different vector length than the normal Arm SVE operations, which is why it's "Streaming SVE": the model is "matrix math is mostly a pile of vector operations, done really fast". Being in this state does remove the ability to use some of the more complicated Arm SVE2 operations.
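Both the PowerPC accumulator (`__vector_quad`) and the ZArray are updated by outer-product-and-accumulate instructions (for example PowerPC's `xvf32gerpp` and SME's `FMOPA`). The plain-Rust model below shows only that arithmetic and the "accumulator by reference, operands by value" calling pattern; `Acc4x4` and `outer_product_accumulate` are hypothetical names, and no real intrinsics are used.

```rust
/// Hypothetical software model of a 4x4 f32 accumulator: 64 bytes (512 bits),
/// the same size as PowerPC's `__vector_quad`. Deliberately neither `Clone`
/// nor `Copy`, so it can only be used by reference ("always by-pointer").
#[repr(C)]
pub struct Acc4x4 {
    pub rows: [[f32; 4]; 4],
}

impl Acc4x4 {
    pub fn zeroed() -> Self {
        Acc4x4 { rows: [[0.0; 4]; 4] }
    }
}

/// Rank-1 update, `acc[i][j] += a[i] * b[j]`, roughly what an instruction like
/// `xvf32gerpp` computes. The accumulator is taken by `&mut`; the small vector
/// operands are taken by value.
pub fn outer_product_accumulate(acc: &mut Acc4x4, a: [f32; 4], b: [f32; 4]) {
    for (row, &ai) in acc.rows.iter_mut().zip(a.iter()) {
        for (cell, &bj) in row.iter_mut().zip(b.iter()) {
            *cell += ai * bj;
        }
    }
}
```

Leaving out `Clone`/`Copy` is the Rust-level analogue of "never pass it by value": the type system simply offers no way to do it.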
x86 AMX:
The tiles seem to be more "classic" registers, but use an interesting API. They are also "shape-changing", in a way. I assume @sayantn knows more about this.

The `__tile1024i` type seems to be both passed by-value and handled by-pointer; a typical signature looks like this:

```rust
fn some_tile_intrinsic(dst: &mut __tile1024i, src_a: __tile1024i, src_b: __tile1024i)
```
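To make the mixed by-value/by-pointer pattern in that signature concrete, here is a sketch with a hypothetical stand-in type, `Tile1024`, loosely modeled on clang's `__tile1024i` (shape metadata carried alongside 1 KiB of data, at most 16 rows of 64 bytes). The function body is a toy elementwise add, not a real AMX operation.

```rust
/// Hypothetical stand-in loosely modeled on clang's `__tile1024i`: the tile's
/// shape travels with 1 KiB of data (at most 16 rows x 64 bytes, row-major).
#[derive(Clone, Copy)]
#[repr(C, align(64))]
pub struct Tile1024 {
    pub rows: u16,
    /// Bytes used per row.
    pub colsb: u16,
    pub data: [u8; 1024],
}

impl Tile1024 {
    pub fn new(rows: u16, colsb: u16) -> Self {
        assert!(rows <= 16 && colsb <= 64, "tile is at most 16 rows x 64 bytes");
        Tile1024 { rows, colsb, data: [0; 1024] }
    }

    pub fn get(&self, r: usize, c: usize) -> u8 {
        self.data[r * 64 + c]
    }

    pub fn set(&mut self, r: usize, c: usize, v: u8) {
        self.data[r * 64 + c] = v;
    }
}

/// Same calling shape as the signature above: destination by `&mut`, sources
/// by value. Toy body (elementwise wrapping add), NOT a real AMX instruction.
pub fn some_tile_intrinsic(dst: &mut Tile1024, src_a: Tile1024, src_b: Tile1024) {
    let shape = (dst.rows, dst.colsb);
    assert!(
        shape == (src_a.rows, src_a.colsb) && shape == (src_b.rows, src_b.colsb),
        "tile shape mismatch"
    );
    for r in 0..shape.0 as usize {
        for c in 0..shape.1 as usize {
            dst.set(r, c, src_a.get(r, c).wrapping_add(src_b.get(r, c)));
        }
    }
}
```

In plain Rust a by-value `Tile1024` is just a 1 KiB memory blob that gets copied (typically passed indirectly), whereas the real type lives in tile registers; that gap is exactly what the ABI handling has to account for.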
Some CPU architectures have developed "matrix extensions". These are sometimes equivalent to "vectors, but bigger" in terms of how the ABI should be handled (reusing the same architectural state, thus having similar concerns). But not always! They may use entirely different architectural state, usually entirely "caller-save" (i.e. always "volatile" or "call-clobbered").
AArch64
- Scalable Matrix Extensions

PowerPC
- MMA

x86
- AMX
  - `amx_tile` type, AKA `x86_amx` or `__tile1024i`
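The "caller-save" (call-clobbered) property mentioned above can be modeled in plain Rust as follows. `MatrixState`, `callee`, and `caller` are hypothetical names, and the "registers" are just a struct field: a callee may overwrite the state without restoring it, so a caller holding a live value must spill it before the call and reload it afterwards.

```rust
/// Hypothetical model of call-clobbered architectural state,
/// e.g. a matrix accumulator that no callee is obliged to preserve.
pub struct MatrixState {
    pub acc: [i32; 4],
}

/// A callee may freely overwrite caller-save state and does not restore it.
pub fn callee(state: &mut MatrixState) {
    state.acc = [-1; 4];
}

/// A caller with a live value in caller-save state must save it before the
/// call and restore it afterwards; the callee gives no guarantees.
pub fn caller(state: &mut MatrixState) -> [i32; 4] {
    state.acc = [1, 2, 3, 4];
    let spilled = state.acc; // save the live value to memory (a local copy)
    callee(state); // state.acc is clobbered here
    state.acc = spilled; // reload after the call
    state.acc
}
```

With matrix state this large (a full set of eight AMX tiles is 8 KiB), such spills are expensive, which is one reason the ABI treatment of these extensions matters.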