Open per opened 1 year ago
@bjacob, @banach-space
Great to see this gaining traction! 🚀
> have the tile sizes be decided at runtime
Wouldn't this be enabled through parametric tiling? That's already supported by the tiling infra in Linalg and, AFAIK, "just works" ™️ :) (*) It hasn't been wired-up in IREE yet, but I hope to have this working in the coming week. That might solve this particular problem for you.
Having said that, I'm not familiar with the u-kernel lowering path, so might be missing something obvious. Perhaps you'd be enabling this elsewhere?
-Andrzej
(*) I experimented with that when preparing the scalable vec RFC, see scalable tiling.
@benvanik @MaheshRavishankar @hanhanW
Sounds like the kind of topic that needs a video call with all of us :-)
The complicated part is the dynamic tile size selection. I wonder if this project could be split into two stages: first work with a compile-time-specified SVE vector length and implement the SVE ukernels in that context; then tackle the dynamic vector length aspect.
There are a few things to untangle here. Would be great to sync up... I am going to be taking some personal time, but I can connect on Monday August 28th.
> The complicated part is the dynamic tile size selection.
Just to clarify, SVE supports scalable vectors, but does not support "dynamic vectors" or "vector register grouping" - that's something that's available in the other CPU architecture that supports scalable vectors ;-)
Now, while the effective vector length is not known at compile time, it is known (and fixed) at run-time. So there is no "dynamism" here. This is crucial, because ultimately we only have to replace expressions like "4 elements" with "4 * vscale elements".
Yes, we don't know the value of `vscale` at compile time. However, we can still refer to it as any other SSA value (that's where "parametric" becomes important):
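A minimal sketch of the idea, in Python rather than MLIR. The names `tiled_ranges`, `tiled_sum` and `base` are made up for illustration and are not part of IREE or Linalg. The point is only that the tile size `4 * vscale` is an ordinary value in the loop bounds, unknown when the code is written but fixed for the whole run:

```python
def tiled_ranges(n, vscale, base=4):
    """Split range(n) into tiles of `base * vscale` elements.

    `vscale` is not known when this function is written, but it is fixed
    for the whole run, so it is just another parameter of the loop bounds.
    """
    tile = base * vscale
    # The last tile is clamped to n, which handles the tail.
    return [(start, min(start + tile, n)) for start in range(0, n, tile)]


def tiled_sum(values, vscale):
    """Sum `values` tile by tile; the result must not depend on vscale."""
    total = 0
    for start, stop in tiled_ranges(len(values), vscale):
        total += sum(values[start:stop])
    return total
```

With `vscale = 1` the tiles hold 4 elements, with `vscale = 2` they hold 8, and nothing else in the loop structure changes.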
> I wonder if this project could be split into two stages: first work with compile-time-specified SVE vector length, implement the SVE ukernels in that context; then tackle the dynamic vector length aspect.
That's an option, but the first step would be no different to simply replacing NEON kernels with SVE, right? IMHO, we should be aiming for "scalability" in the first iteration of this. Ultimately, that's the key feature of SVE that we are trying to enable through scalable vectorisation (*). Also, once we consider SME as well (and, directly related to this, Streaming SVE), we will start mixing different vector sizes in one compilation. That's because the runtime value of `vscale` will very likely differ between non-streaming and streaming SVE (this will depend on the actual implementation of SME).
> I am going to be taking some personal time, but I can connect on Monday August 28th.
Also available on Monday, Tue (preferred) and Weds next week.
-Andrzej
(*) We can, of course, treat u-kernels and scalable vectorisation separately.
@bjacob Yes, agree. It can be split up into two parts, where the second part is the trickier one wrt getting the tiling handling in place.
I'm also available Monday or Tuesday (after 6:30 CET) for a call.
> Just to clarify,
Thanks a lot @banach-space for the explanation.
> That's an option, but the first step would be no different to simply replacing NEON kernels with SVE, right? IMHO, we should be aiming for "scalability" in the first iteration of this. Ultimately, that's the key feature of SVE that we are trying to enable through scalable vectorisation (*).
> (*) We can, of course, treat u-kernels and scalable vectorisation separately
Yes, that's what I had in mind: with the split, the first half of the project still allows writing the final ukernel code. It's only the compiler side that is tricky to make work with the runtime tile size. So after the first stage is completed, the compiler still treats vscale as a compile-time constant, while the ukernels can treat it as a runtime value if you want. Ukernels are easy to evolve, and if you run into any friction with the current code, we can change anything.
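To make the split concrete, here is a rough Python sketch, not actual IREE ukernel code. The tile shape returned by `tile_sizes`, the constant `ASSUMED_VSCALE` and the panel layout are assumptions for illustration, loosely modeled on an mmt4d-style tile (transposed RHS). The kernel takes its tile sizes as plain arguments, so it is the same code whether the caller got `vscale` from a compile-time constant (stage one) or from a runtime query (stage two):

```python
def tile_sizes(vscale):
    # Hypothetical tile shape (M0, N0, K0): only N0 scales with vscale.
    return 4, 4 * vscale, 1


def mmt4d_tile(lhs_panel, rhs_panel, m0, n0, k0):
    """Accumulate one m0 x n0 output tile.

    lhs_panel: list of K1 blocks, each with m0 rows of k0 values.
    rhs_panel: list of K1 blocks, each with n0 rows of k0 values
               (the RHS is stored transposed).
    """
    acc = [[0.0] * n0 for _ in range(m0)]
    for lhs_block, rhs_block in zip(lhs_panel, rhs_panel):
        for i in range(m0):
            for j in range(n0):
                for k in range(k0):
                    acc[i][j] += lhs_block[i][k] * rhs_block[j][k]
    return acc


# Stage one: the caller pretends vscale is a known constant.
ASSUMED_VSCALE = 2
M0, N0, K0 = tile_sizes(ASSUMED_VSCALE)
```

In stage two, only the source of the `vscale` value passed to `tile_sizes` would change; `mmt4d_tile` itself stays as it is.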
Request description
For example, AWS Graviton 3, based on Arm Neoverse-V1 CPUs, has support for SVE (Scalable Vector Extension). We want to add support for SVE ukernels and, apart from the mmt4d kernel itself, also have the tiling decided based on the vector length. For the tiling, the plan is to re-use parts of the mechanisms for the vmvx backend, to have the tile sizes be decided at runtime.
What component(s) does this issue relate to?
Compiler
Additional context
No response