Closed vineetgarc closed 1 year ago
FWIW there's a gcc patch posted along those lines https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html
Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align.
The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via __riscv_zicclsm
as for any other extension).
Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow.
I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. __riscv_zicclsm_slow
.
Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align.
As specified, there's more information provided to user, they can choose to not use it. If a user cared only about -mstrict-align they could just do a #ifdef flag w/o caring for actual value.
The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via __riscv_zicclsm as for any other extension).
My read on Zicclsm is, that a cpu could be Zicclsm complaint despite very slow unaligned accesses as long as it supported them natively. The intent here is to allow riscv common library code to make that distinction, specifically how the code was built. We want to specify if unaligned access is prohibitive, vs. allowed but not desirable vs. as good as regular accesses. Granted I would personally settle for binary strict aligned vs. not semantics, but it seems tooling (gcc at least) already supports a finer grained distinction.
Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow.
Sure, that's one possible way to go about this. This came up as someone was already starting to write glibc code which needed this build time distinction. FWIW I wasn't aware of this extension and/or that it is still being discussed and wha the times lines of ratification are etc.
I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. __riscv_zicclsm_slow.
Personally I'd like to keep them separate (with people throwing additional defines if needed to) as in this case the extension driving this doesn't really matter as much vs. the actual codegen semantics.
We want to specify if unaligned access is prohibitive, vs. allowed but not desirable vs. as good as regular accesses.
Assuming you meant "prohibited", I think my point is that getting the prohibited or not bit of information should be doable via probing zicclsm, so this define could be recast to focus just on the undesirable (slow) vs as good as regular access (fast) distinction.
That said, I don't know whether zicclsm being added to the RISC-V vocabulary is blocked on the profiles work as a whole being approved. If it does indeed look stuck, then I have a lot of sympathy for moving ahead with something that ignores it for pragmatic reasons, even if it is a shame that we end up with overlapping ways of expressing the same information.
Sounds good to me. It would be nice to have some clarity about Zicclsm from higher powers that may be.
I think the profiles spec says it all, and @vineetgarc has summarized it accurately. Of Zicclsm, it says, "Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance."
This means to me that __riscv_zicclsm
is not the right name for a macro that suggests misaligned accesses should be routinely used. __riscv_zicclsm_slow
is denotatively much more accurate, but it has the downside of sounding pejorative. That's why I suggested something to the effect of __riscv_avoid_misaligned
on the mailing list. At the moment, I don't have a better suggestion that incorporates both the Zicclsm name but avoids the bad connotation.
How about inverting the polarity and defining riscv_zicclsm_fast when the extension is present and presumed to be fast? This has a few advantages: it avoids the pejorative description of the slow variants, and it means the misaligned-optimized routines only need to check one macro, rather than checking both riscv_zicclsm and !__riscv_zicclsm_slow.
Obviously, to remain consistent with the RVA profile spec, we need to arrange that the compilers only define __riscv_zicclsm_fast when explicitly tuning for a specific uarch. The default needs to be for this macro to remain undefined.
Maybe just decouple with zicclsm? and made a more explicitly/specific macro name?
__riscv_unaligned_access
When zicclsm present __riscv_unaligned_access will at least 1.
Maybe just decouple with zicclsm? and made a more explicitly/specific macro name?
+1
__riscv_unaligned_access Undefined: Unknown 0: Unsupported 1: Support, but not guarantee fast or slow. 2: Slow 3: Fast 4: Very fast
This seems like an overkill, considering what the end user of this define (library code writer) will typically target. Moreover this needs to tie back to gcc codegen facilities so unless gcc has toggles for such fine granularity it won't know what value to generate (unless we invent new toggles). Anyhow the intention behind all of this was to indicate what compiler is codegen for (default or forced via explicit toggles) vs. a runtime characteristic which is better dynamically probed.
How about following
This means to me that
__riscv_zicclsm
is not the right name for a macro that suggests misaligned accesses should be routinely used.
Just to clarify, that isn't what I was suggesting. I was pointing out that with zicclsm we should already have a way of determining if misaligned loads are supported, which suggests a separate define can instead focus on if they're fast or not. So something like __riscv_zicclsm
(already there due to the standard definitions for supported extension) and __riscv_zicclsm_fast
appeals to me.
But...opinions can vary ,and the continued success of RISC-V isn't going to depend on whether you have defines with slightly overlapping definitions etc etc. So if there's consensus around some other option I'm not going to drag out the discussion.
I think we’re actually in agreement.
The naming of options is really not important to me at least. We just need to find a mapping from -mstrict-align and/or gcc cpu tune param slow_unaligned_access to these defines.
__riscv_zicclsm could just be a neutral artifact of corresponding -march option (but no actual alignment semantics ? ) __riscv_zicclsm_fast maps well to gcc cpu tune param slow_unaligned_access=false.
The question is what does -mstrict-align do to these and/or define yet something new.
Perhaps if -mstrict-align
is passed, then __riscv_zicclsm_fast
is unset, on the basis that the user probably wanted to override any assumption that misaligned accesses are cheap to use.
Do we keep different values for __riscv_zicclsm_fast for -mno-strict-align vs. cpu tune slow_unaligned_access=false ?
I think it's just __riscv_zicclsm
<=> -mno-strict-align
and
__riscv_zicclsm_fast
<=> -mno-strict-align && slow_unaligned_access=false
i.e. just booleans.
think it's just
__riscv_zicclsm
<=>-mno-strict-align
Do note that users typically don't specify -mno-strict-align
__riscv_zicclsm_fast
<=>-mno-strict-align && slow_unaligned_access=false
This would also be the case if -mno-strict-align
is not specified with just slow_unaligned_access=false
?
One more point: slow_unaligned_access=true
is treated same as -mstrict-align
(which actually is the case even today, but I was hoping if the define could make this distinction).
This would also be the case if
-mno-strict-align
is not specified with justslow_unaligned_access=false
?One more point:
slow_unaligned_access=true
is treated same as-mstrict-align
(which actually is the case even today, but I was hoping if the define could make this distinction).
Seems like this is getting far into the GCC implementation details. The important point is that __riscv_zicclsm_fast
should be defined only if misaligned accesses are both legal and high performance.
Do note that users typically don't specify
-mno-strict-align
I know. It's the default. I don't think it changes what I wrote.
I support your proposal for this version:
riscv_unaligned_bad only define if -mstrict-align riscv_unaligned_slow only define if cpu tune param has slow_unaligned_access=true __riscv_unaligned_fast for fast unaligned case (or maybe even drop this
Personally I would like to prevent zicclsm
since it's kind of not clear to user, but it's not strongly opinion.
@vineetgarc could you prepare a PR? I am happy to help pushing this forward :)
I could but it seems world has moved on :-) Rather than baking this at compile time, dynamic probe is starting to trickle in.
I thought that still could help people to write some multi version code?
So this proposal relies on compiler supporting a cpu tune param which can override -mno-strict-align. gcc supports that I'm not sure if llvm does ? And if not, how do we incorporate that into the common c-api doc ? Would it be left as "implementation defined". This is specially important since the default value (in lack of an explicit cmdline toggle) is derived from cpu tune param
My thought is we only define the mean of the macro, not define the implementation detail, like:
e.g.
__riscv_unaligned_bad
: do not doing unaligned access, it might cause trap or extremely slow.
__riscv_unaligned_slow
: unaligned access is much slow than aligned access.
__riscv_unaligned_fast
: unaligned case is fast.
Here's the pull request for same: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/40
This was merged upstream as https://github.com/riscv-non-isa/riscv-c-api-doc/commit/a98e3099ffc6d8207dd642a2c181d741878908db
I would like to propose a compiler generated preprocessor macro to signal data alignment heuristics used by the compiler for codegen. This could be used by library writers to target code for efficient unaligned access (or not). Granted we would like to eventually use runtime probing to figure this out, it would help the user to know what the compiler assumptions are anyways (or if user coaxed it into a certain semantics via additional toggles)
At a high level it reflects whether gcc toggle -mstrict-align has been used to build.
Speaking of gcc there' another wrinkle to worry about. gcc driver has a notion of cpu tune param which specifies whether unaligned accesses are generally efficient on the cpu (or not). And a cpu tune with slow_unaligned_access=true will disregard -mno-strict-align, compiler codegen assuming as if -mstrict-align was passed. So the proposed preprocessor macro needs to also reflect this.
Here's the specification: __riscv_strict_align (only defined if -mstrict-align or cpu tune param slow_unaligned_access=true) Value 1: if -mstrict-align value 2: if cpu tune param specifies slow_unaligned_access = true