riscv-non-isa / riscv-c-api-doc

Documentation of the RISC-V C API
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
69 stars 39 forks source link

[RFC] Preprocessor definition to specify data alignment strict(ness) #32

Closed vineetgarc closed 1 year ago

vineetgarc commented 1 year ago

I would like to propose a compiler generated preprocessor macro to signal data alignment heuristics used by the compiler for codegen. This could be used by library writers to target code for efficient unaligned access (or not). Granted we would like to eventually use runtime probing to figure this out, it would help the user to know what the compiler assumptions are anyways (or if user coaxed it into a certain semantics via additional toggles)

At a high level it reflects whether gcc toggle -mstrict-align has been used to build.

Speaking of gcc there' another wrinkle to worry about. gcc driver has a notion of cpu tune param which specifies whether unaligned accesses are generally efficient on the cpu (or not). And a cpu tune with slow_unaligned_access=true will disregard -mno-strict-align, compiler codegen assuming as if -mstrict-align was passed. So the proposed preprocessor macro needs to also reflect this.

Here's the specification: __riscv_strict_align (only defined if -mstrict-align or cpu tune param slow_unaligned_access=true) Value 1: if -mstrict-align value 2: if cpu tune param specifies slow_unaligned_access = true

vineetgarc commented 1 year ago

FWIW there's a gcc patch posted along those lines https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html

asb commented 1 year ago

Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align.

The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via __riscv_zicclsm as for any other extension).

Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow.

I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. __riscv_zicclsm_slow.

vineetgarc commented 1 year ago

Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align.

As specified, there's more information provided to user, they can choose to not use it. If a user cared only about -mstrict-align they could just do a #ifdef flag w/o caring for actual value.

The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via __riscv_zicclsm as for any other extension).

My read on Zicclsm is, that a cpu could be Zicclsm complaint despite very slow unaligned accesses as long as it supported them natively. The intent here is to allow riscv common library code to make that distinction, specifically how the code was built. We want to specify if unaligned access is prohibitive, vs. allowed but not desirable vs. as good as regular accesses. Granted I would personally settle for binary strict aligned vs. not semantics, but it seems tooling (gcc at least) already supports a finer grained distinction.

Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow.

Sure, that's one possible way to go about this. This came up as someone was already starting to write glibc code which needed this build time distinction. FWIW I wasn't aware of this extension and/or that it is still being discussed and wha the times lines of ratification are etc.

I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. __riscv_zicclsm_slow.

Personally I'd like to keep them separate (with people throwing additional defines if needed to) as in this case the extension driving this doesn't really matter as much vs. the actual codegen semantics.

asb commented 1 year ago

We want to specify if unaligned access is prohibitive, vs. allowed but not desirable vs. as good as regular accesses.

Assuming you meant "prohibited", I think my point is that getting the prohibited or not bit of information should be doable via probing zicclsm, so this define could be recast to focus just on the undesirable (slow) vs as good as regular access (fast) distinction.

That said, I don't know whether zicclsm being added to the RISC-V vocabulary is blocked on the profiles work as a whole being approved. If it does indeed look stuck, then I have a lot of sympathy for moving ahead with something that ignores it for pragmatic reasons, even if it is a shame that we end up with overlapping ways of expressing the same information.

vineetgarc commented 1 year ago

Sounds good to me. It would be nice to have some clarity about Zicclsm from higher powers that may be.

aswaterman commented 1 year ago

I think the profiles spec says it all, and @vineetgarc has summarized it accurately. Of Zicclsm, it says, "Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance."

This means to me that __riscv_zicclsm is not the right name for a macro that suggests misaligned accesses should be routinely used. __riscv_zicclsm_slow is denotatively much more accurate, but it has the downside of sounding pejorative. That's why I suggested something to the effect of __riscv_avoid_misaligned on the mailing list. At the moment, I don't have a better suggestion that incorporates both the Zicclsm name but avoids the bad connotation.

aswaterman commented 1 year ago

How about inverting the polarity and defining riscv_zicclsm_fast when the extension is present and presumed to be fast? This has a few advantages: it avoids the pejorative description of the slow variants, and it means the misaligned-optimized routines only need to check one macro, rather than checking both riscv_zicclsm and !__riscv_zicclsm_slow.

Obviously, to remain consistent with the RVA profile spec, we need to arrange that the compilers only define __riscv_zicclsm_fast when explicitly tuning for a specific uarch. The default needs to be for this macro to remain undefined.

kito-cheng commented 1 year ago

Maybe just decouple with zicclsm? and made a more explicitly/specific macro name?


__riscv_unaligned_access

vineetgarc commented 1 year ago

Maybe just decouple with zicclsm? and made a more explicitly/specific macro name?

+1

__riscv_unaligned_access Undefined: Unknown 0: Unsupported 1: Support, but not guarantee fast or slow. 2: Slow 3: Fast 4: Very fast

This seems like an overkill, considering what the end user of this define (library code writer) will typically target. Moreover this needs to tie back to gcc codegen facilities so unless gcc has toggles for such fine granularity it won't know what value to generate (unless we invent new toggles). Anyhow the intention behind all of this was to indicate what compiler is codegen for (default or forced via explicit toggles) vs. a runtime characteristic which is better dynamically probed.

How about following

asb commented 1 year ago

This means to me that __riscv_zicclsm is not the right name for a macro that suggests misaligned accesses should be routinely used.

Just to clarify, that isn't what I was suggesting. I was pointing out that with zicclsm we should already have a way of determining if misaligned loads are supported, which suggests a separate define can instead focus on if they're fast or not. So something like __riscv_zicclsm (already there due to the standard definitions for supported extension) and __riscv_zicclsm_fast appeals to me.

But...opinions can vary ,and the continued success of RISC-V isn't going to depend on whether you have defines with slightly overlapping definitions etc etc. So if there's consensus around some other option I'm not going to drag out the discussion.

aswaterman commented 1 year ago

I think we’re actually in agreement.

vineetgarc commented 1 year ago

The naming of options is really not important to me at least. We just need to find a mapping from -mstrict-align and/or gcc cpu tune param slow_unaligned_access to these defines.

__riscv_zicclsm could just be a neutral artifact of corresponding -march option (but no actual alignment semantics ? ) __riscv_zicclsm_fast maps well to gcc cpu tune param slow_unaligned_access=false.

The question is what does -mstrict-align do to these and/or define yet something new.

asb commented 1 year ago

Perhaps if -mstrict-align is passed, then __riscv_zicclsm_fast is unset, on the basis that the user probably wanted to override any assumption that misaligned accesses are cheap to use.

vineetgarc commented 1 year ago

Do we keep different values for __riscv_zicclsm_fast for -mno-strict-align vs. cpu tune slow_unaligned_access=false ?

aswaterman commented 1 year ago

I think it's just __riscv_zicclsm <=> -mno-strict-align

and

__riscv_zicclsm_fast <=> -mno-strict-align && slow_unaligned_access=false

i.e. just booleans.

vineetgarc commented 1 year ago

think it's just __riscv_zicclsm <=> -mno-strict-align

Do note that users typically don't specify -mno-strict-align

__riscv_zicclsm_fast <=> -mno-strict-align && slow_unaligned_access=false

This would also be the case if -mno-strict-align is not specified with just slow_unaligned_access=false ?

One more point: slow_unaligned_access=true is treated same as -mstrict-align (which actually is the case even today, but I was hoping if the define could make this distinction).

aswaterman commented 1 year ago

This would also be the case if -mno-strict-align is not specified with just slow_unaligned_access=false ?

One more point: slow_unaligned_access=true is treated same as -mstrict-align (which actually is the case even today, but I was hoping if the define could make this distinction).

Seems like this is getting far into the GCC implementation details. The important point is that __riscv_zicclsm_fast should be defined only if misaligned accesses are both legal and high performance.

aswaterman commented 1 year ago

Do note that users typically don't specify -mno-strict-align

I know. It's the default. I don't think it changes what I wrote.

kito-cheng commented 1 year ago

I support your proposal for this version:

riscv_unaligned_bad only define if -mstrict-align riscv_unaligned_slow only define if cpu tune param has slow_unaligned_access=true __riscv_unaligned_fast for fast unaligned case (or maybe even drop this

Personally I would like to prevent zicclsm since it's kind of not clear to user, but it's not strongly opinion.

@vineetgarc could you prepare a PR? I am happy to help pushing this forward :)

vineetgarc commented 1 year ago

I could but it seems world has moved on :-) Rather than baking this at compile time, dynamic probe is starting to trickle in.

kito-cheng commented 1 year ago

I thought that still could help people to write some multi version code?

vineetgarc commented 1 year ago

So this proposal relies on compiler supporting a cpu tune param which can override -mno-strict-align. gcc supports that I'm not sure if llvm does ? And if not, how do we incorporate that into the common c-api doc ? Would it be left as "implementation defined". This is specially important since the default value (in lack of an explicit cmdline toggle) is derived from cpu tune param

kito-cheng commented 1 year ago

My thought is we only define the mean of the macro, not define the implementation detail, like:

e.g. __riscv_unaligned_bad: do not doing unaligned access, it might cause trap or extremely slow. __riscv_unaligned_slow: unaligned access is much slow than aligned access. __riscv_unaligned_fast: unaligned case is fast.

vineetgarc commented 1 year ago

Here's the pull request for same: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/40

vineetgarc commented 1 year ago

This was merged upstream as https://github.com/riscv-non-isa/riscv-c-api-doc/commit/a98e3099ffc6d8207dd642a2c181d741878908db