riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
705 stars 163 forks source link

Clarification on canonical representation for Tag_RISCV_arch #203

Open cmuellner opened 3 years ago

cmuellner commented 3 years ago

The attribute Tag_RISCV_arch is defined as a valid argument for -march to describe the ISA extension as defined in the ISA extension naming convention, which is part of the RISC-V unpriv specification.

I propose to either require a canonical representation to be stored there (I am aware that there is no defined canonical form atm), or to add a comment to not expect a canonical representation in that string (i.e. a user of the field needs to parse the string before further processing because two different strings might be semantically identical).

Related issues (with some examples): https://github.com/riscv/riscv-toolchain-conventions/issues/11 https://github.com/riscv/riscv-isa-manual/issues/670

jrtc27 commented 3 years ago

We already require some amount of canonicalisation:

Note that the version information for target architecture must be presented explicitly in the attribute and abbreviations must be expanded. The version information, if not given by -march, must agree with the default specified by the tool. For example, the architecture RV32I has to be recorded in the attribute as RV32I2P0 in which 2P0 stands for the default version of its based ISA. On the other hand, the architecture RV32G has to be presented as RV32I2P0_M2P0_A2P0_F2P0_D2P0 in which the abbreviation G is expanded to the IMAFD combination with default versions of the standard extensions.

I guess your issue is with a literal strcmp vs having to split on _ and do set comparisons? Is there a situation where a strcmp is actually useful though? I'd expect some amount of parsing to be required to work out if it's a subset of the current environment (in the loader case) or if the union of two strings is a valid arch string (in the static linker case to detect conflicting extensions).

jrtc27 commented 3 years ago

i.e. the only thing we don't currently impose is canonicalisation on the order of the non-single-letter extensions (i.e. nothing beyond what the ISA unprivileged spec says)

kito-cheng commented 3 years ago

ISA spec isn't clear enough for following case:

rv32iv vs rv32iv_zvlsseg rv32if_zfh vs rv32if_zfh_zfhmin rv32i_zkn vs rv32i_zbkb_zbkc_zbkx_zknd_zknh

In theory they are equal on left and right side, but ... what is one canonical order for them? if both are canonical order, we must write down more detail rule there.

I prefer expanded and most verbose form (right side), it's less ambiguous during look-up some extensions are enabled or not, for example, the ISA string split into a list by _, and we want to check vector segment load store, if using rv32iv, we must check v or zvlsseg is there or not, but is we always expand to right side form, just check zvlsseg is enough, and that prevent the extension implication knowledge (e.g. v implied/included zvlsseg) need to appear everywhere.

Let see what happen on ISA spec side :p

cmuellner commented 3 years ago

Set comparisons only work, if extensions are composed within a trivial hierarchy. That is not guaranteed by the ISA spec. E.g. the proposal for Zce includes the following logic:

I don't mind requiring tools to parse the string, but I would like to see a note so that people are aware of that.

And if a canonical form is not required, then we could reconsider the following requirement: version information for target architecture must be presented explicitly in the attribute and abbreviations must be expanded. Versions can default to the ratified version. Expansions are not required (as strcmp would not work anyway).

jrtc27 commented 3 years ago

That requirement exists so that kernels and loaders don't all need to embed a full parser of the arch string, which was deemed a non-starter (and I would entirely agree with that, the rules are really complicated).

The Zce proposal sounds horrendous to me, that is a violation of sensible principles for extensions. Having rv32idzce not be a superset of rv32izce is all kinds of "god no don't do that". Extensions are extensions, period. Zceb must therefore be an explicit extension that's incompatible with D from my perspective.

The Zcea situation is completely fine. Any binary using C.MUL will have Zcea and either M or Zmmul in its attributes, so the requirements are entirely captured with no implication logic needed. If the hart has Zmmul (or full-blown M) then the binary will work. If the hart doesn't then the loader will see that and the binary won't be loaded.

asb commented 2 years ago

I feel the Tag_RISCV_arch is underspecified without additional guidance on what caonicalisation of the string is required when one extension implies another, and it would be worth seeking to address this in someway ahead of 1.0.