Open cmuellner opened 3 years ago
We already require some amount of canonicalisation:
Note that the version information for target architecture must be presented explicitly in the attribute and abbreviations must be expanded. The version information, if not given by -march, must agree with the default specified by the tool. For example, the architecture RV32I has to be recorded in the attribute as RV32I2P0 in which 2P0 stands for the default version of its based ISA. On the other hand, the architecture RV32G has to be presented as RV32I2P0_M2P0_A2P0_F2P0_D2P0 in which the abbreviation G is expanded to the IMAFD combination with default versions of the standard extensions.
I guess your issue is with a literal strcmp vs having to split on _ and do set comparisons? Is there a situation where a strcmp is actually useful though? I'd expect some amount of parsing to be required to work out if it's a subset of the current environment (in the loader case) or if the union of two strings is a valid arch string (in the static linker case to detect conflicting extensions).
i.e. the only thing we don't currently impose is canonicalisation on the order of the non-single-letter extensions (i.e. nothing beyond what the ISA unprivileged spec says)
ISA spec isn't clear enough for following case:
rv32iv
vs rv32iv_zvlsseg
rv32if_zfh
vs rv32if_zfh_zfhmin
rv32i_zkn
vs rv32i_zbkb_zbkc_zbkx_zknd_zknh
In theory they are equal on left and right side, but ... what is one canonical order for them? if both are canonical order, we must write down more detail rule there.
I prefer expanded and most verbose form (right side), it's less ambiguous during look-up some extensions are enabled or not, for example, the ISA string split into a list by _
, and we want to check vector segment load store, if using rv32iv
, we must check v
or zvlsseg
is there or not, but is we always expand to right side form, just check zvlsseg
is enough, and that prevent the extension implication knowledge (e.g. v
implied/included zvlsseg
) need to appear everywhere.
Let see what happen on ISA spec side :p
Set comparisons only work, if extensions are composed within a trivial hierarchy.
That is not guaranteed by the ISA spec.
E.g. the proposal for Zce
includes the following logic:
Zce
implies Zceb
only if D
is not available.C.MUL
of Zcea
is only available if M
or Zmmul
is availableI don't mind requiring tools to parse the string, but I would like to see a note so that people are aware of that.
And if a canonical form is not required, then we could reconsider the following requirement:
version information for target architecture must be presented explicitly in the attribute and abbreviations must be expanded
.
Versions can default to the ratified version. Expansions are not required (as strcmp
would not work anyway).
That requirement exists so that kernels and loaders don't all need to embed a full parser of the arch string, which was deemed a non-starter (and I would entirely agree with that, the rules are really complicated).
The Zce proposal sounds horrendous to me, that is a violation of sensible principles for extensions. Having rv32idzce not be a superset of rv32izce is all kinds of "god no don't do that". Extensions are extensions, period. Zceb must therefore be an explicit extension that's incompatible with D from my perspective.
The Zcea situation is completely fine. Any binary using C.MUL will have Zcea and either M or Zmmul in its attributes, so the requirements are entirely captured with no implication logic needed. If the hart has Zmmul (or full-blown M) then the binary will work. If the hart doesn't then the loader will see that and the binary won't be loaded.
I feel the Tag_RISCV_arch is underspecified without additional guidance on what caonicalisation of the string is required when one extension implies another, and it would be worth seeking to address this in someway ahead of 1.0.
The attribute
Tag_RISCV_arch
is defined as a valid argument for-march
to describe the ISA extension as defined in the ISA extension naming convention, which is part of the RISC-V unpriv specification.I propose to either require a canonical representation to be stored there (I am aware that there is no defined canonical form atm), or to add a comment to not expect a canonical representation in that string (i.e. a user of the field needs to parse the string before further processing because two different strings might be semantically identical).
Related issues (with some examples): https://github.com/riscv/riscv-toolchain-conventions/issues/11 https://github.com/riscv/riscv-isa-manual/issues/670