[RFC] Relax the canonical order checking of arch string for -march option

kito-cheng commented 3 years ago

Changes

2021/06/03
- Add an new option d.
  
  The ISA spec has specify the ISA extension could be describe the arch string, and toolchain are using this way to control the code gen, and currently toolchain are following the order describe by the ISA spec, and emit error if order is wrong.

However we have bunch of sub-extension from several ISA extension now, zv, zb, zk, zp, zf and zi, and there is x, s and zxm* extensions, the order rule is also little bit different:

Order between z, s, h, zxm and x are: z -> s -> h -> zxm -> x
Order rule for z*: they should be ordered first by category (second letter), then alphabetically within a category
Order rule for h, s, zxm and x: ordered alphabetically.

Okay, so how to describe a arch has rv64gc, sfoo, zba, zfh, zicsr, zxmbar, xsf and xabc? The canonical order is there: rv64gc_zicsr_zfh_zba_sfoo_zxmbar_xabc_xsf. I think it's hard to write it correct without check with ISA spec.

So my proposal is relax the canonical order checking for the -march option,

And I list few possible options:

a. Relax order check for multi-letter extension, e.g. order between z, s, h, zxm and x. [1] b. Relax order check for whole arch string, including single letter extensions. [*1]

e.g. RV32MAFDI is OK

c. Don't relax order check, but emit an error to tell user the right order. d. Relax order check for whole arch string, including single letter extensions. [*1] except I, G, E.
- e.g.
- RV32IMAFD == RV32IAFDM
- RV32MAFDI is not OK.
- RV32I_ZAM # OK
- RV32IM_ZA #OK
- RV32I_ZA_M #OK
- RV32IZAM # Not OK multi-letter must start with _

*1: The ELF attribute emission still must in canonical order to simplify the parser logic, that's also keep the possibility of cache the checking result if we check that in OS or runtime in future.

luismarques commented 3 years ago

Thanks for writing down this proposal Kito. I like the idea that tools that consume human-generated input would be flexible in the order they would accept, but that they would always canonicalize the ISA string. So when a tool prints something it's always in canonical order and you can easily match that string. But why should the human have to remember that order? One possible exception that was mentioned in the LLVM RISC-V sync-up call was requiring that the i would always come first, as that's very integral to the rv32i vs rv64i distinction, but maybe that's overthinking it. (The same logic would presumably apply to g).

jim-wilson commented 3 years ago

I think the main problem is with the z and x extensions, which have unobvious orders. The single letter extensions are pretty clear about the right ordering. I'm Ok with relaxing the order check for user input for the multicharacter extensions.

pz9115 commented 3 years ago

Nice idea. I often met this problem in develop and test. Since we usually use 'imafdc' in general, it's not easily always type other extensions in right sequence. Relaxing the order that not frequently is real helpful for both user and developer.

ebahapo commented 3 years ago

Option D. As it was mentioned in the GCC call, the interpretation of the string should always be after the canonical order. If anything, to future proof it. When displaying the string, wherever in the toolchain, it should always be in the canonical order, whatever the user input. Also, the shorthand extension G should preferably be displayed, instead of spelling out its components.

kenta2 commented 3 years ago

I support option c (Don't relax order check, but emit an error to tell user the right order). It's not just the GNU toolchain (and LLVM) that is emitting and parsing ISA strings. I've written helper scripts that help users find the right ISA string for the features they have or want, and having a specification that says There Is Only One Correct Answer makes things easier to do such tasks.

tclin914 commented 3 years ago

Option a. Does it have another option that just relaxes the order in the same multi-letter extension?

rjiejie commented 3 years ago

Option b or d both are better, it let user easy to use this 'march' option without any complex detail of SPEC and tools like GCC or LLVM is responsible for hiding the details inside.

Geng-Qi-alibaba commented 3 years ago

I prefer option d. It's is flexible and reads clearly.

jrtc27 commented 3 years ago

For: Dynamically building up arch strings is awkward (e.g. you can't take rv64gc and then append q, because the order is rv64gqc) if you're taking in user input and appending

Against: Comparing arch strings visually becomes much harder if they're not in canonical order

kito-cheng commented 3 years ago

Thanks all your feedback, sound like most people are happy to relax the canonical order checking, statistics so far:

Option	Voting
a	@jim-wilson, @tclin914 @kito-cheng
b	@rjiejie @pz9115
c	@kenta2
d	@luismarques, @ebahapo @rjiejie @Geng-Qi @pz9115

kito-cheng commented 3 years ago

@palmer-dabbelt has told me he is fine to relax the order, but we should take care about the ambiguous of interpreting the arch string, especially for the multi-letter and single-letter extension.

e.g.

rv32i_a_zm # i + zm +a, no ambiguous.
rv32i_zma # i + zm +a or i + zma?

So I think we should write down few more explicit rule, like multi-letter extension must using underline between all adjacent extension.

nick-knight commented 3 years ago

@kito-cheng proposes "option (d)":

RV32IZAM # OK RV32IZAM # Not OK multi-letter must start with

I am against adding constraints that aren't in the (official) ISA string specification (Vol I Ch. 25). In this case, RV32IZam and RV32I_Zam are both legal, so mandating the latter is problematic in my opinion. So I'm strongly opposed to option (d).

I am (mildly) in favor of option (c): do not relax the specification. As a toolchain user I have never found the law to be particularly onerous. My vote is motivated by concern of a (hypothetical) situation where a user becomes accustomed to writing illegal ISA strings, switches to a tool which enforces the law, and then files a (spurious) bug report, wasting some law-abiding developer's time. (Spare the rod, spoil the child?) This is a subjective argument, and certainly not a hill I will die on.

kito-cheng commented 3 years ago

@nick-knight I think the details could be discussed, since the intention is preventing paring ambiguous.

My vote is motivated by concern of a (hypothetical) situation where a user becomes accustomed to writing illegal ISA strings, switches to a tool which enforces the law, and then files a (spurious) bug report, wasting some law-abiding developer's time. (Spare the rod, spoil the child?) This is a subjective argument, and certainly not a hill I will die on.

That sounds a valid concern, when we relax that, then people are used to using relaxed order, but I still concern about the order for multi-letter extension - it's really hard to sort, so I change my mind (back) to option a.

kito-cheng commented 3 years ago

Hey guys, thank all your feedback, now I created a PR https://github.com/riscv/riscv-toolchain-conventions/pull/14 for this RFC, welcome any further comment on that PR :)

cmuellner commented 3 years ago

In today's GCC RISC-V call we discovered the concern that a future ISA string format might cause a conflict. Addressing that in the specification (e.g. by relaxing the order there, or by getting hard guarantees there which can be relied upon) seemed unlikely.

One way that would still be possible is to do something similar like the push_arch proposal in the RISC-V Assembly Programmer's Manual E.g. -march=rv64gc -march-append=zicsr.

Nelson1225 commented 3 years ago

Hey guys, thank all your feedback, now I created a PR #14 for this RFC, welcome any further comment on that PR :)

I agree with @palmer-dabbelt and @nick-knight that the option d will cause more ambiguous problems. It doesn't worth that spending times to resolve these ambiguous problems caused by option d. The PR #14 seems like the option b, but add a restriction that the multi-letter extension must come after single-letter extension. I like this idea, and it should be convenient enough to users, and also easy to implement without changing the current ISA spec. If there is no other concern, I would like to update the binutils according to the PR #14 recently.

fanghuaqi commented 2 years ago

In today's GCC RISC-V call we discovered the concern that a future ISA string format might cause a conflict. Addressing that in the specification (e.g. by relaxing the order there, or by getting hard guarantees there which can be relied upon) seemed unlikely.

One way that would still be possible is to do something similar like the push_arch proposal in the RISC-V Assembly Programmer's Manual E.g. -march=rv64gc -march-append=zicsr.

I think this is a good way to specify march string, if we keep adding new risc-v extension, this march string will be too long, and a little ugly to me, but considering library built for different arch, it will also be a disaster.

riscv-non-isa / riscv-toolchain-conventions

[RFC] Relax the canonical order checking of arch string for -march option #11

Changes

Add an new option d.