asb commented 2 years ago

I'm about to post patches to enable MC layer support for Zicbo{m,z,p} in LLVM. One point that isn't clear is whether the cbo.* instructions should be written like e.g. cbo.clean (a0)/cbo.clean 0(a0), matching the A extension instructions that take a memory address in a register with no immediate offset (e.g. lr.w t0, (t1)), or just plain cbo.clean a0. As @aswaterman pointed out on the binutils list, there's also precedent for not having parentheses (sfence.vma).

Possibly this is outside the scope of the CMO group and something that those of us in the LLVM community just need to sync with the binutils/GCC folks on. What do you think?

CC @kito-cheng who's also been helping coordinate aligment between LLVM and GCC/binutils.

jrtc27 commented 2 years ago

Arguably, sfence.vma operates on the reference (which can affect the data as a side-effect), whereas cbo.* operates on the data, so that would be a justification for why sfence.vma doesn't have parentheses.

asb commented 2 years ago

Just a ping to see if anyone had thoughts on this one way or another?

aswaterman commented 2 years ago

I’ll support @jrtc27’s take. It’s a minor point either way, but there’s little reason to linger on this topic.

asb commented 2 years ago

No argument from me that in a priority ordered list of RISC-V issues, this probably comes out pretty low! I'd just like to get the assembler support done without having to revisit it in the future.

Following this discussion, I've modified my LLVM MC layer zicbom+zicboz patch to use the (rs1)/0(rs1) format. Thanks!

cmuellner commented 2 years ago

There are two Binutils patches from January on the Binutils list, that use the plain format (cbo.clean a0). E.g. https://sourceware.org/pipermail/binutils/2022-January/119273.html However, no patch has been merged so far.

I don't have any preference, except that I prefer a common behavior of Binutils/GCC and LLVM.

a4lg commented 2 years ago

I have a little preference for old format (cbo.zero a0) but it's not a strong one.

Anyway, we have two options:

[OLD] Based on Ratified operand format:
- https://github.com/a4lg/binutils-gdb/tree/riscv-CMOs
- https://sourceware.org/pipermail/binutils/2022-January/119270.html
[NEW] Based on Proposed operand format:
- https://github.com/a4lg/binutils-gdb/tree/riscv-CMOs-with-paren
- https://sourceware.org/pipermail/binutils/2022-February/119708.html

a4lg commented 2 years ago

I prefer old one because:

It's not good time to change operand format (after ratification)
Because offset must be zero, we cannot handle cbo.* instruction just like regular memory instructions.
Because of 2., we will not have big advantage over the old one

But again, this preference is not so strong.

jrtc27 commented 2 years ago

Because offset must be zero, we cannot handle cbo.* instruction just like regular memory instructions.

Because of 2., we will not have big advantage over the old one (as https://github.com/riscv/riscv-CMOs/issues/47#issuecomment-1024144629)

This is no different from the AMOs, which use parentheses.

a4lg commented 2 years ago

@jrtc27 OK, you are right (I completely forgot about that). And I also misinterpreted the intention of @aswaterman's comment.

Anyway, I have no strong opinion about this and... sooner is better.

jrtc27 commented 2 years ago

For what it's worth, this is a perfect example of why having toolchain patches be a requirement for ratification is a really good idea..

a4lg commented 2 years ago

GNU Binutils adopted textual format '(rs1)' and '0(rs1)'. Textual format 'rs1' is explicitly rejected. https://sourceware.org/pipermail/binutils/2022-March/120111.html https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=41d6ac5da655a2e78109848f2db47e53552fd61a

+/* Zicbom and Zicboz instructions.  */
+{"cbo.clean",  0, INSN_CLASS_ZICBOM, "0(s)", MATCH_CBO_CLEAN, MASK_CBO_CLEAN, match_opcode, 0 },
+{"cbo.flush",  0, INSN_CLASS_ZICBOM, "0(s)", MATCH_CBO_FLUSH, MASK_CBO_FLUSH, match_opcode, 0 },
+{"cbo.inval",  0, INSN_CLASS_ZICBOM, "0(s)", MATCH_CBO_INVAL, MASK_CBO_INVAL, match_opcode, 0 },
+{"cbo.zero",   0, INSN_CLASS_ZICBOZ, "0(s)", MATCH_CBO_ZERO, MASK_CBO_ZERO, match_opcode, 0 },

"0(s)" means proposed textual format (rs1) and 0(rs1) (as opposed to original textual operands "s").

a4lg commented 2 years ago

To whom it may concern, (Cc: @asb @aswaterman @jrtc27 @nelson-chu @kito-cheng @dkruckemyer-ventana)

It should be a good time to make a resolution. Although that one of my patchsets (with paren, textual form as proposed by @asb as a LLVM patch) is merged into GNU Binutils' master branch, it should not be too late to revert it. But if we (RISC-V CMO authors and GNU/LLVM toolchain developers) don't coordinate now, we will risk making incompatibilities.

I admit that GNU Binutils has an option to support both, but I think it causes problems on long-term (because one of textual forms must be an alias). So, I designed two patchsets reject another textual form.

a4lg commented 2 years ago

49 is created to reflect textual form proposed by @asb (to be merged ONLY IF a resolution is made on this issue).

If we decide to change textual form, I propose new document version v1.0.1 to keep extension version (Zicbom1p0 and Zicboz1p0) as the new document will be still binary-compatible.

aswaterman commented 2 years ago

I’ll just briefly note that I’m happy with any decision we make here, as long as we’re consistent across the toolchains.

a4lg commented 2 years ago

So am I.

asb commented 2 years ago

I'm happy with (rs1) and 0(rs1).

a4lg commented 2 years ago

Both (GNU/LLVM) toolchain developers agreed. Remaining task: ask RISC-V CMO TG? tech-cmo@lists.riscv.org for request to change?

dkruckemyer-ventana commented 2 years ago

@a4lg I have no problem with the change, though I wonder whether the format should instead be inst offset(rs1)? In this case, the descriptive text would be written so that anything other than offset=0 is UNSPECIFIED.

I'm trying to anticipate a future extension that might introduce a non-zero offset, which would probably require a different instruction encoding but would have the same instruction semantics.

a4lg commented 2 years ago

@dkruckemyer-ventana

TL;DR: offset(rs1) looks good (for other reason than you provided).

I checked AMO instructions but they have no mention about textual format. If we need to mention about that, we need to invent something new.

I noticed that we need to explain value of offset since this is valid on GNU Binutils (master):

cbo.zero 2*3-6(t0)

In above example, offset is zero. However, it's not just a plain constant. But it's a constant expression that happened to be zero. So, explaining constraints to offset looks good to me. Also, this is valid, too.

cbo.zero (t0)

Proposal

How about this?

Mnemonic::
cbo.zero offset(_base_)

...

Optional *offset* must be evaluated to zero (if exists). Otherwise, such encoding is UNSPECIFIED.

I'm not a native English speaker so proofreading is welcome.

Also, I'm not sure whether the word UNSPECIFIED is appropriate here (if nonzero offset variant doesn't exist).

palmer-dabbelt commented 2 years ago

Looks like some support for these landed in binutils as 41d6ac5da65 ("RISC-V: Cache management instructions"). It's not in the release, so it's not technically a 100% stable interface yet, but we generally try to avoid reverting things. Sounds like there's mostly consensus here, so I just wanted to poke folks and see what's up -- don't want to force things here (and I don't personally care at all about the syntax).

dkruckemyer-ventana commented 2 years ago

Just wanted to let everyone know I haven't forgotten about this. Will try to get to this in the next week or so.

yulong18 commented 2 years ago

How can I implement the textual format of cbo.* asm operands in the gcc part? Not sure which style should I follow, Any suggestions?

AndyGlew commented 2 years ago

This may be long past caring for y'all, but at one point ( as in, original proposal) we discussed having addressing modes for some CBOs that were likely to be used inside loops that did other computation, not just loops that were used to perform a CMO on an entire memory region. And in such loops there are advantages to having addressing modes comparable to those of the regular load and store instructions. This may no longer be the case - which IMHO will probably be the kiss of death for using CBOs to do cache management as loops progress in numerical code, the overhead of extra address computation often outweighs the benefit unless you are lucky enough to get away without any offsets - but I think that this possibility shows the advantage of having a uniform addressing mode notation, CBO.CLEAN (a1), for consistency in case there is ever created a CBO.XXX offset(a1) for some future XXX.

Note: (a1) if the addressing mode is registering direct only with no offset, i.e. M[a1]

I think that 0(a1) implies incorrectly that there would be an offset, and I would only support that encasing on zero offset is ever supported.

dkruckemyer-ventana commented 2 years ago

@a4lg I took your suggestion and modified it a bit:

cbo.zero _offset_(_base_) (asciidoc formatting)

...

A cbo.zero instruction performs stores of zeros to the full set of bytes corresponding to the cache block whose effective address is specified in rs1. The instruction encoding does not include an immediate field, so offset is optional and shall evaluate to zero if present. An implementation may or may not update the entire set of bytes atomically.

Sound OK?

gfavor commented 2 years ago

If the offset field does not evaluate to zero, is that an assembler error? I haven't checked but I suspect the instruction definition requires the offset field to be zero (i.e. non-zero offset encodings are reserved for future potential use). In other words, the offset field is not architecturally defined to be ignored and is not currently allowed to have a non-zero value.

cmuellner commented 2 years ago

Yes, in the current implementation, a non-zero offset of a cbo.* instruction triggers the assembler error "Error: illegal operands".

This is also tested as part of Binutil's internal regression test suite. Here is the link for the valid forms (according to the current implementation): https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gas/testsuite/gas/riscv/zicboz.s. And here is a link for the invalid forms (according to the current implementation): https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gas/testsuite/gas/riscv/zicboz-fail.s.

dkruckemyer-ventana commented 2 years ago

As I'm thinking about this some more, it's not clear what requirement needs to be captured in the ISA specs. A non-zero offset cannot be encoded in the instruction, but is there any ISA requirement that the mnemonic map to a single instruction in the executable? Or is that a toolchain convention?

For example, why can't cbo.zero 32(x3) map to

addi x3,x3,32
cbo.zero (x3)
addi x3,x3,-32

?

cmuellner commented 2 years ago

We have both, instruction mnemonics and pseudoinstructions that might expand to multiple instructions (e.g. call). However, pseudo instructions are optional (they simplify programming).

So yes, we need to restrict the mnemonic to express a valid encoding of a single instruction.

The pseudoinstructions are defined in the RISC-V ASM manual (also part of the RISC-V unpriv spec): https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#-a-listing-of-standard-risc-v-pseudoinstructions.

dkruckemyer-ventana commented 2 years ago

Any objections to the following?

A cbo.zero instruction performs stores of zeros to the full set of bytes corresponding to the cache block whose effective address is the base address specified in rs1. The offset operand may be omitted; otherwise, any expression that computes the offset shall evaluate to zero. An implementation may or may not update the entire set of bytes atomically.

AndyGlew commented 2 years ago

Q: is the silent store optimization allowed?

(i.e. CBO.ZERO Mike do nothing if it knows that the "current" copy of the cache line is already zero.)

this might already have been discussed while I wasn't paying attention

moot if RISC-V Permits the silent store optimization for any store,

Sent from eM Client | www.emclient.com https://www.emclient.com/get

------ Original Message ------ From "David Kruckemyer" @.> To "riscv/riscv-CMOs" @.> Cc "Andy Glew" @.>; "Comment" @.> Date 5/9/2022 15:51:44 Subject Re: [riscv/riscv-CMOs] Textual format for cbo.* asm operands - '(rs1)' and '0(rs1)' vs 'rs1' (Issue #47)

Any objections to the following?

A cbo.zero instruction performs stores of zeros to the full set of bytes corresponding to the cache block whose effective address is the base address specified in rs1. The offset operand may be omitted; otherwise, any expression that computes the offset shall evaluate to zero. An implementation may or may not update the entire set of bytes atomically.

— Reply to this email directly, view it on GitHub https://github.com/riscv/riscv-CMOs/issues/47#issuecomment-1121659982, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOGLJM6AJHEDAU5PHKBGI3VJGJIBANCNFSM5MBCOMXQ. You are receiving this because you commented.Message ID: @.***>

a4lg commented 2 years ago

@dkruckemyer-ventana I think your proposal looks good. I will submit version 2 of #49 (by rebasing) later.

dkruckemyer-ventana commented 2 years ago

I have the updates in a local repo, so no need to make any changes as long as you agree with the wording.

dkruckemyer-ventana commented 2 years ago

Please see #51 for the latest. I'll merge and generate a pdf if this looks OK.

dkruckemyer-ventana commented 2 years ago

Merged and created cmobase-v1.0.1.pdf

riscv / riscv-CMOs

Textual format for cbo.* asm operands - '(rs1)' and '0(rs1)' vs 'rs1' #47

49 is created to reflect textual form proposed by @asb (to be merged ONLY IF a resolution is made on this issue).

Proposal