riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document
https://jira.riscv.org/browse/RVG-4
Creative Commons Attribution 4.0 International
693 stars 163 forks source link

AMOCAS recommended compatibility mappings are too weak for the c11 memory model #444

Closed patrick-rivos closed 2 months ago

patrick-rivos commented 3 months ago

Andrea Parri spotted an issue with the amocas operation with the current seq_cst amo<op> mappings.

The issue stems from this sentence in the ratified zacas extension:

The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

https://github.com/riscvarchive/riscv-zacas/releases/tag/v1.0

The issue can be demonstrated using this litmus test:

C c-cmpxchg-fail
{ [z] = 1; }

P0(atomic_int *x, atomic_int *y, int *z)
{
        atomic_store_explicit(x, 1, memory_order_seq_cst);
        int r0 = atomic_compare_exchange_strong_explicit(y, z, 2, memory_order_seq_cst, memory_order_seq_cst);
}

P1(atomic_int *x, atomic_int *y)
{
        atomic_store_explicit(y, 1, memory_order_seq_cst);
        int r1 = atomic_load_explicit(x, memory_order_seq_cst);
}

exists (0:r0=0 /\ 1:r1=0)

The atomic_compare_exchange_strong_explicit fails (r0 == 0) meaning it read that y != z (value of 1) aka y = 0. r1 == 0 ensures that P1 also reads 0 aka x = 0.

herd7 results:

Test c-cmpxchg-fail Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No

As shown by herd7, the (0:r0=0; 1:r1=0;) behavior is impossible with the c11 memory model.

This is a variant of another common memory-model testcase This testcase is essentially a variant of: ``` seq_cst store a, 1 | seq_cst store b, 1 seq_cst load 0x7, b | seq_cst load 0x8, a exists (0x7 = 0 /\ 0x8 = 0) ```

We can approximate this mapping into RISC-V asm using the memory model mappings. Since the mappings are compatible with A.6, we'll use an A.6 seq_cst store. Following that will be amocas.aqrl, however since it fails it follows the text in the spec:

The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

We'll model it using lw.aq - a hypothetical insn that is present in herd7.

RISCV rv-cmpxchg-fail

{
0:x1=x; 0:x2=1; 0:x3=y;
1:x1=y; 1:x2=1; 1:x3=x;
}

  P0             |  P1           ;
  fence rw,w     |  sw x2,0(x1)  ;
  sw x2,0(x1)    |  fence rw,rw  ;
  lw.aq x4,0(x3) |  lw x4,0(x3)  ;

exists (0:x4=0 /\ 1:x4=0)

P1's fences are minimized since it's acting as an observer thread. We know the lw/sw can't be reordered with a full fence between them so the interesting behavior is in the P0 hart.

herd7 results:

Test rv-cmpxchg-fail Allowed
States 4
0:x4=0; 1:x4=0;
0:x4=0; 1:x4=1;
0:x4=1; 1:x4=0;
0:x4=1; 1:x4=1;
Ok

This shows the issue: 0:x4=0; 1:x4=0;. This is the same case that causes us to require lr.aqrl for atomic_(memory_order_seq_cst).

Thanks again to Andrea Parri for spotting this and making the litmus tests.

To resolve this for atomic_compare_exchange ops that have the failure argument set to memory_order_seq_cst we may need a full fence before those amocas insns in order to maintain the ordering guarantees in case the cas fails.

Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

cc'ing a few atomics related people, please add anyone I missed: @hboehm @kito-cheng @aparri @daniellustig @preames @ilovepi @asb

daniellustig commented 2 months ago

Yes good catch. I think the key here is the failure argument, i.e., the last argument of atomic_compare_exchange_strong_explicit. It's memory_order_seq_cst in the C++ example, but not sequentially consistent in the AMOCAS instruction mapping due to the highlighted sentence saying .rl is ignored on failure. Is that right?

The zacas fast track extension was added recently, after the RVWMO baseline was written, and after the ISA manual appendix was written, and I think after our previous discussions about atomics ABI and C++ mappings. So this litmus test seems to have slipped through.

To resolve this we may need a full fence before each amocas insn in order to maintain the ordering guarantees in case the cas fails.

Specifically when the failure argument is memory_order_seq_cst?

Alternatively (and I know this won't be popular...), one could argue to treat this as a defect in the zacas spec and update it to say rl is respected even in case of failure...

patrick-rivos commented 2 months ago

I think the key here is the failure argument, i.e., the last argument of atomic_compare_exchange_strong_explicit. It's memory_order_seq_cst in the C++ example, but not sequentially consistent in the AMOCAS instruction mapping due to the highlighted sentence saying .rl is ignored on failure. Is that right?

Yes that's my understanding as well.

Specifically when the failure argument is memory_order_seq_cst?

Yes, thanks. I've edited the original comment to clarify that statement.

patrick-rivos commented 2 months ago

Here's a pointer to the original discussion about a failed AMOCAS' semantics: https://github.com/riscvarchive/riscv-zacas/issues/20

In C++ a compare_exchange that fails is treated as a load for memory model issues. "If the operation returns true, these operations are atomic read-modify-write operations (6.9.2) on the memory pointed to by this. Otherwise, these operations are atomic load operations on that memory." C++ load operations don't provide release ordering (which wouldn't make sense in that model). Thus there is no requirement that ordinary prior data operations be ordered with respect to even a seq_cst compare_exchange that fails.

I think this particular case is different since it's interacting with a seq_cst store. Hopefully @hboehm can provide more insight here.

asb commented 2 months ago

Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

Thanks for tagging me. In terms of timelines, the current plan sees LLVM 19 being branched on 23rd July, with a series of release candidates taking place until 3rd September. Getting a change in sooner is better than later, but of course getting the right fix in is more important. As any such change would be quite limited in scope to the RISC-V backend, we have some flexibility in landing it after the initial branch and backporting it to LLVM 19 if necessary. With that in mind, I'd say there's maybe 1 month to decide what to do here for the LLVM 19 release (which as a backup option, could be to mark Zacas as experimental again because the proper mappings haven't been definitively decided).

hboehm commented 2 months ago

As far as I can tell, this is only a problem if you use the original A.6 mapping and support AMOCAS. That hurts. Please don't do it. Follow the psABI.

It may be an LLVM bug (if that still uses A.6; please stop), but it is not a spec bug.

According to https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc. a sequentially consistent store is mapped to either a sequence ending in fence rw,rw; or amoswap.rl. That doesn't give rise to the herd mapping.

The A.6 mapping doesn't work because it requires release semantics for the load part of an AMO operation, which AMOCAS doesn't provide. Effectively this generalization of A.6 is analogous to using the last mapping in the non-TSO table of https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc, which says "Incompatible with the original "Table A.6" mapping."

(Yes, I think this could be fixed in A.6 by adding the fence. But why?)

Hans

On Wed, Jul 10, 2024 at 10:48 AM Alex Bradbury @.***> wrote:

Thankfully LLVM 18 does not contain zacas but trunk https://godbolt.org/z/K4hr7WTq7 uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

Thanks for tagging me. In terms of timelines, the current plan sees LLVM 19 being branched on 23rd July, with a series of release candidates taking place until 3rd September. Getting a change in sooner is better than later, but of course getting the right fix in is more important. As any such change would be quite limited in scope to the RISC-V backend, we have some flexibility in landing it after the initial branch and backporting it to LLVM 19 if necessary. With that in mind, I'd say there's maybe 1 month to decide what to do here for the LLVM 19 release (which as a backup option, could be to mark Zacas as experimental again because the proper mappings haven't been definitively decided).

— Reply to this email directly, view it on GitHub https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/444#issuecomment-2221110293, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHUHKRBQN4JDRVY4VX6C2LZLVXXNAVCNFSM6AAAAABKT3CPYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRRGEYTAMRZGM . You are receiving this because you were mentioned.Message ID: @.***>

patrick-rivos commented 2 months ago

The issue is that the current PSABI mappings claim to be compatible with A.6 (or at least call out when the mappings aren't compatible). A note-3 disclaimer does not exist for amocas if someone reads the amo\<op> suggested mappings. The original intention of A6S was to be compatible with both A6C and A7 and this behavior/mapping breaks that.

One solution would be to add a fenced amocas mapping with an alternative non-fenced version with a note-3 warning so that the meaning of A6S is preserved.

hboehm commented 2 months ago

Agreed.

In my view, A.6 and AMOCAS were never meant to coexist. But yes, the psABI should call that out. For formatting reasons, it may make sense to just add a special footnote for that one line in the RVWMO table.

On second thought, I think the TSO table may require a more substantial change.

Hans

On Thu, Jul 11, 2024 at 2:30 PM patrick-rivos @.***> wrote:

The issue is that the current PSABI mappings claim to be compatible with A.6 (or at least call out when the mappings aren't compatible). A note-3 https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc?plain=1#L145-L147 disclaimer does not exist for amocas if someone reads the amo suggested mappings. The original intention of A6S https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/6b692f80aba58aab37ff7bc356e00bcef4781063/riscv-elf.adoc?plain=1#L1291-L1329 was to be compatible with both A6C and A7 and this behavior/mapping breaks that.

One solution would be to add a fenced amocas mapping with an alternative non-fenced version with a note-3 warning so that the meaning of A6S is preserved.

— Reply to this email directly, view it on GitHub https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/444#issuecomment-2223970932, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHUHKXJB4RL5NCXWBMUAJLZL32N5AVCNFSM6AAAAABKT3CPYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRTHE3TAOJTGI . You are receiving this because you were mentioned.Message ID: @.***>

patrick-rivos commented 2 months ago

I've made the first draft update here: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445

I'm not following what you mean by the TSO table needing a change. I think this line in the Ztso spec means we're OK within the PSABI mappings (as long as it applies to amocas):

All AMOs behave as if they have both acquire-RCsc and release-RCsc annotations.

ved-rivos commented 2 months ago

Since a atomic_compare_exchange performs a read-modify-write only when it succeeds and is only a load when it fails, should definition for a seq_cst atomic_compare_exchange that fails only needs acquire operation? Does C++ require release operation on the load performed by the failing atomic_compare_exchange?

A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation.

anparri commented 2 months ago

Thanks for the write-up and for sending these changes, @patrick-rivos .

@ved-rivos , a seq_cst load is (strictly) stronger than an acquire load as the example above can also illustrate; of course, there is no such thing as a "release load" in the C memory model, but that should provide some relevant intuition indeed.

hboehm commented 2 months ago

Thanks! It seems a little weird to refer to footnote 3 at the bottom of the Ztso table, and not reproduce it there. Aside from that, the change looks good to me. I don't see any other TSO issues; I just hadn't stared at it enough.

On Tue, Jul 16, 2024 at 12:59 PM anparri @.***> wrote:

Thanks for the write-up and for sending these changes, @patrick-rivos https://github.com/patrick-rivos .

@ved-rivos https://github.com/ved-rivos , a seq_cst load is (strictly) stronger than an acquire load as the example above can also illustrate; of course, there is no such thing as a "release load" in the C memory model, but that should provide some relevant intuition indeed.

— Reply to this email directly, view it on GitHub https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/444#issuecomment-2231727292, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHUHKWYGJW35LM4KUE6HEDZMV3QNAVCNFSM6AAAAABKT3CPYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZRG4ZDOMRZGI . You are receiving this because you were mentioned.Message ID: @.***>

patrick-rivos commented 2 months ago

I think the discussion has concluded so I've marked #445 as ready to land. Please correct me if I'm wrong.

hboehm commented 2 months ago

Works for me.

On Thu, Jul 18, 2024 at 8:59 AM patrick-rivos @.***> wrote:

I think the discussion has concluded so I've marked #445 https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445 as ready to land. Please correct me if I'm wrong.

— Reply to this email directly, view it on GitHub https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/444#issuecomment-2236928063, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHUHKSWHCQSNC4A54WLRX3ZM7Q5VAVCNFSM6AAAAABKT3CPYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZWHEZDQMBWGM . You are receiving this because you were mentioned.Message ID: <riscv-non-isa/riscv-elf-psabi-doc/issues/444/2236928063 @.***>