`__atomic_always_lock_free` gets wrong result for subword values

riscvarchive / riscv-gcc

GNU General Public License v2.0

359 stars 275 forks source link

`__atomic_always_lock_free` gets wrong result for subword values #15

Open sorear opened 7 years ago

sorear commented 7 years ago

# cat ailf.c
#include <stdio.h>
int main() {
#define CHECK(size) \
        printf("size %d always? %d is? %d\n", size, \
            __atomic_always_lock_free(size,(char*)0), \
            __atomic_is_lock_free(size,(char*)0));
    CHECK(1) CHECK(2) CHECK(4) CHECK(8) CHECK(16) CHECK(32)
    return 0;
}
# cc -latomic ailf.c; ./a.out
size 1 always? 0 is? 1
size 2 always? 0 is? 1
size 4 always? 1 is? 1
size 8 always? 1 is? 1
size 16 always? 0 is? 0
size 32 always? 0 is? 0

Since the fallback code is lock-free, we should be getting 1 in the first two cases. This is probably related to not having patterns for size 1/size 2 atomics. (??? is this actually true)

(while factually incorrect, I can't find anything in either the gcc documentation or the C11 final draft which specifically forbids this)

jim-wilson commented 6 years ago

Atomic_is_lock_free returns true if we can directly emit an atomic sequence, otherwise it calls into libatomic to see if libatomic can emit an atomic sequence. But atomic_always_lock_free is intended to be used in cases where we require a compile-time constant, so it returns true if we can directly emit an atomic sequence, otherwise it returns false. It doesn't matter that libatomic might be able to emit the sequence. So the results are correct, given that we don't have subword atomics.

The only fix is to implement subword atomics. It does appear that every interesting target except RISC-V has subword atomics.

palmer-dabbelt commented 6 years ago

IIRC, the correct thing to do here is to implement the shorter atomic sequences as LR/SC sequences. I think something like this should do it, unless @aswaterman has a smarter idea?

char atomic_add_1(char *addr, char addend) {
    __asm__ volatile (
    1b:
      lr.w %[out], %[addr]
      add %[out], %[out], %[addend]
      sc.w %[ok], %[out], %[addr]
      bnez %[ok], 1b
      : [addr]"r"(addr / 4 * 4), [addend]"r"(addend << 8 * addr % 4)
      : [ok]"=r"(ok), [out]"=r"(out)
}

aswaterman commented 6 years ago

For the bitwise logical atomics, you can just use AMOx.W, with an address of addr & -4 and operand of (value & 0xFF) << (8 * (addr % 4)).

.For addition, you need to use LR/SC, but that code's not quite right because the addition can carry out into the next-highest byte within the word. So you need to do something like

mask = 0xFF << (8 (addr % 4)); out = ((out + (addend << (8 (addr % 4)))) & mask) | (out & ~mask)

Also don't forget to mask off the LSBs of the address before doing the LR/SC.

sorear commented 6 years ago

The necessary code already exists in libatomic (__atomic_fetch_add_2 et al do not use the mutex, they are simple lr/sc loops) but needs to be moved into libgcc or inlined by the compiler to match behavior of other targets.

wangzhankun commented 10 months ago

the issue has been solved since 13.2.0 version, test on https://godbolt.org/