llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

_mm_load_si128() not generating 128-bit atomic stores. #46770

Open anmparal opened 4 years ago

anmparal commented 4 years ago
Bugzilla Link 47426
Version trunk
OS All
CC @anmparal,@topperc,@RKSimon,@rotateright

Extended Description

Given the test:

     1  #include <stdint.h>
     2  #include <stdio.h>
     3  #include <emmintrin.h>
     4
     5  uint32_t read_128b(__m128i *ptr)
     6  {
     7      __m128i val = _mm_load_si128(ptr);
     8      return ((uint32_t *) &val)[0] |
     9             ((uint32_t *) &val)[1] |
    10             ((uint32_t *) &val)[2] |
    11             ((uint32_t *) &val)[3];
    12  }

With clang version 12.0.0 (https://github.com/llvm/llvm-project.git 4eef14f9780d9fc9a88096a3cabd669bcfa02bbc, 09/04/2020), the _mm_load_si128() call is compiled at '-O2 -msse2' into two 64-bit loads:

    movq    (%rdi), %rcx
    movq    8(%rdi), %rdx

This is not in accordance with Ref. [0], which specifies:

Synopsis
    __m128i _mm_load_si128 (__m128i const* mem_addr)
    #include <emmintrin.h>
    Instruction: movdqa xmm, m128
    CPUID Flags: SSE2

(Note: gcc-10.1.0 and icc.16.0.5.027b both generate a movdqa, as expected.)

The 32-bit element accesses at lines 8 through 11 cause the problematic 64-bit loads. The code was then modified as follows (changed lines are marked '<<<'):

    #include <stdint.h>
    #include <stdio.h>
    #include <emmintrin.h>

    uint32_t read_128b(__m128i *ptr, uint8_t index)    /* <<< */
    {
        __m128i val = _mm_load_si128(ptr);
        return ((uint32_t *) &val)[index];              /* <<< */
    }
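If the goal is only to keep the 16-byte load in one piece, one possible workaround is to do the OR reduction with SSE2 shuffles so the value stays in an XMM register until the final lane-0 extraction. The following is a sketch under that assumption: `read_128b_vec` is a hypothetical name, and whether a given compiler then emits a single movdqa is not guaranteed.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical variant of read_128b: one aligned 16-byte load, then an
   in-register OR reduction instead of four scalar lane reads. */
uint32_t read_128b_vec(const __m128i *ptr)
{
    __m128i v = _mm_load_si128(ptr); /* requires 16-byte alignment */

    /* Swap the 64-bit halves (lanes become 2,3,0,1) and OR:
       lane0 = v0|v2, lane1 = v1|v3. */
    v = _mm_or_si128(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));

    /* Swap adjacent 32-bit lanes (lanes become 1,0,3,2) and OR:
       lane0 = v0|v1|v2|v3. */
    v = _mm_or_si128(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));

    return (uint32_t)_mm_cvtsi128_si32(v); /* extract lane 0 */
}
```

The first shuffle folds the high 64 bits onto the low 64 bits, and the second folds lane 1 onto lane 0, so two ORs suffice for four lanes.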

topperc commented 4 years ago

The min_vector_width attribute is currently used only to indicate that the compiler should honor 512-bit intrinsics. Some CPUs lower their clock frequency when executing those instructions, so by default the auto-vectorizers avoid them on those CPUs; the attribute disables this behavior.

Can you clarify why splitting these loads is a problem, beyond not matching the documentation? You used the word "atomic" in the bug title, but neither Intel nor AMD guarantees atomic memory access for anything larger than 8 bytes, except for cmpxchg16b.
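For comparison, here is a minimal sketch of the one 16-byte access the comment above names as atomic: a read implemented with LOCK CMPXCHG16B through GCC/Clang inline assembly. It assumes x86-64, a CPU that supports CMPXCHG16B, and a 16-byte-aligned, writable object; `u128_pair`, `atomic_load_128` and `read_128b_atomic` are hypothetical names.

```c
#include <stdint.h>

#if !defined(__x86_64__)
#error "This sketch uses x86-64 inline assembly (CMPXCHG16B)."
#endif

/* Hypothetical 16-byte value; _Alignas(16) on the first member makes the
   whole struct 16-byte aligned, which CMPXCHG16B requires. */
typedef struct {
    _Alignas(16) uint64_t lo; /* bytes 0..7  (compared with RAX) */
    uint64_t hi;              /* bytes 8..15 (compared with RDX) */
} u128_pair;

/* Atomic 16-byte read via LOCK CMPXCHG16B.
   RDX:RAX (expected) and RCX:RBX (replacement) are both zero:
   - if memory == 0, zero is written back and RDX:RAX already equals it;
   - otherwise the compare fails and the current value is loaded into RDX:RAX.
   Either way RDX:RAX ends up holding the 16 bytes that were in memory. */
u128_pair atomic_load_128(u128_pair *p)
{
    uint64_t lo = 0, hi = 0;
    __asm__ __volatile__("lock cmpxchg16b %2"
                         : "+a"(lo), "+d"(hi), "+m"(*p)
                         : "b"((uint64_t)0), "c"((uint64_t)0)
                         : "cc", "memory");
    u128_pair r = { lo, hi };
    return r;
}

/* The reporter's OR of four 32-bit lanes, computed from one atomic read. */
uint32_t read_128b_atomic(u128_pair *p)
{
    u128_pair v = atomic_load_128(p);
    return (uint32_t)v.lo | (uint32_t)(v.lo >> 32) |
           (uint32_t)v.hi | (uint32_t)(v.hi >> 32);
}
```

Because CMPXCHG16B always performs a locked read-modify-write (the destination is written back even when the comparison fails), this "load" faults on read-only memory and is considerably more expensive than a plain movdqa.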

anmparal commented 4 years ago

assigned to @anmparal