617: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 61e: 00 00 ... $ objdump -d t1-clang ... 0000000000400480
40049f: 90 nop ...
Closed pogo59 closed 3 years ago
I'm not sure there's an alignment requirement for that array other than what is implied by float. Versions of gcc prior to 5 use align 16 with -Os and align 32 with -O2.
Adding attribute ((aligned (32))) to the struct definition also appears to fix the crash.
I was coming to the same conclusion. I threw in some alignof() expressions and they all report 4. Pumping up the alignment in the generated code seems rather optional.
I'll resolve this as invalid and we'll patch the test to enforce the required alignment.
I'm not sure there's an alignment requirement for that array other than what is implied by float. Versions of gcc prior to 5 use align 16 with -Os and align 32 with -O2.
Adding attribute ((aligned (32))) to the struct definition also appears to fix the crash.
I suspect that there is an alignment problem here. If clang is not aligning the array to 32-bytes, then it is a problem for the VMOVAPS.
Out of curiosity, I have built the reproducible posted by Paul on compiler explorer.
I noticed that the array is emitted by clang with assembler directive
.p2align 4
That directive only enforces a 16-byte alignment. Unfortunately the VMOVAPS from that code expects the operand to be aligned on a 32-byte boundary.
GCC, on the other hand, explicitly aligns the array to a 32-byte boundary using the following assembly directive:
.align 32
I mentioned this to Paul, and I have suggested as an experiment to replace the VMOVAPS with VMOVUPS. That was enough to fix his crash.
So, I guess that there may be a problem with the alignment set by clang for that array.
$ objdump -d t1-gcc
...
00000000000005fa
617: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
61e: 00 00
...
$ objdump -d t1-clang
...
0000000000400480
40049f: 90 nop
...
gcc sets up the address of data
as
605: 48 8d 05 14 0a 20 00 lea 0x200a14(%rip),%rax
while clang produces
400492: b8 30 10 60 00 mov $0x601030,%eax
Extended Description
Reduced from gdb test case gdb.arch/i386-avx.c
$ cat t1.c typedef struct { float f[8]; } v8sf_t; v8sf_t data[] = { { { 0.0, 0.125, 0.25, 0.375, 0.50, 0.625, 0.75, 0.875 } }, }; int main (int argc, char *argv) { asm ("vmovaps 0(%0), %%ymm0\n\t" : / no output operands */ : "r" (data) : "xmm0"); return 0; } $ gcc t1.c -o t1-gcc $ ./t1-gcc $ clang t1.c -o t1-clang $ ./t1-clang Segementation fault $
using gcc 7.5.0 and clang 13 1f169a77