Closed solardiz closed 3 years ago
The problem is also seen with:
gcc version 9.3.0 (Debian 9.3.0-18)
but goes away with:
gcc version 6.5.0 20181026 (Debian 6.5.0-2)
This appears to fix it:
+++ b/src/sph_types.h
@@ -1886,7 +1886,7 @@ sph_dec64le(const void *src)
sph_u64 tmp;
__asm__ __volatile__ (
- "ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src));
+ "ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src), "m" (*(const sph_u64 *)src));
return tmp;
/*
* Not worth it generally.
@@ -1947,7 +1947,7 @@ sph_dec64le_aligned(const void *src)
#if SPH_SPARCV9_GCC_64 && !SPH_NO_ASM
sph_u64 tmp;
- __asm__ __volatile__ ("ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src));
+ __asm__ __volatile__ ("ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src), "m" (*(const sph_u64 *)src));
return tmp;
/*
* Not worth it generally.
When we're passing a register into inline asm, we're not telling gcc that it's a pointer and that memory pointed to by it must already be written to by that time, so gcc is free to postpone such write to after the inline asm if it feels that's somehow more optimal.
There are relevant examples in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html under "6.47.2.6 Clobbers and Scratch Registers".
We should check latest upstream and report this bug to there if still present.
Edit: Can't easily find a current upstream, tweeted asking about it. Meanwhile, I notice there are many more similarly problematic inline asm blocks (for various architectures) in that source file. It'd be tricky for me to test fixes for all of them.
The functions in sph_types.h
also break C aliasing rules here and in many other places, but they "have to" for performance and this is hopefully still hidden from the compiler by translation unit boundaries (to break with LTO).
Reported upstream, the immediate issue fixed, all tests pass with GCC 10.2.0 on that box now. I'll close this issue now, even though there are still other problematic-looking pieces of inline asm in that file, possibly affecting some (future) builds on some architectures.
Also happens when testing the formats individually, with
OMP_NUM_THREADS=1
, and with--test=0
. I didn't try a non-OpenMP build yet.This is on:
I wonder how this worked in my testing shortly before the 1.9.0-jumbo-1 release. We don't appear to have made any relevant change. Maybe the gcc version change exposed a bug.