openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
10.29k stars 2.1k forks source link

Skein formats fail with recent GCC on sparc64 #4468

Closed solardiz closed 3 years ago

solardiz commented 3 years ago
Benchmarking: skein-256, Skein 256 [Skein 32/64]... (64xOMP) FAILED (cmp_all(4))
Benchmarking: skein-512, Skein 512 [Skein 32/64]... (64xOMP) FAILED (cmp_all(1))
Benchmarking: dynamic_330 [skein224($p) 32/64 sph_skein]... FAILED (cmp_all(1))
Benchmarking: dynamic_340 [skein256($p) 32/64 sph_skein]... FAILED (cmp_all(1))
Benchmarking: dynamic_350 [skein384($p) 32/64 sph_skein]... FAILED (cmp_all(1))
Benchmarking: dynamic_360 [skein512($p) 32/64 sph_skein]... FAILED (cmp_all(1))

Also happens when testing the formats individually, with OMP_NUM_THREADS=1, and with --test=0. I didn't try a non-OpenMP build yet.

This is on:

Linux 5.9.0-2-sparc64-smp sparc64
gcc (Debian 10.2.0-17) 10.2.0

I wonder how this worked in my testing shortly before the 1.9.0-jumbo-1 release. We don't appear to have made any relevant change. Maybe the gcc version change exposed a bug.

solardiz commented 3 years ago

The problem is also seen with:

gcc version 9.3.0 (Debian 9.3.0-18)

but goes away with:

gcc version 6.5.0 20181026 (Debian 6.5.0-2) 
solardiz commented 3 years ago

This appears to fix it:

+++ b/src/sph_types.h
@@ -1886,7 +1886,7 @@ sph_dec64le(const void *src)
                sph_u64 tmp;

                __asm__ __volatile__ (
-                       "ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src));
+                       "ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src), "m" (*(const sph_u64 *)src));
                return tmp;
 /*
  * Not worth it generally.
@@ -1947,7 +1947,7 @@ sph_dec64le_aligned(const void *src)
 #if SPH_SPARCV9_GCC_64 && !SPH_NO_ASM
        sph_u64 tmp;

-       __asm__ __volatile__ ("ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src));
+       __asm__ __volatile__ ("ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src), "m" (*(const sph_u64 *)src));
        return tmp;
 /*
  * Not worth it generally.

When we're passing a register into inline asm, we're not telling gcc that it's a pointer and that memory pointed to by it must already be written to by that time, so gcc is free to postpone such write to after the inline asm if it feels that's somehow more optimal.

There are relevant examples in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html under "6.47.2.6 Clobbers and Scratch Registers".

We should check latest upstream and report this bug to there if still present.

Edit: Can't easily find a current upstream, tweeted asking about it. Meanwhile, I notice there are many more similarly problematic inline asm blocks (for various architectures) in that source file. It'd be tricky for me to test fixes for all of them.

solardiz commented 3 years ago

The functions in sph_types.h also break C aliasing rules here and in many other places, but they "have to" for performance and this is hopefully still hidden from the compiler by translation unit boundaries (to break with LTO).

solardiz commented 3 years ago

Reported upstream, the immediate issue fixed, all tests pass with GCC 10.2.0 on that box now. I'll close this issue now, even though there are still other problematic-looking pieces of inline asm in that file, possibly affecting some (future) builds on some architectures.