pornin / sphlib

Copy of sphlib (cryptographic hash function implementations in C and Java)
MIT License
5 stars 3 forks source link

GCC inline asm in sph_types.h missing required constraints #1

Open solardiz opened 3 years ago

solardiz commented 3 years ago

When we're passing a variable into inline asm, we're not telling gcc that it's a pointer and that memory pointed to by it must already be written to by that time, so gcc is free to postpone such write until after the inline asm if it feels that's somehow more optimal.

There are relevant examples in https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html under "6.47.2.6 Clobbers and Scratch Registers".

Here's a specific issue I just found and fixed in the copy in JtR jumbo (https://github.com/openwall/john/issues/4468), but there are many more (that I can't test fixes of as easily):

+++ b/src/sph_types.h
@@ -1886,7 +1886,7 @@ sph_dec64le(const void *src)
                sph_u64 tmp;

                __asm__ __volatile__ (
-                       "ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src));
+                       "ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src), "m" (*(const sph_u64 *)src));
                return tmp;
 /*
  * Not worth it generally.
@@ -1947,7 +1947,7 @@ sph_dec64le_aligned(const void *src)
 #if SPH_SPARCV9_GCC_64 && !SPH_NO_ASM
        sph_u64 tmp;

-       __asm__ __volatile__ ("ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src));
+       __asm__ __volatile__ ("ldxa [%1]0x88,%0" : "=r" (tmp) : "r" (src), "m" (*(const sph_u64 *)src));
        return tmp;
 /*
  * Not worth it generally.
solardiz commented 3 years ago

This is probably the same issue seen in sphlib's own tests from this repo (same sparc64 system):

$ make -f Makefile.unix
[...]
./test_types
Used configuration:
  64-bit support:                        yes (native)
  pointer as integer type:               sph_u64
  unaligned access:                      no
  direct access, little endian:          no
  direct access, big endian:             yes
  little-endian access is efficient:     yes
  big-endian access is efficient:        yes
  specific architecture:                 Sparc v9 64-bit with GCC
sizeof int:          4
sizeof long:         8
sizeof long long:    8
sizeof pointer:      8
sizeof sph_u32:      4
sizeof sph_u64:      8
====== test types passed
./test_blake
====== test BLAKE passed
[...]
./test_simd
====== test SIMD passed
./test_skein
TEST FAILED [Skein]: assertion failed (test_skein.c:38): utest_byteequal(res, ref, 28)
make: *** [Makefile.unix:183: run-tests] Error 1
solardiz commented 3 years ago

Indeed, applying the patch above makes it work. I'll send a PR.

solardiz commented 3 years ago

BTW, I think __volatile__ is unneeded, and omitting it might enable helpful optimizations. Maybe it was used in lieu of the memory dependency, but it doesn't achieve what's needed (this is similar to the reordering example for floating-point rounding given in the GCC documentation: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile).