openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
10.26k stars 2.1k forks source link

Pseudo intrinsics depending on compiler optimization #2156

Closed magnumripper closed 8 years ago

magnumripper commented 8 years ago

See #2146.

#if ARCH_BITS == 32
(...)
#undef _mm_insert_epi64
#define _mm_insert_epi64 my__mm_insert_epi64

static inline __m128i _mm_insert_epi64(__m128i a, uint64_t b, int c) {
    c <<= 1;
    a = _mm_insert_epi32(a, (unsigned int)b, c);
    return _mm_insert_epi32(a, (unsigned int)(b >> 32), c + 1);
}
#endif

Without compiler optimizations, the above does not compile.

gost3411-2012-sse41_plug.c: In function ‘my__mm_insert_epi64’:
gost3411-2012-sse41_plug.c:25:6: error: selector must be an integer constant in the range 0..3
  a = _mm_insert_epi32(a, (unsigned int)b, c);
      ^
gost3411-2012-sse41_plug.c:26:9: error: selector must be an integer constant in the range 0..3
  return _mm_insert_epi32(a, (unsigned int)(b >> 32), c + 1);
         ^

Reason is the third parameter of _mm_insert_epi32 needs to be a constant so this code really relies on optimizations from the compiler. I suppose we could get rid of this problem by using a function-like macro.

magnumripper commented 8 years ago
$ git grep -- _mm_insert_epi64
gost3411-2012-sse41_plug.c:#undef _mm_insert_epi64
gost3411-2012-sse41_plug.c:#define _mm_insert_epi64 my__mm_insert_epi64
gost3411-2012-sse41_plug.c:static inline __m128i _mm_insert_epi64(__m128i a, uint64_t b, int c) {
gost3411-2012-sse41_plug.c:     return _mm_insert_epi64(_mm_cvtsi64_si128(r0), r1, 1);
gost3411-2012-sse41_plug.c:     return _mm_insert_epi64(_mm_cvtsi64_si128(r0), r1, 1);
gost3411-2012-sse41_plug.c:     return _mm_insert_epi64(_mm_cvtsi64_si128(r0), r1, 1);
gost3411-2012-sse41_plug.c:     return _mm_insert_epi64(_mm_cvtsi64_si128(r0), r1, 1);

We could probably do something even simpler. Argument is always 1.

jfoug commented 8 years ago

This works fine (you want me to make a comment, and check it in?

static inline __m128i _mm_insert_epi64(__m128i a, uint64_t b, int c) {
        a = _mm_insert_epi32(a, (unsigned int)b, 2);
        return _mm_insert_epi32(a, (unsigned int)(b >> 32), 3);
}
jfoug commented 8 years ago

We can also change inline into #define, and the problem goes away also ;)

magnumripper commented 8 years ago

We can also change inline into #define, and the problem goes away also

Are you sure it goes away using -O0?

jfoug commented 8 years ago

Nope, I can not get a #define version to work well. But the one using 2 constants is fine.

jfoug commented 8 years ago

No, I just removed -O2 from CLFAGSX, BUT prior to change it fails to compile (as you pointed out).

jfoug commented 8 years ago

But with 2 and 3 (constants), it builds just fine.

jfoug commented 8 years ago
make find_version
make[1]: Entering directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
echo "#define JTR_GIT_VERSION \"1.8.0.6-jumbo-1-2333-g0220646\"" > version.h.new
diff >/dev/null 2>/dev/null version.h.new version.h && rm -f version.h.new || mv -f version.h.new version.h
make[1]: Leaving directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
make[1]: Entering directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
gcc -DAC_BUILT -march=native -mavx -DJOHN_AVX -c -g -I/usr/local/include -I/opt/AMDAPP/include -DARCH_LITTLE_ENDIAN=1   -Wall -Wdeclaration-after-statement -fomit-frame-pointer --param allow-store-data-races=0 -Wno-deprecated-declarations  -Wunused-but-set-variable -std=gnu89 -Wdate-time -D_GNU_SOURCE    -fopenmp   -DHAVE_HT  -I/usr/local/include -I/opt/AMDAPP/include -DHAVE_OPENCL  -funroll-loops gost3411-2012-sse41_plug.c -o gost3411-2012-sse41_plug.o
In file included from gost3411-2012-sse41_plug.c:5:0:
gost3411-2012-sse41_plug.c: In function ‘my__mm_insert_epi64’:
gost3411-2012-sse41_plug.c:25:6: error: selector must be an integer constant in the range 0..3
  a = _mm_insert_epi32(a, (unsigned int)b, c);
      ^
gost3411-2012-sse41_plug.c:26:9: error: selector must be an integer constant in the range 0..3
  return _mm_insert_epi32(a, (unsigned int)(b >> 32), c + 1);
         ^
Makefile:1565: recipe for target 'gost3411-2012-sse41_plug.o' failed
make[1]: **\* [gost3411-2012-sse41_plug.o] Error 1
make[1]: Leaving directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
Makefile:187: recipe for target 'default' failed
make: **\* [default] Error 2

That is original.

With 2 and 3 I get this:


$ make
make find_version
make[1]: Entering directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
echo "#define JTR_GIT_VERSION \"1.8.0.6-jumbo-1-2333-g0220646+\"" > version.h.new
diff >/dev/null 2>/dev/null version.h.new version.h && rm -f version.h.new || mv -f version.h.new version.h
make[1]: Leaving directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
make[1]: Entering directory '/cygdrive/c/projects/phpbb/johnripper/john-1.7.9/bleed32/src'
gcc -DAC_BUILT -march=native -mavx -DJOHN_AVX -c -g -I/usr/local/include -I/opt/AMDAPP/include -DARCH_LITTLE_ENDIAN=1   -Wall -Wdeclaration-after-statement -fomit-frame-pointer --param allow-store-data-races=0 -Wno-deprecated-declarations  -Wunused-but-set-variable -std=gnu89 -Wdate-time -D_GNU_SOURCE    -fopenmp   -DHAVE_HT  -I/usr/local/include -I/opt/AMDAPP/include -DHAVE_OPENCL  -funroll-loops gost3411-2012-sse41_plug.c -o gost3411-2012-sse41_plug.o
echo "#define JTR_GIT_VERSION \"1.8.0.6-jumbo-1-2333-g0220646+\"" > version.h.new
diff >/dev/null 2>/dev/null version.h.new version.h && rm -f version.h.new || mv -f version.h.new version.h
gcc -DAC_BUILT -march=native -mavx -DJOHN_AVX -c -g -I/usr/local/include -I/opt/AMDAPP/include -DARCH_LITTLE_ENDIAN=1   -Wall -Wdeclaration-after-statement -fomit-frame-pointer --param allow-store-data-races=0 -Wno-deprecated-declarations  -Wunused-but-set-variable -std=gnu89 -Wdate-time -D_GNU_SOURCE    -fopenmp   -DHAVE_HT  -I/usr/local/include -I/opt/AMDAPP/include -DHAVE_OPENCL  -funroll-loops options.c -o options.o
gcc -DAC_BUILT -march=native -mavx -DJOHN_AVX -c -g -I/usr/local/include -I/opt/AMDAPP/include -DARCH_LITTLE_ENDIAN=1   -Wall -Wdeclaration-after-statement -fomit-frame-pointer --param allow-store-data-races=0 -Wno-deprecated-declarations  -Wunused-but-set-variable -std=gnu89 -Wdate-time -D_GNU_SOURCE    -fopenmp   -DHAVE_HT  -I/usr/local/include -I/opt/AMDAPP/include -DHAVE_OPENCL  -funroll-loops listconf.c -o listconf.o
gcc jumbo.o john-mpi.o DES_fmt.o DES_std.o DES_bs.o DES_bs_b.o BSDI_fmt.o MD5_fmt.o MD5_std.o cryptmd5_common.o BF_fmt.o BF_std.o BF_common.o scrypt_fmt.o escrypt/crypto_scrypt-best.o escrypt/crypto_scrypt-common.o escrypt/sha256.o AFS_fmt.o LM_fmt.o trip_fmt.o NT_fmt.o KeccakDuplex.o KeccakF-1600-opt64.o KeccakHash.o KeccakSponge.o whirlpool.o haval.o skein.o md2.o panama.o timer.o md5.o rc4.o hmacmd5.o base64.o base64_convert.o md4.o sha2.o dynamic_fmt.o dynamic_parser.o dynamic_preloads.o dynamic_utils.o dynamic_big_crypt.o dynamic_compiler.o dynamic_compiler_lib.o ripemd.o tiger.o ssh2john.o pfx2john.o unrarcmd.o unrarfilter.o unrarhlp.o unrar.o unrarppm.o unrarvm.o rar2john.o zip2john.o pkzip.o racf2john.o dmg2john.o keepass2john.o hccap2john.o 7z_fmt_plug.o AzureAD_common_plug.o AzureAD_fmt_plug.o BFEgg_fmt_plug.o DMD5_fmt_plug.o DOMINOSEC8_fmt_plug.o DOMINOSEC_fmt_plug.o EPI_fmt_plug.o FGT_fmt_plug.o HDAA_fmt_plug.o IPB2_fmt_plug.o KRB4_fmt_plug.o KRB4_std_plug.o KRB5_fmt_plug.o KRB5_std_plug.o MSCHAPv2_bs_fmt_plug.o NETLM_fmt_plug.o NETLMv2_fmt_plug.o NETNTLM_bs_fmt_plug.o NETNTLMv2_fmt_plug.o NETSPLITLM_fmt_plug.o NS_fmt_plug.o PHPS2_fmt_plug.o PHPS_fmt_plug.o PO_fmt_plug.o SKEY_fmt_plug.o SKEY_jtr_plug.o SybaseASE_fmt_plug.o SybasePROP_fmt_plug.o XSHA512_fmt_plug.o XSHA_fmt_plug.o aes_xts_plug.o agilekeychain_fmt_plug.o aix_smd5_fmt_plug.o aix_ssha_fmt_plug.o androidfde_fmt_plug.o asaMD5_fmt_plug.o asn1_plug.o axcrypt_fmt_plug.o bcrypt_pbkdf_plug.o bitcoin_fmt_plug.o blackberry_ES10_fmt_plug.o blake2b-ref_plug.o blake2b_plug.o blf_plug.o blockchain_fmt_plug.o chap_fmt_plug.o citrix_ns_fmt_plug.o clipperz_srp_fmt_plug.o cloudkeychain_fmt_plug.o cq_fmt_plug.o crc32_fmt_plug.o crypt-sha1_fmt_plug.o cryptsha256_fmt_plug.o cryptsha512_fmt_plug.o cuda_cryptmd5_fmt_plug.o cuda_cryptsha256_fmt_plug.o cu
.....
jfoug commented 8 years ago

HOWEVER, I am not sure that that function is even being used (at least in the -test=0 mode). I put a deref of a null pointer within that function, and -test=0 -form=gost does not crash :???

So I am not 100% happy with the change. I am on AVX machine, so I would image that it is SSE4.1 compliant

jfoug commented 8 years ago

Nope, once I tracked shit down:

$ ../run/john -test=0 -form=stribog-256
Will run 8 OpenMP threads
Testing: Stribog-256 [GOST R 34.11-2012 128/128 SSE4.1 1x]... (8xOMP) Segmentation fault (core dumped)

$ ../run/john -test=0 -form=stribog-512
Will run 8 OpenMP threads
Testing: Stribog-512 [GOST R 34.11-2012 128/128 SSE4.1 1x]... (8xOMP) Segmentation fault (core dumped)

That was what I was expecting to see (so I knew my purposeful null deref worked).

Now with the 2 / 3 constants, I see this:

$ ../run/john -test=0 -form=stribog-512
Will run 8 OpenMP threads
Testing: Stribog-512 [GOST R 34.11-2012 128/128 SSE4.1 1x]... (8xOMP) PASS

$ ../run/john -test=0 -form=stribog-256
Will run 8 OpenMP threads
Testing: Stribog-256 [GOST R 34.11-2012 128/128 SSE4.1 1x]... (8xOMP) PASS
$ ../run/john
John the Ripper 1.8.0.6-jumbo-1-2333-g0220646+ OMP [cygwin 32-bit AVX-ac]
Copyright (c) 1996-2016 by Solar Designer and others
Homepage: http://www.openwall.com/john/

32 bit build.

jfoug commented 8 years ago

a0d3793

jfoug commented 8 years ago

I feel good about this, since I was able to validate that there are NO build problems, AND since that function is now known to be used, once I got the crash when I put in *((unsigned*)!c)=0; so that I was dereffing a null pointer on purpose.

Ugly code with constants, but lets stick a fork in it. We already have constants of 1 in all calls to the 64 bit function.