solardiz commented 7 years ago

I tried to build today's bleeding-jumbo on the GCC Compile Farm's "gcc110". It configures as:

Target CPU ................................. powerpc64, 64-bit BE
AES-NI support ............................. no
Target OS .................................. linux-gnu
Cross compiling ............................ no
Legacy arch header ......................... ppc64.h

That was a plain ./configure with no options. Apparently, it didn't detect AltiVec? (The hardware supports AltiVec.) Anyway, the build fails with many errors like:

In file included from rawSHA512_ng_fmt_plug.c:43:0:
pseudo_intrinsics.h:606:5: error: Builtin function __builtin_altivec_vsrd requires the -mpower8-vector option
  (x = vxor(vsrli_epi64(x, 32), vslli_epi64(x, 32)), vswap32_emu(x))
     ^
pseudo_intrinsics.h:145:33: note: in expansion of macro ‘vswap64_emu’
 #define vswap64                 vswap64_emu
                                 ^
rawSHA512_ng_fmt_plug.c:321:9: note: in expansion of macro ‘vswap64’
         vswap64(tmp1);
         ^

rawSHA512_ng_fmt_plug.c:179:10: error: Builtin function __builtin_altivec_vrld requires the -mpower8-vector option
     tmp2 = vadd_epi64(S0(a),Maj(a,b,c));                                  \
          ^
rawSHA512_ng_fmt_plug.c:376:9: note: in expansion of macro ‘SHA512_STEP’
         SHA512_STEP(b, c, d, e, f, g, h, a, 39, 0x92722c851482353bULL);
         ^

and many more like these, in other source files too, indicating that our source files try to use SIMD anyway.

I've tried these, to no avail:

./configure CFLAGS='-O2 -mpower8-vector'
./configure CFLAGS='-O2 -maltivec -mpower8-vector'
./configure CFLAGS='-O2 -maltivec'
./configure CFLAGS='-O2 -U__ALTIVEC__'

although the ways the build failed changed. In none of these cases would ./configure explicitly say it'd use AltiVec. I think we should introduce end-user friendly configure options to enable/disable SIMD (and have those options substitute the needed platform-specific compiler flags automatically).

BTW, there are also these warnings:

sha2.c: In function ‘jtr_sha256_final’:
sha2.c:234:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
  OUTBE32(bits, m.mlen, 4);
  ^
sha2.c: In function ‘jtr_sha512_final’:
sha2.c:557:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
  OUTBE64(bits, m.mlen, 8);
  ^

feal8_plug.c: In function ‘feal_Decrypt’:
feal8_plug.c:56:25: warning: array subscript is above array bounds [-Warray-bounds]
  f1 = A.Byte[1] ^ B.Byte[2] ^ A.Byte[0];
                         ^
feal8_plug.c:57:25: warning: array subscript is above array bounds [-Warray-bounds]
  f2 = A.Byte[2] ^ B.Byte[3] ^ A.Byte[3];
                         ^
feal8_plug.c: In function ‘feal_Encrypt’:
feal8_plug.c:56:25: warning: array subscript is above array bounds [-Warray-bounds]
  f1 = A.Byte[1] ^ B.Byte[2] ^ A.Byte[0];
                         ^
feal8_plug.c:57:25: warning: array subscript is above array bounds [-Warray-bounds]
  f2 = A.Byte[2] ^ B.Byte[3] ^ A.Byte[3];
                         ^

solardiz commented 7 years ago

@magnumripper and/or @jfoug, do you have GCC Compile Farm accounts? If not, you can apply for them. I'd prefer that you figure this out and fix it. ;-)

jfoug commented 7 years ago

I am setting up a QEMU PPC64 system (hopefully). I have also applied for a GCC Compile Farm account, but the site said that applications are approved by hand and that this would take several days at least. I have not heard back on that yet. But hopefully the 8.8.0 Debian PPC64 VM will work, and if so, it will provide a nice controlled environment for looking into these types of bugs.

Note, there is nothing in configure to detect SIMD other than the x64 variants, and now a few others (like NEON). NOTE, there will likely be a WHOLE lot of failures once we go to BE SIMD code in many formats. Time will tell, for sure ;) Almost all of the work inside the SIMD defines was done under the assumption of LE byte ordering. Hopefully, this VM will also emulate AltiVec SIMD instructions, and make development / testing much easier.

jfoug commented 7 years ago

Warnings fixed: 60b9362a4 and 14dd15e4c

jfoug commented 7 years ago

I have (finally) gotten an AltiVec build working (well it runs) on the ppc. It was not easy, so far.

As I expected, all (or most) formats fail, since the SIMD code was written under the assumption that SIMD builds would be LE. I will see if I can easily figure this out. We should mostly be able to hide the endianness using the GETPOS macros.

jfoug commented 7 years ago

here are the hacks I have done so far to get this far (talking to myself here)

Made changes to configure.ac and m4/jtr_ppc.m4:
Changed the original PPC macro to be PPC64LE.
Added a new PPC64 macro.
Added this output to the PPC64 macro: CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"
Also made changes to configure.ac to use this PPC64 macro.

CFLAGS="-O2 -m64" ./configure

Had to apt-get install all of the power64 stuff I could

Edited aes/Makefile and aes/openssl/Makefile (adding the -m64)

Made changes to the ./Makefile, LDFLAGS (added /usr/lib64 /usr/lib)

LD_LIBRARY_PATH needed to be updated (/lib64:/usr/lib). Note, when I built oSSL, I screwed up and did not properly set the output path to be /lib64, since I was not aware of where it should be.

Removed encfs format (uses some oSSL that was not compatible with my build)

Edited listconf.c removing the oSSL versioning code that is not part of my library.

I may be leaving something out, but I think that is most of it.

Now, john runs, but all AltiVec formats fail (as I expected). But this provides a basis to figure out how to make them work. Once one is completed, then most should be easy. Then there will likely be some (10% or so), that require some other smallish porting changes.

solardiz commented 7 years ago

Thank you, Jim! I'd expect the DES-based formats from core to work fine with AltiVec, since they do in core. (In fact, historically JtR's bitslice DES first reached 128-bit SIMD with AltiVec and only later with SSE.) Is this not the case?

solardiz commented 7 years ago

Here's john-1.8.0.11 built with make -j linux-ppc32-altivec on gcc110:

[solar@gcc1-power7 src]$ GOMP_CPU_AFFINITY=0-63 ../run/john -te=1
Will run 64 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... DONE
Many salts: 57671K c/s real, 921420 c/s virtual
Only one salt:  31405K c/s real, 801602 c/s virtual

Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... DONE
Many salts: 1671K c/s real, 26087 c/s virtual
Only one salt:  1038K c/s real, 24066 c/s virtual

Benchmarking: md5crypt [MD5 32/32 X2]... DONE
Raw:    265264 c/s real, 4218 c/s virtual

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/32]... DONE
Raw:    16728 c/s real, 266 c/s virtual

Benchmarking: LM [DES 128/128 AltiVec]... DONE
Raw:    66963K c/s real, 3673K c/s virtual

Benchmarking: AFS, Kerberos AFS [DES 24/32 4K]... DONE
Short:  354304 c/s real, 354304 c/s virtual
Long:   845560 c/s real, 854016 c/s virtual

Benchmarking: tripcode [DES 128/128 AltiVec]... DONE
Raw:    16797K c/s real, 710825 c/s virtual

Benchmarking: dummy [N/A]... DONE
Raw:    81517K c/s real, 81517K c/s virtual

Benchmarking: crypt, generic crypt(3) [?/32]... DONE
Many salts: 1263K c/s real, 20041 c/s virtual
Only one salt:  1224K c/s real, 19138 c/s virtual

solardiz commented 7 years ago

linux-ppc64-altivec build failed. Now fixed in core with:

--- john-1.8.0.11/src/DES_bs_b.c    2016-01-19 04:23:55.000000000 +0000
+++ john-1.8.0.11-ppc/src/DES_bs_b.c    2017-11-13 17:08:04.208703158 +0000
@@ -232,7 +232,7 @@ typedef struct {
 typedef vector signed int vtype;

 #define vst(dst, ofs, src) \
-   vec_st((src), (ofs) * sizeof(DES_bs_vector), (dst))
+   vec_st((src), (ofs) * sizeof(DES_bs_vector), (vtype *)(dst))

 #define vxorf(a, b) \
    vec_xor((a), (b))

Also added -fno-strict-aliasing to OPT_INLINE for both linux-ppc*-altivec targets.

Oh, and MD5_IMM turned out to be more optimal here (unexpected), for both 32-bit and 64-bit. With all of these changes, 64-bit produces:

[solar@gcc1-power7 src]$ GOMP_CPU_AFFINITY=0-63 ../run/john -te=1 
Will run 64 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... DONE
Many salts: 59506K c/s real, 936818 c/s virtual
Only one salt:  32443K c/s real, 798440 c/s virtual

Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... DONE
Many salts: 1605K c/s real, 25013 c/s virtual
Only one salt:  1103K c/s real, 23167 c/s virtual

Benchmarking: md5crypt [MD5 32/64 X2]... DONE
Raw:    280868 c/s real, 4431 c/s virtual

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64]... DONE
Raw:    17108 c/s real, 267 c/s virtual

Benchmarking: LM [DES 128/128 AltiVec]... DONE
Raw:    67633K c/s real, 3763K c/s virtual

Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... DONE
Short:  345344 c/s real, 345344 c/s virtual
Long:   1125K c/s real, 1125K c/s virtual

Benchmarking: tripcode [DES 128/128 AltiVec]... DONE
Raw:    17301K c/s real, 708207 c/s virtual

Benchmarking: dummy [N/A]... DONE
Raw:    79200K c/s real, 79200K c/s virtual

Benchmarking: crypt, generic crypt(3) [?/64]... DONE
Many salts: 1226K c/s real, 19400 c/s virtual
Only one salt:  1188K c/s real, 18571 c/s virtual

Except for md5crypt (where MD5_IMM made a difference), other performance changes seen here are probably mostly a random fluctuation rather than a genuine difference between 32-bit and 64-bit builds.

jfoug commented 7 years ago

descrypt does not appear to use AltiVec instructions in the build I did. md5crypt does use AltiVec, and fails.

Note, I likely will not have time to dig into this until later in the week (possibly not until the weekend)

jfoug commented 7 years ago

I have looked into some of the jumbo formats, and doing this for AltiVec (or any BE SIMD) will be pretty much a full rewrite of a LOT of code. Anything dealing with interleaved buffers will require code changes. I was hoping that simply updating the GETPOS() macro would suffice. In some instances that may be the case, but it likely will not be the rule.

solardiz commented 7 years ago

This shouldn't be so difficult to fix. Lei Zhang got AltiVec working for us in 64-bit BE builds in 2015. We just seem to have broken this since, and we need to unbreak it.

http://www.openwall.com/lists/john-dev/2015/07/10/1
http://www.openwall.com/lists/john-dev/2015/07/10/2

jfoug commented 7 years ago

I have gotten AltiVec code working for the rawMD5 format. I will post the diff, and will also create another bug list. That bug list will be to port all SIMD formats to work properly for any build: BE/LE SIMD, BE/LE non-SIMD, and with or without OMP. That task will NOT be a small undertaking. But now with the first one done, at least I have a template for replacing the optimized set_key function, and for proceeding with other formats. It is just a slow process. A lot of dump_stuff_mmx() calls get scattered all over the place while working to figure things out. The longest time here was spent finding the actual RIGHT byte layout for the format hash to work at all. Also, I am not 100% sure about the REVERSE_STEPS code. I made a change to cmp_exact to simply skip the first 32-bit value (since it was already tested in the cmp_one/cmp_all tests). Here is the change (most debugging removed, only a small commented-out section left):

diff --git a/src/rawMD5_fmt_plug.c b/src/rawMD5_fmt_plug.c
index 8c689d7..284b44d 100644
--- a/src/rawMD5_fmt_plug.c
+++ b/src/rawMD5_fmt_plug.c
@@ -101,8 +101,12 @@ static struct fmt_tests tests[] = {
 #define PLAINTEXT_LENGTH       55
 #define MIN_KEYS_PER_CRYPT     NBKEYS
 #define MAX_KEYS_PER_CRYPT     NBKEYS
+#if ARCH_LITTLE_ENDIAN==1
 #define GETPOS(i, index)       ( (index&(SIMD_COEF_32-1))*4 + ((i)&(0xffffffff-3))*SIMD_COEF_32 + ((i)&3) + (unsigned int)index/SIMD_COEF_32*MD5_BUF_SIZ*4*SIMD_COEF_32 )
 #else
+#define GETPOS(i, index)       ( (index&(SIMD_COEF_32-1))*4 + ((i)&(0xffffffff-3))*SIMD_COEF_32 + (3-((i)&3)) + (unsigned int)index/SIMD_COEF_32*MD5_BUF_SIZ*4*SIMD_COEF_32 )
+#endif
+#else
 #define PLAINTEXT_LENGTH       125
 #define MIN_KEYS_PER_CRYPT     1
 #define MAX_KEYS_PER_CRYPT     1
@@ -223,11 +227,7 @@ static void *get_binary(char *ciphertext)
        temp |= ((unsigned int)(atoi16[ARCH_INDEX(ciphertext[i*8+7])]))<<24;

-#if ARCH_LITTLE_ENDIAN
        out[i]=temp;
-#else
-       out[i]=JOHNSWAP(temp);
-#endif
    }

 #if SIMD_COEF_32 && defined(REVERSE_STEPS)
@@ -250,10 +250,6 @@ static char *source(char *source, void *binary)
    md5_unreverse(b);
 #endif

-#if ARCH_LITTLE_ENDIAN==0
-   alter_endianity(b, 16);
-#endif
-
    p = &out[TAG_LENGTH];
    for (i = 0; i < 4; i++)
        for (j = 0; j < 8; j++)
@@ -278,6 +274,7 @@ static void set_key(char *_key, int index)
    uint32_t temp;

    len = 0;
+#if ARCH_LITTLE_ENDIAN==1
    while((temp = *key++) & 0xff) {
        if (!(temp & 0xff00))
        {
@@ -298,6 +295,28 @@ static void set_key(char *_key, int index)
            goto key_cleaning;
        }
        *keybuf_word = temp;
+#else
+   while((temp = *key++) & 0xff000000) {
+       if (!(temp & 0xff0000))
+       {
+           *keybuf_word = JOHNSWAP((temp & 0xff000000) | (0x80 << 16));
+           len++;
+           goto key_cleaning;
+       }
+       if (!(temp & 0xff00))
+       {
+           *keybuf_word = JOHNSWAP((temp & 0xffff0000) | (0x80 << 8));
+           len+=2;
+           goto key_cleaning;
+       }
+       if (!(temp & 0xff))
+       {
+           *keybuf_word = JOHNSWAP(temp | 0x80U);
+           len+=3;
+           goto key_cleaning;
+       }
+       *keybuf_word = JOHNSWAP(temp);
+#endif
        len += 4;
        keybuf_word += SIMD_COEF_32;
    }
@@ -426,10 +454,16 @@ static int cmp_exact(char *source, int index)
    MD5_Update(&ctx, key, strlen(key));
    MD5_Final((void*)crypt_key, &ctx);

-#ifdef REVERSE_STEPS
-   md5_reverse(crypt_key);
+#if ARCH_LITTLE_ENDIAN==0
+   alter_endianity(crypt_key, 16);
 #endif
+// dump_stuff(crypt_key, 16);
+// dump_stuff(get_binary(source), 16);
+#ifdef REVERSE_STEPS
+   return !memcmp(&((uint32_t*)(get_binary(source)))[1], &crypt_key[1], DIGEST_SIZE-4);
+#else
    return !memcmp(get_binary(source), crypt_key, DIGEST_SIZE);
+#endif
 #else
    return 1;
 #endif

This code is NOT checked in, because I have probably broken the non-SIMD BE port. I will have to investigate that, and possibly use this simplified method.

jfoug commented 7 years ago

Getting this to work on BE without SIMD should be easy. Simply change the code removal in binary() and source(), then do the byte swapping only when building for a non-SIMD BE system. For BE SIMD systems, we do not swap here. We will have the final crypt value in machine layout, so we want to keep the binary in machine layout, so that it is easy to match (same for source, since the binary was not swapped). But for non-SIMD BE systems, within binary_hash and cmp() we want the data swapped, so that it is returned to proper LE format (since the oSSL code will put the results back into LE format).

jfoug commented 7 years ago

Here are timings (take with grain of salt, since running in a QEMU emulation VM)

root@debian-local:~/bleed-nonSIMD/src# ../run/john -test=3 -form=raw-md5
Benchmarking: Raw-MD5 [MD5 32/32]... DONE
Raw:    401280 c/s real, 401280 c/s virtual

root@debian-local:~/bleed/src# ../run/john -test=3 -form=raw-md5
Benchmarking: Raw-MD5 [MD5 128/128 AltiVec 4x]... DONE
Raw:    1468K c/s real, 1473K c/s virtual

The non-SIMD build is 32-bit and the AltiVec build is 64-bit, but this still shows a 3.6x improvement (4x SIMD), so it appears to be running fairly well.

jfoug commented 7 years ago

I now have LE and BE algorithm examples working (raw-md4/5 and raw-sha1) in #2888

It's all good, just a LOT of work to get these all ported. It is going to get much harder on some of the formats, simply figuring out just what needs to be swapped, and being able to do so without impacting any other builds. There is a LOT of SIMD stuff. The ugly stuff, like dynamic and some of the special 'include' pbkdf2 type stuff, will likely not be fun, but once figured out, will probably not be that big of a deal.

jfoug commented 7 years ago

Btw, I have been getting 3.6x improvements. Lei was getting only about 3.2x. I would BET that his work was before the new set_key() logic. That extra 11% sounds about like the gain from the new set_key handling data 4 bytes at a time, which is why @magnum moved to using that logic.

solardiz commented 7 years ago

That's a fine bet, but of course you can't compare even relative speeds on real hardware vs. VMs with that kind of precision. Things vary a lot even between hardware platforms. Often the scalar peak instruction issue rate is higher than SIMD's (e.g., 4 vs. 3 on Haswell). For example, I saw something like 42M c/s in a 64-bit scalar build vs. 59M c/s in the AltiVec build for descrypt on gcc110 - that's not even a 2x speedup from SIMD on that platform, although much greater speedup was seen on Power Macs in 2005 or so. This really varies between CPUs a lot. Perhaps there's some room for tuning of descrypt on that system for better AltiVec performance, though.

jfoug commented 7 years ago

true enough.

Well, I will get the other 32-bit BE raw formats done, and then turn my eyes to the 64-bit BE formats (SHA2 stuff). Once the raw-* stuff is done, I might have a look at the pbkdf2 include magic. That is used by a lot of formats, so if things work out, getting those includes done properly may kill a lot of birds with one stone.

jfoug commented 7 years ago

Getting pbkdf2-hmac-sha1.h fixed (all other pbkdf2-hmac-*.h files already were BE ready) got a lot of formats working. I still have about 80 formats left, but this now looks MUCH more manageable than having 150-200 formats needing to be ported. The nice thing about the pbkdf2-hmac-*.h helpers as I originally wrote them is that you get it right once, and all formats 'work', YET it is very, very fast (near optimal). There are still a few formats using PBKDF2 which are failing. But most of those use PBKDF2 to generate a session key, and it is almost certain that the generated key will require swapping. Hopefully a large part of those formats will be easy. There already is swapping code in the non-SIMD logic path. It just has to also be in this new path. IIRC, the pbkdf2 include code 'may' have a param which avoids the final swap. If that is the case, then many of these may work by simply changing a calling param when in BE.

jfoug commented 7 years ago

But I do need to get on this bug also, and get a proper configure (and possibly changes to some core building), so that the AltiVec build (and a PPC64 system building 32-bit) works fine in a generic ./configure && make -sj8 && ../run/john -test-full=0 manner.

@kholia already is asking just wtf I did to get things to work. Well, on the configure he is doing, I simply changed arch.h->ppc64.h to arch.h->ppc32.h (since it is building with a 32-bit compiler). That really boils down to configure NOT using the 64-bit compiler (if it is installed). I had to add CFLAGS="-O2 -m64" to get the 64-bit stuff. I would rather be forced to pass CFLAGS="-O2 -m32" to get a 32-bit build on this box, and have 64-bit be the default, which is the way things should work.

jfoug commented 7 years ago

Not sure if this bug is already linked to the alignment bug, but if not: https://github.com/magnumripper/JohnTheRipper/issues/2868 is also part of this porting.

We have had 2 long standing assumptions when it came to SIMD building.

All SIMD code is LE
All SIMD builds do not require alignment (exception is the SIMD buffer accesses).

Both of those assumptions are no longer true, and both are being addressed. It is overall somewhat trivial porting code, BUT there is a lot of it. Also, 'SOME' of the porting code is not so trivial, and requires a lot of dump_stuff() and dump_stuff_mmx() calls to actually find out what really requires alignment swapping.

jfoug commented 7 years ago

@solardiz Can you try building again with this patch applied? The patch gets my QEMU ppc64 BE build working just fine. I do have to add a special -I and -L to get gmp working, but that is likely just my setup. I hope this will work on the GCC Compile Farm machines (I still do not have an account).

Things changed:

The logic was only adding AltiVec for powerpc64le. Now it adds it for both.
A special ppc64 check is done early, and -m64 is added to all flags (C, LD and AS flags). NOTE, there may be other systems on which we need to set the 64-bit flags early to get the compiler ready.
I moved the 32/64-bit logic much higher up in the configure run. The -m64 was required on my QEMU install, and configure was NOT guessing the right sizes without it. Now the -m64 is added early, so that all testing is done in 64-bit mode.
If building 64-bit, I check for some standard 64-bit lib locations (/usr/local/lib64 /usr/lib64 /lib64). I did not add any 'extra' include paths to CFLAGS.

diff --git a/src/configure b/src/configure
index 853305923..e38f637bd 100755
--- a/src/configure
+++ b/src/configure
@@ -3419,6 +3419,12 @@ ac_config_headers="$ac_config_headers autoconfig.h"
 # This might be a Bad Idea[tm] if cross compiling.

+# @synopsis SET_64_INCLUDES
+# @summary check and set some 64 bit includes
+# This might be a Bad Idea[tm] if cross compiling.
+
+
+
 # @synopsis SET_NORMAL_SSL_INCLUDES(base path)
 # @summary check and set include/library paths for OpenSSL
 # This might be a Bad Idea[tm] if cross compiling.
@@ -6600,6 +6606,175 @@ $as_echo "$as_me: Unable to validate $CC command line arguments. CFLAGS may need

 fi

+# early check of powerpc64. If we find power64, add -m64 to CFLAGS, LDFLAGS and ASFLAGS
+# NOTE, we may want to perform other early checks here, and then followed with check of
+# 64/32 bit, then followed by setting extra 64 bit lib paths.
+case "$host_cpu" in
+  powerpc64*) CFLAGS_EX=""
+     if test "1" = 1; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $CC supports -m64" >&5
+$as_echo_n "checking if $CC supports -m64... " >&6; }
+fi
+  ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+  ac_saved_cflags="$CFLAGS"
+  CFLAGS="-Werror -m64"
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  if test "1" = 1; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+fi
+      CFLAGS_EX="$CFLAGS_EX -m64"
+
+else
+  if test "1" = 1; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+  CFLAGS="$ac_saved_cflags"
+  ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+
+   if test "x${CFLAGS_EX}" != x ; then
+       LDFLAGS="-m64 $LDFLAGS"
+       CFLAGS="-m64 $CFLAGS"
+       ASFLAGS="-m64 $ASFLAGS"
+   fi
+  ;;
+esac
+
+# Cross compile compliant 32/64 bit test code.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for 32/64 bit" >&5
+$as_echo_n "checking for 32/64 bit... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+extern void exit(int);
+      int main() {
+      #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
+        defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
+        defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
+        defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
+        defined(__aarch64__) || defined(__ppc64__)
+          exit(0);}
+      #else
+          BORK!
+      #endif
+
+
+
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  CPU_BITS="-m64"
+   CPU_BIT_STR="64"
+   { $as_echo "$as_me:${as_lineno-$LINENO}: result: 64-bit" >&5
+$as_echo "64-bit" >&6; }
+
+else
+  CPU_BITS="-m32"
+   CPU_BIT_STR="32"
+   { $as_echo "$as_me:${as_lineno-$LINENO}: result: 32-bit" >&5
+$as_echo "32-bit" >&6; }
+
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+
+if test "x${CPU_BITS}" = x-m64 ; then
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking additional paths" >&5
+$as_echo_n "checking additional paths... " >&6; }
+  ADD_LDFLAGS=""
+  ADD_CFLAGS=""
+if test -d /usr/local/lib64; then
+   ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/local/lib64"
+fi
+if test -d /usr/lib64; then
+   ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/lib64"
+fi
+if test -d /lib64; then
+   ADD_LDFLAGS="$ADD_LDFLAGS -L/lib64"
+fi
+
+   for i in $ADD_CFLAGS; do
+      jtr_list_add_dupe=0
+      for j in $CPPFLAGS; do
+         if test "x$i" = "x$j"; then
+            jtr_list_add_dupe=1
+            break
+         fi
+      done
+      if test $jtr_list_add_dupe = 0; then
+         CPPFLAGS="$CPPFLAGS $i"
+         jtr_list_add_result="$jtr_list_add_result $i"
+      fi
+   done
+ # no typo here
+jtr_list_add_result=""
+
+   for i in $ADD_LDFLAGS; do
+      jtr_list_add_dupe=0
+      for j in $LDFLAGS; do
+         if test "x$i" = "x$j"; then
+            jtr_list_add_dupe=1
+            break
+         fi
+      done
+      if test $jtr_list_add_dupe = 0; then
+         LDFLAGS="$LDFLAGS $i"
+         jtr_list_add_result="$jtr_list_add_result $i"
+      fi
+   done
+
+
+   for i in $ADD_CFLAGS; do
+      jtr_list_add_dupe=0
+      for j in $CFLAGS; do
+         if test "x$i" = "x$j"; then
+            jtr_list_add_dupe=1
+            break
+         fi
+      done
+      if test $jtr_list_add_dupe = 0; then
+         CFLAGS="$CFLAGS $i"
+         jtr_list_add_result="$jtr_list_add_result $i"
+      fi
+   done
+
+
+   if test -z "$jtr_list_add_result"; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: none" >&5
+$as_echo "none" >&6; }
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $jtr_list_add_result" >&5
+$as_echo "$jtr_list_add_result" >&6; }
+fi
+   jtr_list_add_result=""
+
+
+fi
+
 # Checks for programs.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5
 $as_echo_n "checking whether ln -s works... " >&6; }
@@ -7998,7 +8173,7 @@ _ACEOF
 # pa-risc.h
 # ppc32.h
 # ppc32alt.h (-maltivec)
-# ppc64.h (-m64)
+# ppc64.h (-m64 -maltivec)
 # ppc64alt.h (-maltivec -faltivec)
 # sparc32.h
 # sparc64.h (-m64 -mcpu=ultrasparc) (-xarch=native64)
@@ -8118,10 +8293,14 @@ CPU_BEST_FLAGS="-no-opt-prefetch $CPU_BEST_FLAGS"
   pdp*) ARCH_LINK=autoconf_arch.h endian=little ;;
   powerpc64le) ARCH_LINK=ppc64.h endian=little

-CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector"
+CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"
+
+   ;;
+  powerpc64*) ARCH_LINK=ppc64.h endian=big
+
+CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"

-              ;;
-  powerpc64*) ARCH_LINK=ppc64.h endian=big ;;
+   ;;
   powerpcle) ARCH_LINK=ppc32.h endian=little ;;
   powerpc*) ARCH_LINK=ppc32.h endian=big ;;
   sparc64) ARCH_LINK=sparc64.h endian=big ;;
@@ -9003,42 +9182,6 @@ rm -f core conftest.err conftest.$ac_objext \
   CC="$CC_BACKUP"
 fi

-# Cross compile compliant 32/64 bit test code.
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for 32/64 bit" >&5
-$as_echo_n "checking for 32/64 bit... " >&6; }
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-extern void exit(int);
-      int main() {
-      #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
-        defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
-        defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
-        defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
-        defined(__aarch64__) || defined(__ppc64__)
-          exit(0);}
-      #else
-          BORK!
-      #endif
-
-
-
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
-  CPU_BITS="-m64"
-   CPU_BIT_STR="64"
-   { $as_echo "$as_me:${as_lineno-$LINENO}: result: 64-bit" >&5
-$as_echo "64-bit" >&6; }
-
-else
-  CPU_BITS="-m32"
-   CPU_BIT_STR="32"
-   { $as_echo "$as_me:${as_lineno-$LINENO}: result: 32-bit" >&5
-$as_echo "32-bit" >&6; }
-
-fi
-rm -f core conftest.err conftest.$ac_objext \
-    conftest$ac_exeext conftest.$ac_ext
-
 # At this point we know the arch and CPU width so we can pick details. Most
 # "special stuff" from old fat Makefile should go here.
 case "${host_cpu}_${CFLAGS}" in
@@ -9067,7 +9210,7 @@ fi
    mic*)
       CC_ASM_OBJS="simd-intrinsics.o"
       ;;
-   powerpc64le*)
+   powerpc64*)
       CC_ASM_OBJS="simd-intrinsics.o"
       ;;
    arm*)
diff --git a/src/configure.ac b/src/configure.ac
index 2cbaf1c6a..2992afd3f 100644
--- a/src/configure.ac
+++ b/src/configure.ac
@@ -251,6 +251,49 @@ fi
    AC_MSG_NOTICE([Unable to validate $CC command line arguments. CFLAGS may need to be passed to ./configure for proper build])
 ])

+# early check of powerpc64. If we find power64, add -m64 to CFLAGS, LDFLAGS and ASFLAGS
+# NOTE, we may want to perform other early checks here, and then followed with check of
+# 64/32 bit, then followed by setting extra 64 bit lib paths.
+case "$host_cpu" in
+  powerpc64*) CFLAGS_EX=""
+   JTR_FLAG_CHECK([-m64], 1)
+   if test "x${CFLAGS_EX}" != x ; then
+       LDFLAGS="-m64 $LDFLAGS"
+       CFLAGS="-m64 $CFLAGS"
+       ASFLAGS="-m64 $ASFLAGS"
+   fi
+  ;;
+esac
+
+# Cross compile compliant 32/64 bit test code.
+AC_MSG_CHECKING([for 32/64 bit])
+AC_LINK_IFELSE(
+   [AC_LANG_SOURCE(
+      [extern void exit(int);
+      int main() {
+      #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
+        defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
+        defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
+        defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
+        defined(__aarch64__) || defined(__ppc64__)
+          exit(0);}
+      #else
+          BORK!
+      #endif
+      ]
+   )]
+  ,[CPU_BITS="-m64"]
+   [CPU_BIT_STR="64"]
+   [AC_MSG_RESULT([64-bit])]
+  ,[CPU_BITS="-m32"]
+   [CPU_BIT_STR="32"]
+   [AC_MSG_RESULT([32-bit])]
+)
+
+if test "x${CPU_BITS}" = x-m64 ; then
+   JTR_SET_64_INCLUDES
+fi
+
 # Checks for programs.
 AC_PROG_LN_S
 AC_PROG_GREP
@@ -338,7 +381,7 @@ dnl AC_CHECK_SIZEOF([int *function()]
 # pa-risc.h
 # ppc32.h
 # ppc32alt.h (-maltivec)
-# ppc64.h (-m64)
+# ppc64.h (-m64 -maltivec)
 # ppc64alt.h (-maltivec -faltivec)
 # sparc32.h
 # sparc64.h (-m64 -mcpu=ultrasparc) (-xarch=native64)
@@ -373,9 +416,11 @@ case "$host_cpu" in
   mips*) ARCH_LINK=mips32.h endian=big ;;
   pdp*) ARCH_LINK=autoconf_arch.h endian=little ;;
   powerpc64le) ARCH_LINK=ppc64.h endian=little
-              JTR_PPC_SPECIAL_LOGIC
-              ;;
-  powerpc64*) ARCH_LINK=ppc64.h endian=big ;;
+   JTR_PPC64_SPECIAL_LOGIC
+   ;;
+  powerpc64*) ARCH_LINK=ppc64.h endian=big
+   JTR_PPC64_SPECIAL_LOGIC
+   ;;
   powerpcle) ARCH_LINK=ppc32.h endian=little ;;
   powerpc*) ARCH_LINK=ppc32.h endian=big ;;
   sparc64) ARCH_LINK=sparc64.h endian=big ;;
diff --git a/src/m4/jtr_generic_logic.m4 b/src/m4/jtr_generic_logic.m4
index d0f3c8e0f..064a969b7 100644
--- a/src/m4/jtr_generic_logic.m4
+++ b/src/m4/jtr_generic_logic.m4
@@ -51,31 +51,6 @@ if test "x$enable_native_march" != xno -a "x$osx_assembler_warn" != xyes; then
   CC="$CC_BACKUP"
 fi

-# Cross compile compliant 32/64 bit test code.
-AC_MSG_CHECKING([for 32/64 bit])
-AC_LINK_IFELSE(
-   [AC_LANG_SOURCE(
-      [extern void exit(int);
-      int main() {
-      #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
-        defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
-        defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
-        defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
-        defined(__aarch64__) || defined(__ppc64__)
-          exit(0);}
-      #else
-          BORK!
-      #endif
-      ]
-   )]
-  ,[CPU_BITS="-m64"]
-   [CPU_BIT_STR="64"]
-   [AC_MSG_RESULT([64-bit])]
-  ,[CPU_BITS="-m32"]
-   [CPU_BIT_STR="32"]
-   [AC_MSG_RESULT([32-bit])]
-)
-
 # At this point we know the arch and CPU width so we can pick details. Most
 # "special stuff" from old fat Makefile should go here.
 case "${host_cpu}_${CFLAGS}" in
@@ -103,7 +78,7 @@ case "${host_cpu}_${CFLAGS}" in
    mic*)
       [CC_ASM_OBJS="simd-intrinsics.o"]
       ;;
-   powerpc64le*)
+   powerpc64*)
       [CC_ASM_OBJS="simd-intrinsics.o"]
       ;;
    arm*)
diff --git a/src/m4/jtr_ppc_logic.m4 b/src/m4/jtr_ppc_logic.m4
index ccf74b685..0fd79a188 100644
--- a/src/m4/jtr_ppc_logic.m4
+++ b/src/m4/jtr_ppc_logic.m4
@@ -3,6 +3,6 @@ dnl modification, are permitted.
 dnl
 dnl Special compiler flags for Power.

-AC_DEFUN([JTR_PPC_SPECIAL_LOGIC], [
-CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector"
+AC_DEFUN([JTR_PPC64_SPECIAL_LOGIC], [
+CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"
 ])
diff --git a/src/m4/jtr_utility_macros.m4 b/src/m4/jtr_utility_macros.m4
index 74b33bf65..a7003107c 100644
--- a/src/m4/jtr_utility_macros.m4
+++ b/src/m4/jtr_utility_macros.m4
@@ -90,6 +90,31 @@ JTR_LIST_ADD(CFLAGS, [$ADD_CFLAGS])
 JTR_LIST_ADD_RESULT
 ])

+# @synopsis SET_64_INCLUDES
+# @summary check and set some 64 bit includes
+# This might be a Bad Idea[tm] if cross compiling.
+AC_DEFUN([JTR_SET_64_INCLUDES],
+[
+  AC_MSG_CHECKING([additional paths])
+  ADD_LDFLAGS=""
+  ADD_CFLAGS=""
+if test -d /usr/local/lib64; then
+   ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/local/lib64"
+fi
+if test -d /usr/lib64; then
+   ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/lib64"
+fi
+if test -d /lib64; then
+   ADD_LDFLAGS="$ADD_LDFLAGS -L/lib64"
+fi
+JTR_LIST_ADD(CPPFLAGS, [$ADD_CFLAGS]) # no typo here
+jtr_list_add_result=""
+JTR_LIST_ADD(LDFLAGS, [$ADD_LDFLAGS])
+JTR_LIST_ADD(CFLAGS, [$ADD_CFLAGS])
+JTR_LIST_ADD_RESULT
+])
+
+
 # @synopsis SET_NORMAL_SSL_INCLUDES(base path)
 # @summary check and set include/library paths for OpenSSL
 # This might be a Bad Idea[tm] if cross compiling.

jfoug commented 7 years ago

I am also looking at making this change to the above code (configure.ac). It just makes sure that on 64-bit builds, if the compiler handles -m64, we insert that flag into the three flag variables that need it (CFLAGS, LDFLAGS and ASFLAGS) before the real workhorse code runs during configure.

-# early check of powerpc64. If we find power64, add -m64 to CFLAGS, LDFLAGS and ASFLAGS
-# NOTE, we may want to perform other early checks here, and then followed with check of
-# 64/32 bit, then followed by setting extra 64 bit lib paths.
-case "$host_cpu" in
-  powerpc64*) CFLAGS_EX=""
-   JTR_FLAG_CHECK([-m64], 1)
-   if test "x${CFLAGS_EX}" != x ; then
-       LDFLAGS="-m64 $LDFLAGS"
-       CFLAGS="-m64 $CFLAGS"
-       ASFLAGS="-m64 $ASFLAGS"
-   fi
-  ;;
-esac
+# early check of 64 bit systems.
+case "$host_cpu" in
+  ia64|mips64|mips64eb|mipseb64|mips64el|mipsel64|mips64*|powerpc64*|sparc64|x86_64) CFLAGS_EX=""
+   JTR_FLAG_CHECK([-m64], 1)
+   if test "x${CFLAGS_EX}" != x ; then
+       LDFLAGS="-m64 $LDFLAGS"
+       CFLAGS="-m64 $CFLAGS"
+       ASFLAGS="-m64 $ASFLAGS"
+   fi
+  ;;
+  *)
+   AC_MSG_CHECKING([if gcc supports -m64])
+   AC_MSG_RESULT([no])
+  ;;
+esac

kholia commented 7 years ago

@jfoug Can you please push this patch to a topic branch in your repository? It seems that copy-pasting patches from the diffs posted on GitHub doesn't work all the time.

jfoug commented 6 years ago

@solardiz please check out #2942 and make sure it works properly on the GCC compile 10 machine (the power64 box)

jfoug commented 6 years ago

@solardiz Looking at Makefile.legacy, we do have a ppc64 and a ppc64 altivec target. They use different arch.h headers (that's OK, configure should use just one).

The question I have, is the ppc64alt.h setup PROPERLY for altivec? If so, I will adjust ppc64.h so that it still behaves the same when built by Makefile.legacy, but I will copy the needed data into ppc64.h so that it 'appears' to be the proper header file (ppc64alt.h) when building under autoconf. I have added logic to output a HAVE_ALTIVEC along with the HAVE_Powerpc64 (if ALTIVEC is not present). I can use those defines within ppc64.h to get things right. I just need to know if ppc64alt.h is 'right'.

jfoug commented 6 years ago

One note on #2942: I will work to get the SIMDPARA* things right, BUT we really need someone with access to real ppc64 iron to validate that the right PARA values are used.

magnumripper commented 6 years ago

/me thinks you should set all of them to 1 until we can test on the real deal.

jfoug commented 6 years ago

I have tested 1, 2, 3 (testing 4). Speeds are gradually getting higher for all hashes. I am hopeful there will be some top cap.

But there was a really big improvement on some formats (30% or so) going from 1 to 2. BUT this IS only on a VM, so we really need hardware to know if the right choices are made.

Here are the current speeds I have seen:

SIMD_COEF==1
  SHA1     95822
  SHA256  141994
  SHA512   66928
  MD4     308773
  MD5     168333

SIMD_COEF==2
  SHA1    112693
  SHA256  174625
  SHA512   87587
  MD4     359952
  MD5     210080

SIMD_COEF==3
  SHA1    121786
  SHA256  179625
  SHA512   90112
  MD4     378200
  MD5     222120

BEST_PARA (on my PPC64 QEMU vm):
  md5=3
  md4=3
  sha1=3
  sha256=3
  sha512=3
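
To put the "30% or so" figure in context, the step-to-step gains can be computed directly from the numbers above. This is only a sketch: the helper names (gain_percent, print_gains) are made up for illustration, and the inputs are the QEMU figures from this comment, not real-hardware results. With these inputs SHA512 gains roughly 31% going from 1 to 2 and only about 3% going from 2 to 3.

```c
#include <stdio.h>

/* Relative gain in percent when the speed goes from `before` to `after`. */
double gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Figures copied from the QEMU measurements listed above. */
struct bench { const char *name; double v1, v2, v3; };

static const struct bench results[] = {
    { "SHA1",    95822, 112693, 121786 },
    { "SHA256", 141994, 174625, 179625 },
    { "SHA512",  66928,  87587,  90112 },
    { "MD4",    308773, 359952, 378200 },
    { "MD5",    168333, 210080, 222120 },
};

/* Print one line per hash with the gain for each step (1->2 and 2->3). */
void print_gains(void)
{
    size_t i;

    for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-6s 1->2: %5.1f%%  2->3: %5.1f%%\n", results[i].name,
               gain_percent(results[i].v1, results[i].v2),
               gain_percent(results[i].v2, results[i].v3));
}
```

Calling print_gains() from a small main() prints one line per hash. The shrinking second step is what a "top cap" would look like, but on an emulator that says little about real hardware.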

jfoug commented 6 years ago

I believe now that #2942 is complete. Yes, we need to get ppc64.h correct (SIMDPARA* values), but that can easily be done later. The configure (and other items) porting is done, and passed all CI testing.

magnumripper commented 6 years ago

I have tested 1, 2, 3 (testing 4). Speeds are gradually getting higher for all hashes

Running in an emulator, that says absolutely nothing about how real hardware would react. Safest bet is setting it to 1.

solardiz commented 6 years ago

The question I have, is the ppc64alt.h setup PROPERLY for altivec?

Yes, core tree's ppc64alt.h works correctly on this gcc110 machine. (And this confirms it's normally 64-bit BE, despite what we just saw in MSR on the confused build of your ppc64 tree.)

jfoug commented 6 years ago

I am getting closer now (but further away in some circumstances, lol). I have fully ported ppc64.h and ppc32.h, so that they behave like this:

On core or legacy, ppc64.h and ppc32.h will be used on non-altivec builds. ppc64alt.h and ppc32alt.h are used for altivec builds.

Then in ./configure builds, I call special functions for the 4 ppc variants (powerpc32, powerpc32le, powerpc64 and powerpc64le). There are 2 separate autoconf macros for this (one for 32 bit, and one for 64 bit), even though the additional command line options are currently the same. We may in the end find that different options are better for one or the other. Then if those macros detect altivec instructions, they set a CPU flag, HAVE_ALTIVEC. On autoconf builds, only ppc32.h and ppc64.h are included.

So on a configure build, depending on whether the HAVE_ALTIVEC flag is set, ppc32.h will look either like the 'legacy' ppc32.h (for non-altivec builds) or like ppc32alt.h (for HAVE_ALTIVEC builds). The same goes for the ppc64 header. Now, I may make some minor changes, and simply wrap the entire ppc32.h in a define, and if set, simply include ppc32alt.h. That might be a better solution, since then the code really is in only 1 place. NOTE, I think I will do that, right now.

But once done, I now am getting some format test failures (I was pretty sure I had all these worked out before, darn!). I will get them fixed, then move on to the 32 bit, and make sure that with/without altivec is happy and testing everything.

I will NOT be setting the SIMDPARA* values. Someone with real iron needs to do this. I kept increasing them, and continued to get better performance on the QEMU VM I am running under, but that is almost certainly not representative of real hardware, and just a side effect of having less overhead marshalling or something within the VM when doing more of the same work.

I hope to have this working fully today.

jfoug commented 6 years ago

@solardiz I have put out this PR (mostly to expose these changes so you can more easily test against real iron)

#2974

jfoug commented 6 years ago

With 90313f4 I believe that this port is now fully working. ppc32.h and ppc64.h are now tri-purpose. On a john core or Makefile.legacy build, they are non-SIMD headers. On configure builds, they will be either the non-SIMD or the SIMD variant (of the right machine word size), depending on whether the JOHN_ALTIVEC define is set.
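
Roughly, the header selection described here could look like the sketch below. Only the JOHN_ALTIVEC name comes from this thread; every EXAMPLE_* macro and both functions are placeholders, and the real ppc32.h/ppc64.h define far more than this.

```c
/*
 * Sketch of a multi-purpose arch header. Only JOHN_ALTIVEC is taken from
 * the discussion; all EXAMPLE_* names are hypothetical placeholders.
 */
#if defined(JOHN_ALTIVEC)
/* configure detected AltiVec: behave like the ppc64alt.h-style header. */
#define EXAMPLE_SIMD_ENABLED 1
#define EXAMPLE_ALGORITHM_NAME "128/128 AltiVec"
#else
/* Core tree, Makefile.legacy, or configure without AltiVec: scalar only. */
#define EXAMPLE_SIMD_ENABLED 0
#define EXAMPLE_ALGORITHM_NAME "scalar"
#endif

/* Report whether this translation unit was compiled as the SIMD variant. */
int example_simd_enabled(void)
{
    return EXAMPLE_SIMD_ENABLED;
}

/* Human-readable name of the variant that was compiled in. */
const char *example_variant(void)
{
    return EXAMPLE_ALGORITHM_NAME;
}
```

Compiling the same file with and without -DJOHN_ALTIVEC yields the two variants, which is the property that lets one header serve both legacy and configure builds.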

NOTE ALTIVEC2 is not addressed.
NOTE SIMDPARA* values are all set to 1, and need someone with real HW to tune them properly.
NOTE in configure, altivec is built, IF the compiler supports it. This would cause issues cross compiling.
NOTE there is no CPUID work done either in configure script OR in john proper. Thus the builds are 'blind', AND there is no way to build fallback CPU builds.
NOTE we handle altivec in both 32 and 64 bit builds (I had to fix the pseudo_intrinsics.h file).
NOTE code passes UBSan and ASAN (but test was on LE system). As slow as my PPC64 system is, I really do NOT want to be forced to test ASAN or UBSan. That would probably take all day per test.

TODO 1. tune SIMDPARA* values
TODO 2. look into CPUID code. This would be a good place to also handle the ALTIVEC2 presence.
TODO 3. add ALTIVEC2

@solardiz do we want to close this task (for all intents and purposes, it really IS done, the rest is tuning)?

solardiz commented 6 years ago

Thanks, Jim. On gcc110:

configure: creating ./fmt_externs.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
 #warning ": target system requires aligned memory access, rar format disabled:"
  ^
configure: creating ./fmt_registers.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
 #warning ": target system requires aligned memory access, rar format disabled:"
  ^

Configured for building John the Ripper jumbo:

Target CPU ................................. powerpc64 ALTIVEC, 64-bit BE
AES-NI support ............................. no
Target OS .................................. linux-gnu
Cross compiling ............................ no
Legacy arch header ......................... ppc64.h

$ time make -sj60
ar: creating aes.a
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
 #warning ": target system requires aligned memory access, rar format disabled:"
  ^
stribog_fmt_plug.c:495:2: warning: #warning Stribog-256 and Stribog-512 formats require SSE 4.1, formats disabled [-Wcpp]
 #warning Stribog-256 and Stribog-512 formats require SSE 4.1, formats disabled
  ^
ar: creating secp256k1.a

Make process completed.

real    0m20.164s
user    5m17.237s
sys     0m10.452s

$ OMP_NUM_THREADS=60 ../run/john -test
Will run 60 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... (60xOMP) DONE
Warning: "Many salts" test limited: 141/256
Many salts:     34309K c/s real, 859855 c/s virtual
Only one salt:  27252K c/s real, 794834 c/s virtual

Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... (60xOMP) DONE
Speed for cost 1 (iteration count) of 725
Warning: "Many salts" test limited: 91/256
Many salts:     1397K c/s real, 24170 c/s virtual
Only one salt:  988514 c/s real, 22644 c/s virtual

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AltiVec 4x]... (60xOMP) DONE
Raw:    492480 c/s real, 8536 c/s virtual

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64]... (60xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw:    14435 c/s real, 253 c/s virtual

Benchmarking: scrypt (16384, 8, 1) [Salsa20/8 32/64]... (60xOMP) DONE
Speed for cost 1 (N) of 16384, cost 2 (r) of 8, cost 3 (p) of 1
Raw:    421 c/s real, 7.6 c/s virtual

Benchmarking: LM [DES 128/128 AltiVec]... (60xOMP) DONE
Raw:    54312K c/s real, 3166K c/s virtual

Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... Illegal instruction

NOTE in configure, altivec is built, IF the compiler supports it. This would cause issues cross compiling.

The configure check for AltiVec based on compiler support will probably also cause issues on older or cut-down POWER architecture chips lacking AltiVec but running a compiler that supports AltiVec. This is not ideal, but OK for now. Your separate work on an option to disable SIMD could provide a workaround for this.

As slow as my PPC64 system is

BTW, if you happen to have an old Intel Mac with OS X 10.5, you could use its pre-installed Rosetta for pretty fast emulation of PPC32 with AltiVec - I used just that previously. It feels roughly like a real PPC G4, so something like 600 MHz, on a 2 GHz Core 2 Duo. Way faster than QEMU. Apple's Xcode of the time was readily capable of cross-compiling to PPC with AltiVec - JtR proper still includes make targets for that.

solardiz commented 6 years ago

(gdb) b main
Breakpoint 1 at 0x102d03c8: file john.c, line 1891.
(gdb) r -test -form=afs
Starting program: /home/solar/bleeding-jumbo/src/../run/john -test -form=afs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main (argc=3, argv=0x3ffffffff028) at john.c:1891
1891            sig_preinit(); /* Mitigate race conditions */
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.ppc64 glibc-2.17-196.el7.ppc64 gmp-6.0.0-15.el7.ppc64 keyutils-libs-1.5.8-3.el7.ppc64 krb5-libs-1.15.1-8.el7.ppc64 libcom_err-1.42.9-10.el7.ppc64 libgomp-4.8.5-16.el7.ppc64 libselinux-2.5-11.el7.ppc64 nss-softokn-freebl-3.28.3-8.el7_4.ppc64 openssl-libs-1.0.2k-8.el7.ppc64 pcre-8.32-17.el7.ppc64 zlib-1.2.7-17.el7.ppc64
(gdb) p/x $msr
$1 = 0x800000000002d032
(gdb) c
Continuing.
Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... 
Program received signal SIGILL, Illegal instruction.
crypt_all (pcount=<optimized out>, salt=<optimized out>) at AFS_fmt.c:322
322                     memcpy(binary.data, AFS_long_IV_binary, sizeof(binary.data));
(gdb) disass $pc-8,$pc+12
Dump of assembler code from 0x100291e0 to 0x100291f4:
   0x00000000100291e0 <crypt_all+784>:  mr      r3,r19
   0x00000000100291e4 <crypt_all+788>:  mr      r4,r12
=> 0x00000000100291e8 <crypt_all+792>:  stq     r10,672(r31)
   0x00000000100291ec <crypt_all+796>:  rldicl  r26,r0,61,63
   0x00000000100291f0 <crypt_all+800>:  std     r19,688(r31)
End of assembler dump.
(gdb) p/x $msr
$2 = 0x800000000004d032

Looks like it starts and stays in 32-bit mode. Weird. Probably I misunderstand MSR. John proper has similar MSR values, but works fine (built as linux-ppc64-altivec):

(gdb) b main
Breakpoint 1 at 0x10002114
(gdb) r -test -form=afs
Starting program: /home/solar/john-1.8.0.11-ppc/run/./john -test -form=afs

Breakpoint 1, 0x0000000010002114 in .main ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.ppc64 nss-softokn-freebl-3.28.3-8.el7_4.ppc64
(gdb) p/x $msr
$1 = 0x800000000002d032
(gdb) c
Continuing.
Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... ^C
Program received signal SIGINT, Interrupt.
0x0000000010006600 in .DES_std_crypt ()
(gdb) p/x $msr
$2 = 0x800000000000d032
(gdb) c
Continuing.
DONE
Short:  82600 c/s real, 329407 c/s virtual
Long:   1110K c/s real, 1110K c/s virtual

[Inferior 1 (process 45860) exited normally]
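
For cross-checking MSR readings such as 0x800000000002d032, individual bits can be pulled out with a tiny helper. This is a sketch under stated assumptions: the bit positions below (counted from the least significant bit) are the ones I believe the Linux kernel uses for SF, PR and LE, and msr_bit is a made-up name. Whether gdb reports every bit reliably is a separate question.

```c
#include <stdint.h>

/* Assumed bit positions, counting the least significant bit as bit 0. */
#define MSR_SF_BIT 63 /* sixty-four-bit mode */
#define MSR_PR_BIT 14 /* problem state, i.e. user mode */
#define MSR_LE_BIT 0  /* little-endian mode */

/* Return 1 if the given bit is set in an MSR value, else 0. */
int msr_bit(uint64_t msr, unsigned bit)
{
    return (int)((msr >> bit) & 1);
}
```

For example, msr_bit(0x800000000002d032ULL, MSR_SF_BIT) tests the topmost bit of the first value printed above.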

solardiz commented 6 years ago

There's not a single stq instruction (but plenty of std) in my build of John proper, even though the object files and binaries are "64-bit":

[solar@gcc1-power7 src]$ for f in *.o; do objdump -d $f | fgrep -w stq; done
[solar@gcc1-power7 src]$ pwd
/home/solar/john-1.8.0.11-ppc/src
[solar@gcc1-power7 src]$ file AFS_fmt.o ../run/john
AFS_fmt.o:   ELF 64-bit MSB relocatable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), not stripped
../run/john: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=9ff908b9c32b247da6c90e7696c6b27163ad6794, not stripped

solardiz commented 6 years ago

OK, stq is not such a basic instruction. In POWER architecture terminology, quadword is 128-bit, and one of the changes of ISA 2.07 (not supported on these CPUs) is:

"Allow lq/stq in Little-Endian Mode: Removes the restriction that the lq and stq instructions can only operate in Big-Endian mode."

However, at least per configure's detection we are in big-endian mode, so by that description I'd expect stq to work on a CPU from just prior to that architecture revision (POWER7 instead of POWER8). But maybe not.

So we probably need to look for and remove a gcc flag that enabled use of stq for this build.

solardiz commented 6 years ago

Oh, also found: "In versions of the architecture prior to 2.07, this instruction was privileged." This explains why it's failing for us - we're not in the kernel.

solardiz commented 6 years ago

There's still -mpower8-vector getting into Makefile, and the code still fails to build when I remove that. Jim, I might have misunderstood you - I thought you meant you implemented AltiVec1-only builds this time, without AltiVec2, but what we have is the opposite. While we may close this issue because the default build completes (albeit presumably only with a recent enough compiler to support AltiVec2), the resulting binary is partially non-working on anything pre-AltiVec2 and no other binary can currently be produced... so I'm not exactly comfortable closing this.

solardiz commented 6 years ago

Jim, how did you manage to make AltiVec work for me for bitslice DES in this build, given that you still enable -mpower8-vector? Or is this just luck?

Regarding MSR, I read up on it more and that only confirmed my prior understanding. So I suspect it's misreporting by gdb. Many bits in MSR have to be altered on interrupt (pages 950-951 in the 2.07 manual specify exactly which and how), but there has got to be a way for the kernel to figure out the MSR value of the interrupted process. Yet maybe that info is somehow not fully available to or not used by gdb.

jfoug commented 6 years ago

-mpower8-vector

Ok, so that is only valid on ALTIVEC2? BTW, I know very little about this CPU. I have simply been taking code/switches from Makefile.legacy, merging ppc64.h / ppc64alt.h, and then fixing compile/runtime problems I have seen in the emulator. Knowing the how/why/when of what is really happening in the iron is beyond me (that is why I put out the call to get the SIMDPARA* stuff done).

Ok, I add the -mpower8-vector blindly in the makefile. This was due to it being in the target for ppc in Makefile.legacy. So, what are our options here? Do we really need to wait until we get CPUID, so that I can check compile flags, and then make sure those compile flags can actually run on the iron being used?

jfoug commented 6 years ago

Jim, how did you manage to make AltiVec work for me for bitslice DES in this build, given that you still enable -mpower8-vector? Or is this just luck?

I made no change at all. Whoever ported this code at one point may have made changes, but I have touched it not at all. I did change some of the #defines (to follow ppc[32][64]alt.h) but that is it. With those build flags, it works like a champ in QEMU. NOTE, I am hosting that env on a Haswell laptop (has AVX2), so it is possible that the emulator is allowing ALTIVEC2 instructions by converting over to AVX2

Does anyone know of a good standalone CPUID tool (with source) for Power?
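
For reference, on Linux the closest thing to CPUID on Power is probably the ELF auxiliary vector, which glibc exposes through getauxval(). The sketch below is not from the John tree: classify_simd and probe_simd are hypothetical names, the fallback constant values are my understanding of the kernel's PowerPC HWCAP bits, and treating the ISA 2.07 bit as "AltiVec2 available" is an assumption.

```c
#if defined(__linux__) && defined(__powerpc__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */
#endif

/* Fallback definitions (assumed values) for toolchains that lack them. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif
#ifndef PPC_FEATURE2_ARCH_2_07
#define PPC_FEATURE2_ARCH_2_07 0x80000000UL
#endif
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif

/* 0 = no AltiVec, 1 = AltiVec only, 2 = AltiVec plus ISA 2.07 (POWER8). */
int classify_simd(unsigned long hwcap, unsigned long hwcap2)
{
    if (!(hwcap & PPC_FEATURE_HAS_ALTIVEC))
        return 0;
    return (hwcap2 & PPC_FEATURE2_ARCH_2_07) ? 2 : 1;
}

/* Query the running kernel; returns -1 where the probe does not apply. */
int probe_simd(void)
{
#if defined(__linux__) && defined(__powerpc__)
    return classify_simd(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return -1;
#endif
}
```

A runtime check like probe_simd() would only help john proper refuse to run a mismatched build. Choosing compiler flags at configure time is a different problem, especially when cross compiling.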

magnumripper commented 6 years ago

Minor OT nitpick:

configure: creating ./fmt_externs.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
 #warning ": target system requires aligned memory access, rar format disabled:"
  ^
configure: creating ./fmt_registers.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
 #warning ": target system requires aligned memory access, rar format disabled:"
  ^

$ time make -sj60
ar: creating aes.a
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
 #warning ": target system requires aligned memory access, rar format disabled:"
  ^

Please replace these warnings so they only appear during build and not during ./configure (the Stribog one doesn't show during configure).

jfoug commented 6 years ago

non-working on anything pre-AltiVec2 and no other binary can currently be produced... so I'm not exactly comfortable closing this.

I have the --enable-withoutsimd ready to go. I will finish up testing on that flag, but this one should allow a configure build on a ppc Altivec1 enabled machine (until we can figure it out). Also, the existing make -f Makefile.legacy linux-ppcp[64] targets could also be used. Yep, Makefile.legacy is deprecated, but that dinosaur simply does not die and go away ;)

That being said, I will see if there is something I can do within the QEMU environment to disable ALTIVEC2 capability, so that I can get the crashes like you are getting, and figure out proper requirements for probing for Altivec1 and for Altivec2.

But at this time, I think we hold this issue open a bit more. Yep it works fine for me. But I really hate putting it out if it can not run on things using older HW, unless we totally disable those SIMD instructions.

jfoug commented 6 years ago

Still OT: @magnumripper I saw that myself, and just have not gotten around to it.

Note, I had to do a little 'extra' to get the rawSHAng formats to NOT be placed into the fmt.h files when building non-simd. I have to have the arch.h files give themselves a lobotomy, and they have to be told to do so. So I had to add that 'do it' compile switch to the build lines for the fmt_.h. We may want to revisit how I did the new command line switch: I added a totally new one. We may want to instead drop it into CFLAGSEX or something (which would have been picked up by the gcc lines that make the fmt.h).

But as for rar, I saw the warnings also, just have not gotten to them yet. IIRC, the warnings also show up for the rawSHA*_ng.c files, so the #if logic to skip warning in fmt header builds has to be added there also.

magnumripper commented 6 years ago

I have the --enable-withoutsimd ready to go

Again OT but seriously, do not commit that until you changed it to --disable-simd. I swear I will pull your access rights to the repo 😉 😆

jfoug commented 6 years ago

do not commit that until you changed it to --disable-simd.

Imagine that. Hmm, I got feedback I was needing, and nothing bad listed, 'cept my easter-egg ;)

solardiz commented 6 years ago

Ok, I add the -mpower8-vector blindly in the makefile. This was due to it being in the target for ppc in Makefile.legacy. So, what are our options here?

-mpower8-vector isn't in Makefile of JtR proper, nor in Makefile.legacy in jumbo. I suggest you try dropping it, see the resulting build errors from missing the AltiVec2 intrinsics (SIMD 64-bit adds, I think), and rework that code not to require those intrinsics when building without that option. The result will be a build working on AltiVec1, even though it would probably run a bit slower than we otherwise could on a machine capable of AltiVec2.
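
To illustrate the kind of rework meant here, below is a scalar model of a 64-bit add built only from 32-bit adds plus a carry. It is a sketch, not AltiVec code: the struct and function names are made up, and a real AltiVec1 version would presumably do the same per lane with 32-bit vec_add and vec_addc plus a permute to move the carries, which is not shown.

```c
#include <stdint.h>

/* One 64-bit lane held as two 32-bit halves. */
struct lane64 { uint32_t hi, lo; };

/*
 * Add two 64-bit lanes using only 32-bit additions. The carry out of the
 * low halves is recovered by checking for unsigned wrap-around.
 */
struct lane64 add64_via_32(struct lane64 a, struct lane64 b)
{
    struct lane64 r;
    uint32_t carry;

    r.lo = a.lo + b.lo;         /* wraps modulo 2^32 */
    carry = r.lo < a.lo;        /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry; /* high halves plus propagated carry */
    return r;
}

/* Split a native 64-bit value into halves (used for checking). */
struct lane64 split64(uint64_t x)
{
    struct lane64 r;

    r.hi = (uint32_t)(x >> 32);
    r.lo = (uint32_t)x;
    return r;
}

/* Join the halves back into a native 64-bit value. */
uint64_t join64(struct lane64 x)
{
    return ((uint64_t)x.hi << 32) | x.lo;
}
```

The extra compare and add per lane is where the "a bit slower" cost would come from.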

openwall / john

Build failures on Linux/PPC64 #2861
