Closed by solardiz 6 years ago
@magnumripper and/or @jfoug, do you have GCC Compile Farm accounts? If not, you can apply for them. I'd prefer that you figure this out and fix it. ;-)
I am setting up a QEMU PPC64 system (hopefully). I have also applied for a GCC Compile Farm account, but the page said acceptance is done by hand and would take several days at least. I have not heard back on that yet. But hopefully the 8.8.0 Debian PPC64 VM will work, and if so, it will provide a nice controlled environment to look into these types of bugs.
Note, there is nothing in configure to detect SIMD other than the x64 variants, and now a few others (like NEON). NOTE, there will likely be a WHOLE lot of failures once we go to BE SIMD code in many formats. Time will tell, for sure ;) Almost all of the work inside the SIMD defines was done under the assumption of LE byte ordering. Hopefully, this VM will also emulate AltiVec SIMD instructions, and make development / testing much easier.
Warnings fixed: 60b9362a4 and 14dd15e4c
I have (finally) gotten an AltiVec build working (well, it runs) on the ppc. It was not easy.
As I expected, all (or most) formats fail, since the SIMD code was written on the assumption that SIMD builds would be LE. I will see if I can easily figure this out. We should mostly be able to hide the endianness using the GETPOS macros.
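For readers following along, here is a minimal sketch of what "hiding the endianness in GETPOS" could mean. It is modelled on the GETPOS definitions in the rawMD5 patch further down this thread, but the function names and the LANES and BLOCK_WORDS values are assumptions picked for illustration (4 lanes of 32 bits, 16 words per block), not the tree's actual configuration.

```c
#include <stddef.h>

/* Assumed layout parameters (hypothetical, for illustration only). */
#define LANES       4   /* 32-bit lanes per SIMD vector */
#define BLOCK_WORDS 16  /* 32-bit words per hash input block */

/* Byte offset of byte i of key 'index' in an interleaved key buffer,
 * for a host that stores 32-bit words little-endian. */
static size_t getpos_le(size_t i, size_t index)
{
    return (index % LANES) * 4                        /* lane inside the vector */
         + (i / 4) * 4 * LANES                        /* word inside the block */
         + (i % 4)                                    /* byte inside the word */
         + (index / LANES) * BLOCK_WORDS * 4 * LANES; /* group of LANES keys */
}

/* Same position for a big-endian host: only the byte-in-word term flips. */
static size_t getpos_be(size_t i, size_t index)
{
    return (index % LANES) * 4
         + (i / 4) * 4 * LANES
         + (3 - (i % 4))
         + (index / LANES) * BLOCK_WORDS * 4 * LANES;
}
```

With these assumed parameters, byte 6 of key 5 lands at offset 278 in the LE layout and 277 in the BE layout; everything except the position inside the 32-bit word is identical.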
Here are the hacks I have done so far to get this far (talking to myself here):
Made changes to configure.ac and m4/jtr_ppc.m4: changed the original PPC macro to be PPC64LE, added a new PPC64 macro, and added this output to the PPC64 macro: CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64". Also made changes to configure.ac to use this PPC64 macro.
CFLAGS="-O2 -m64" ./configure
Had to apt-get install all of the power64 stuff I could
Edited aes/Makefile and aes/openssl/Makefile (adding the -m64)
Made changes to the ./Makefile, LDFLAGS (added /usr/lib64 /usr/lib)
LD_LIBRARY_PATH needed updating (/lib64:/usr/lib). Note, when I built oSSL, I screwed up and did not properly set the output path to be /lib64, since I was not aware of where it should be.
Removed encfs format (uses some oSSL that was not compatible with my build)
Edited listconf.c removing the oSSL versioning code that is not part of my library.
I may be leaving something out, but I think that is most of it.
Now, john runs, but all AltiVec formats fail (as I expected). But this provides a basis to figure out how to make them work. Once one is completed, then most should be easy. Then there will likely be some (10% or so), that require some other smallish porting changes.
Thank you, Jim! I'd expect the DES-based formats from core to work fine with AltiVec, since they do in core. (In fact, historically JtR's bitslice DES first reached 128-bit SIMD with AltiVec and only later with SSE.) Is this not the case?
Here's john-1.8.0.11 built with make -j linux-ppc32-altivec on gcc110:
[solar@gcc1-power7 src]$ GOMP_CPU_AFFINITY=0-63 ../run/john -te=1
Will run 64 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... DONE
Many salts: 57671K c/s real, 921420 c/s virtual
Only one salt: 31405K c/s real, 801602 c/s virtual
Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... DONE
Many salts: 1671K c/s real, 26087 c/s virtual
Only one salt: 1038K c/s real, 24066 c/s virtual
Benchmarking: md5crypt [MD5 32/32 X2]... DONE
Raw: 265264 c/s real, 4218 c/s virtual
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/32]... DONE
Raw: 16728 c/s real, 266 c/s virtual
Benchmarking: LM [DES 128/128 AltiVec]... DONE
Raw: 66963K c/s real, 3673K c/s virtual
Benchmarking: AFS, Kerberos AFS [DES 24/32 4K]... DONE
Short: 354304 c/s real, 354304 c/s virtual
Long: 845560 c/s real, 854016 c/s virtual
Benchmarking: tripcode [DES 128/128 AltiVec]... DONE
Raw: 16797K c/s real, 710825 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 81517K c/s real, 81517K c/s virtual
Benchmarking: crypt, generic crypt(3) [?/32]... DONE
Many salts: 1263K c/s real, 20041 c/s virtual
Only one salt: 1224K c/s real, 19138 c/s virtual
The linux-ppc64-altivec build failed. Now fixed in core with:
--- john-1.8.0.11/src/DES_bs_b.c 2016-01-19 04:23:55.000000000 +0000
+++ john-1.8.0.11-ppc/src/DES_bs_b.c 2017-11-13 17:08:04.208703158 +0000
@@ -232,7 +232,7 @@ typedef struct {
typedef vector signed int vtype;
#define vst(dst, ofs, src) \
- vec_st((src), (ofs) * sizeof(DES_bs_vector), (dst))
+ vec_st((src), (ofs) * sizeof(DES_bs_vector), (vtype *)(dst))
#define vxorf(a, b) \
vec_xor((a), (b))
Also added -fno-strict-aliasing to OPT_INLINE for both linux-ppc*-altivec targets.
Oh, and MD5_IMM turned out to be more optimal here (unexpected), for both 32-bit and 64-bit. With all of these changes, 64-bit produces:
[solar@gcc1-power7 src]$ GOMP_CPU_AFFINITY=0-63 ../run/john -te=1
Will run 64 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... DONE
Many salts: 59506K c/s real, 936818 c/s virtual
Only one salt: 32443K c/s real, 798440 c/s virtual
Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... DONE
Many salts: 1605K c/s real, 25013 c/s virtual
Only one salt: 1103K c/s real, 23167 c/s virtual
Benchmarking: md5crypt [MD5 32/64 X2]... DONE
Raw: 280868 c/s real, 4431 c/s virtual
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64]... DONE
Raw: 17108 c/s real, 267 c/s virtual
Benchmarking: LM [DES 128/128 AltiVec]... DONE
Raw: 67633K c/s real, 3763K c/s virtual
Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... DONE
Short: 345344 c/s real, 345344 c/s virtual
Long: 1125K c/s real, 1125K c/s virtual
Benchmarking: tripcode [DES 128/128 AltiVec]... DONE
Raw: 17301K c/s real, 708207 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 79200K c/s real, 79200K c/s virtual
Benchmarking: crypt, generic crypt(3) [?/64]... DONE
Many salts: 1226K c/s real, 19400 c/s virtual
Only one salt: 1188K c/s real, 18571 c/s virtual
Except for md5crypt (where MD5_IMM made a difference), other performance changes seen here are probably mostly a random fluctuation rather than a genuine difference between 32-bit and 64-bit builds.
descrypt does not appear to use AltiVec instructions in the build I did. md5crypt does use AltiVec, and fails.
Note, I likely will not have time to dig into this until later in the week (possibly not until the weekend)
I have looked into some of the jumbo formats, and doing this for AltiVec (or any BE SIMD) will be pretty much a full rewrite of a LOT of code. Anything dealing with interleaved buffers will require code changes. I was hoping that simply updating the GETPOS() macro would suffice. In some instances that may be the case, but that likely will not be the rule.
This shouldn't be so difficult to fix. Lei Zhang got AltiVec working for us in 64-bit BE builds in 2015. We just seem to have broken this since, and we need to unbreak it.
http://www.openwall.com/lists/john-dev/2015/07/10/1 http://www.openwall.com/lists/john-dev/2015/07/10/2
I have gotten AltiVec code working for the rawMD5 format. I will post the diff, and will also create another bug list. That bug list will be to port all SIMD formats to work properly for any build: BE/LE SIMD, BE/LE non-SIMD, and with or without OMP. That task will NOT be a small undertaking. But now with the first one done, at least I have a template for replacing the optimized set_key function, and for proceeding forward with other formats. But it is just a slow process. A lot of dump_stuff_mmx() calls get scattered all over the place, working to figure things out. The longest time here was spent finding the actual RIGHT byte layout for the format hash to work at all. Also, I am not 100% sure about the REVERSE_STEPS code. I made a change to cmp_exact to simply skip the first 32-bit value (since it was already tested in the cmp_one/cmp_all tests). Here is the change (most debugging removed, only a small commented-out section left):
diff --git a/src/rawMD5_fmt_plug.c b/src/rawMD5_fmt_plug.c
index 8c689d7..284b44d 100644
--- a/src/rawMD5_fmt_plug.c
+++ b/src/rawMD5_fmt_plug.c
@@ -101,8 +101,12 @@ static struct fmt_tests tests[] = {
#define PLAINTEXT_LENGTH 55
#define MIN_KEYS_PER_CRYPT NBKEYS
#define MAX_KEYS_PER_CRYPT NBKEYS
+#if ARCH_LITTLE_ENDIAN==1
#define GETPOS(i, index) ( (index&(SIMD_COEF_32-1))*4 + ((i)&(0xffffffff-3))*SIMD_COEF_32 + ((i)&3) + (unsigned int)index/SIMD_COEF_32*MD5_BUF_SIZ*4*SIMD_COEF_32 )
#else
+#define GETPOS(i, index) ( (index&(SIMD_COEF_32-1))*4 + ((i)&(0xffffffff-3))*SIMD_COEF_32 + (3-((i)&3)) + (unsigned int)index/SIMD_COEF_32*MD5_BUF_SIZ*4*SIMD_COEF_32 )
+#endif
+#else
#define PLAINTEXT_LENGTH 125
#define MIN_KEYS_PER_CRYPT 1
#define MAX_KEYS_PER_CRYPT 1
@@ -223,11 +227,7 @@ static void *get_binary(char *ciphertext)
temp |= ((unsigned int)(atoi16[ARCH_INDEX(ciphertext[i*8+7])]))<<24;
-#if ARCH_LITTLE_ENDIAN
out[i]=temp;
-#else
- out[i]=JOHNSWAP(temp);
-#endif
}
#if SIMD_COEF_32 && defined(REVERSE_STEPS)
@@ -250,10 +250,6 @@ static char *source(char *source, void *binary)
md5_unreverse(b);
#endif
-#if ARCH_LITTLE_ENDIAN==0
- alter_endianity(b, 16);
-#endif
-
p = &out[TAG_LENGTH];
for (i = 0; i < 4; i++)
for (j = 0; j < 8; j++)
@@ -278,6 +274,7 @@ static void set_key(char *_key, int index)
uint32_t temp;
len = 0;
+#if ARCH_LITTLE_ENDIAN==1
while((temp = *key++) & 0xff) {
if (!(temp & 0xff00))
{
@@ -298,6 +295,28 @@ static void set_key(char *_key, int index)
goto key_cleaning;
}
*keybuf_word = temp;
+#else
+ while((temp = *key++) & 0xff000000) {
+ if (!(temp & 0xff0000))
+ {
+ *keybuf_word = JOHNSWAP((temp & 0xff000000) | (0x80 << 16));
+ len++;
+ goto key_cleaning;
+ }
+ if (!(temp & 0xff00))
+ {
+ *keybuf_word = JOHNSWAP((temp & 0xffff0000) | (0x80 << 8));
+ len+=2;
+ goto key_cleaning;
+ }
+ if (!(temp & 0xff))
+ {
+ *keybuf_word = JOHNSWAP(temp | 0x80U);
+ len+=3;
+ goto key_cleaning;
+ }
+ *keybuf_word = JOHNSWAP(temp);
+#endif
len += 4;
keybuf_word += SIMD_COEF_32;
}
@@ -426,10 +454,16 @@ static int cmp_exact(char *source, int index)
MD5_Update(&ctx, key, strlen(key));
MD5_Final((void*)crypt_key, &ctx);
-#ifdef REVERSE_STEPS
- md5_reverse(crypt_key);
+#if ARCH_LITTLE_ENDIAN==0
+ alter_endianity(crypt_key, 16);
#endif
+// dump_stuff(crypt_key, 16);
+// dump_stuff(get_binary(source), 16);
+#ifdef REVERSE_STEPS
+ return !memcmp(&((uint32_t*)(get_binary(source)))[1], &crypt_key[1], DIGEST_SIZE-4);
+#else
return !memcmp(get_binary(source), crypt_key, DIGEST_SIZE);
+#endif
#else
return 1;
#endif
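The big-endian branch of set_key() in the diff above is easier to follow in isolation. The sketch below is a simplified, host-independent rewrite of that logic, not the code from the tree: swap32() stands in for JOHNSWAP, load_be32() emulates what a word load produces on a big-endian CPU, and pack_key_be() and its stride parameter are names invented for this example. It assumes the key buffer is zero-padded to the next multiple of 4 bytes past the terminating NUL.

```c
#include <stdint.h>

/* Portable 32-bit byte swap (stands in for the tree's JOHNSWAP macro). */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

/* Load 4 bytes the way a big-endian CPU would: first byte in the top bits. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Pack a NUL-terminated key into words spaced 'stride' apart, 4 bytes per
 * step, appending the MD5 0x80 padding byte right after the last character.
 * Returns the key length in bytes.
 */
static unsigned pack_key_be(const unsigned char *key, uint32_t *out,
                            unsigned stride)
{
    unsigned len = 0;
    uint32_t temp;

    while ((temp = load_be32(key + len)) & 0xff000000u) {
        if (!(temp & 0x00ff0000u)) {        /* 1 character in this word */
            *out = swap32((temp & 0xff000000u) | (0x80u << 16));
            return len + 1;
        }
        if (!(temp & 0x0000ff00u)) {        /* 2 characters in this word */
            *out = swap32((temp & 0xffff0000u) | (0x80u << 8));
            return len + 2;
        }
        if (!(temp & 0x000000ffu)) {        /* 3 characters in this word */
            *out = swap32(temp | 0x80u);
            return len + 3;
        }
        *out = swap32(temp);                /* full word, keep going */
        len += 4;
        out += stride;
    }
    *out = 0x80u;                           /* length is a multiple of 4 */
    return len;
}
```

For the key "abcde" with stride 1, this stores 0x64636261 and 0x00008065 and returns 5: the stored words have the values the MD5 arithmetic expects, regardless of how the host lays out bytes in memory.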
This code is NOT checked in, because I have probably broken the non-SIMD BE port. I will have to investigate that, and possibly use this simplified method.
Getting this to work on BE without SIMD should be easy. Simply revisit the code removed from binary() and source(), and do the byte swapping there when building for a non-SIMD BE system. For BE SIMD systems, we do not swap here. We will have the final crypt value in machine layout, so we want to keep the binary in machine layout, so it is easy to match (same for source, since the binary was not swapped). But for non-SIMD BE systems, within binary_hash and cmp() we want the data swapped, so that it is returned to proper LE format (since the oSSL code will put the results back into LE format).
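The swap policy described above can be written down as a small sketch. The helper names (swap_words, binary_needs_swap, prepare_binary) and the run-time flags are hypothetical; in the tree this decision would be made at compile time with ARCH_LITTLE_ENDIAN and SIMD_COEF_32, and the swap itself is what alter_endianity() does.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable 32-bit byte swap. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

/* Swap every 32-bit word in place. */
static void swap_words(uint32_t *w, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        w[i] = swap32(w[i]);
}

/* Policy from the comment above: only big-endian builds WITHOUT SIMD
 * swap the binary; LE builds and BE SIMD builds keep machine layout. */
static int binary_needs_swap(int big_endian, int simd)
{
    return big_endian && !simd;
}

/* Apply the policy to a binary of nwords 32-bit words. */
static void prepare_binary(uint32_t *binary, size_t nwords,
                           int big_endian, int simd)
{
    if (binary_needs_swap(big_endian, simd))
        swap_words(binary, nwords);
}
```

Calling prepare_binary(bin, 4, 1, 0) swaps a 16-byte MD5 binary, while the (1, 1) and (0, 0) cases leave it untouched.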
Here are timings (take with grain of salt, since running in a QEMU emulation VM)
root@debian-local:~/bleed-nonSIMD/src# ../run/john -test=3 -form=raw-md5
Benchmarking: Raw-MD5 [MD5 32/32]... DONE
Raw: 401280 c/s real, 401280 c/s virtual
root@debian-local:~/bleed/src# ../run/john -test=3 -form=raw-md5
Benchmarking: Raw-MD5 [MD5 128/128 AltiVec 4x]... DONE
Raw: 1468K c/s real, 1473K c/s virtual
The non-SIMD build is 32-bit, the AltiVec build is 64-bit. But this still shows a 3.6x improvement (4x SIMD), so it appears to be running fairly well.
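As a quick check of the arithmetic behind the 3.6x figure, here is a small sketch. It assumes that "1468K" in the benchmark output means 1,468,000 c/s; the function names are made up for this example.

```c
/* Ratio of SIMD throughput to scalar throughput (both in c/s). */
static double speedup(double simd_cps, double scalar_cps)
{
    return simd_cps / scalar_cps;
}

/* Fraction of the ideal speedup (one per SIMD lane) actually achieved. */
static double lane_efficiency(double simd_cps, double scalar_cps, int lanes)
{
    return speedup(simd_cps, scalar_cps) / lanes;
}
```

With the two Raw-MD5 results above, speedup(1468000, 401280) is roughly 3.66, which is about 91% of the ideal 4x for four 32-bit lanes. The two builds also differ in word size (32-bit vs. 64-bit), so the ratio is only a rough indication.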
I now have LE and BE algorithm examples working (raw-md4/5 and raw-sha1) in #2888
It's all good, just a LOT of work to get these all ported. It is going to get much harder on some of the formats, simply figuring out just what needs to be swapped, and being able to do so without impacting any other builds. There is a LOT of SIMD stuff. The ugly stuff, like dynamic and some of the special 'include' pbkdf2 type stuff, will likely not be fun, but once figured out, will probably not be that big of a deal.
Btw, I have been getting 3.6x improvements. Lei was getting only about 3.2x. I would BET that his work was before the new set_key() logic. That extra 11% sounds about like the gains from the new set_key handling data 4 bytes at a time, and why @magnum moved to using that logic.
That's a fine bet, but of course you can't compare even relative speeds on real hardware vs. VMs with that kind of precision. Things vary a lot even between hardware platforms. Often the scalar peak instruction issue rate is higher than SIMD's (e.g., 4 vs. 3 on Haswell). For example, I saw something like 42M c/s in a 64-bit scalar build vs. 59M c/s in the AltiVec build for descrypt on gcc110 - that's not even a 2x speedup from SIMD on that platform, although much greater speedup was seen on Power Macs in 2005 or so. This really varies between CPUs a lot. Perhaps there's some room for tuning of descrypt on that system for better AltiVec performance, though.
true enough.
Well, I will get the other 32-bit BE raw formats done, and then turn my eyes to the 64-bit BE formats (SHA2 stuff). Once the raw-* stuff is done, I might have a look at the pbkdf2 include magic. That is used by a lot of formats, so if things work out, it may kill a lot of birds with just getting those includes done properly.
Getting pbkdf2-hmac-sha1.h fixed (all other pbkdf2-hmac-*.h files were already BE ready) got a lot of formats working. I still have about 80 formats left, but this now looks MUCH more manageable than having 150-200 formats needing to be ported. The nice thing about how I originally wrote the pbkdf2-hmac-*.h helpers is that you get it right once, and all formats 'work', YET it is very, very fast (near optimal). There are still a few formats using PBKDF2 which are failing. But most of those use PBKDF2 to generate a session key, and it is almost certain that the generated key will require swapping. But hopefully a large part of those formats will be easy. There already is swapping code in the non-SIMD logic path. It just has to also be in this new path. IIRC, the pbkdf2 include code 'may' have a param which avoids the final swap. If that is the case, then many of these may work by simply changing a calling param when in BE.
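To make the "param which avoids the final swap" idea concrete, here is a hedged sketch. It is not the pbkdf2-hmac-*.h interface: emit_key() and its skip_swap flag are hypothetical, and the example only shows the general technique of serialising state words byte by byte so that the output is the same on LE and BE hosts, with an opt-out for callers that want the words in host order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write 32-bit state words out as key bytes.
 * skip_swap == 0: emit each word most-significant byte first (the order
 *                 SHA-1 style digests are defined in), on any host.
 * skip_swap != 0: copy the words exactly as they sit in memory.
 * Both the function and the flag are hypothetical.
 */
static void emit_key(const uint32_t *state, size_t nwords,
                     unsigned char *out, int skip_swap)
{
    size_t i;

    if (skip_swap) {
        memcpy(out, state, nwords * 4);
        return;
    }
    for (i = 0; i < nwords; i++) {
        out[4 * i + 0] = (unsigned char)(state[i] >> 24);
        out[4 * i + 1] = (unsigned char)(state[i] >> 16);
        out[4 * i + 2] = (unsigned char)(state[i] >> 8);
        out[4 * i + 3] = (unsigned char)(state[i]);
    }
}
```

Because the non-skipping path uses shifts rather than pointer casts, it needs no ARCH_LITTLE_ENDIAN test at all; on a BE host both paths happen to produce the same bytes.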
But I do need to get on this bug also, and get a proper configure (and possibly changes to some core building), so that AltiVec (and PPC64 building 32-bit) works fine in a generic ./configure && make -sj8 && ../run/john -test-full=0 manner.
@kholia is already asking just wtf I did to get things to work. Well, on the configure he is doing, I simply changed arch.h->ppc64.h to arch.h->ppc32.h (since it is building with a 32-bit compiler). That really boils down to configure NOT using the 64-bit compiler (if it is installed). I had to add CFLAGS="-O2 -m64" to get the 64-bit stuff. I would rather be forced to pass CFLAGS="-O2 -m32" to get a 32-bit build on this box and have 64-bit be the default, which is the way things should work.
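A quick way to see which width a given compiler invocation actually produces is a compile-time check like the one configure performs. The sketch below reuses the same predefined-macro list as the 32/64 bit test in the configure patch in this thread; BUILD_BITS, build_bits() and pointer_bits() are names made up for this example.

```c
/* Same predefined-macro list as the configure 32/64 bit test. */
#if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
    defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
    defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
    defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
    defined(__aarch64__) || defined(__ppc64__)
#define BUILD_BITS 64
#else
#define BUILD_BITS 32
#endif

/* Width the macro test reports for this compilation. */
static int build_bits(void)
{
    return BUILD_BITS;
}

/* Width implied by the pointer size, as a cross-check. */
static int pointer_bits(void)
{
    return (int)(sizeof(void *) * 8);
}
```

Compiling this once with -m32 and once with -m64 on the PPC64 box and comparing build_bits() with pointer_bits() shows whether the default compiler mode is the 32-bit one, which is the situation described above. On unusual ABIs (for example x32) the two values can legitimately differ.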
Not sure if this bug is also linked to this alignment bug, but if not https://github.com/magnumripper/JohnTheRipper/issues/2868 also is part of this porting.
We have had 2 long standing assumptions when it came to SIMD building.
Both of those assumptions are no longer true, and both are being addressed. It is overall somewhat trivial porting code, BUT there is a lot of it. Also, 'SOME' of the porting code is not so trivial, and requires a lot of dump_stuff() and dump_stuff_mmx() calls to actually find out what really requires alignment swapping.
@solardiz Can you try building again with this patch applied? The patch gets my QEMU ppc64 BE build working just fine. I do have to add a special -I and -L to get gmp working, but that is likely just my setup. I hope this will work on the GCC compile farm machines (I still do not have an account).
Things changed:
diff --git a/src/configure b/src/configure
index 853305923..e38f637bd 100755
--- a/src/configure
+++ b/src/configure
@@ -3419,6 +3419,12 @@ ac_config_headers="$ac_config_headers autoconfig.h"
# This might be a Bad Idea[tm] if cross compiling.
+# @synopsis SET_64_INCLUDES
+# @summary check and set some 64 bit includes
+# This might be a Bad Idea[tm] if cross compiling.
+
+
+
# @synopsis SET_NORMAL_SSL_INCLUDES(base path)
# @summary check and set include/library paths for OpenSSL
# This might be a Bad Idea[tm] if cross compiling.
@@ -6600,6 +6606,175 @@ $as_echo "$as_me: Unable to validate $CC command line arguments. CFLAGS may need
fi
+# early check of powerpc64. If we find power64, add -m64 to CFLAGS, LDFLAGS and ASFLAGS
+# NOTE, we may want to perform other early checks here, and then followed with check of
+# 64/32 bit, then followed by setting extra 64 bit lib paths.
+case "$host_cpu" in
+ powerpc64*) CFLAGS_EX=""
+ if test "1" = 1; then :
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $CC supports -m64" >&5
+$as_echo_n "checking if $CC supports -m64... " >&6; }
+fi
+ ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+ ac_saved_cflags="$CFLAGS"
+ CFLAGS="-Werror -m64"
+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+int
+main ()
+{
+
+ ;
+ return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+ if test "1" = 1; then :
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+fi
+ CFLAGS_EX="$CFLAGS_EX -m64"
+
+else
+ if test "1" = 1; then :
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ CFLAGS="$ac_saved_cflags"
+ ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+
+ if test "x${CFLAGS_EX}" != x ; then
+ LDFLAGS="-m64 $LDFLAGS"
+ CFLAGS="-m64 $CFLAGS"
+ ASFLAGS="-m64 $ASFLAGS"
+ fi
+ ;;
+esac
+
+# Cross compile compliant 32/64 bit test code.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for 32/64 bit" >&5
+$as_echo_n "checking for 32/64 bit... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+extern void exit(int);
+ int main() {
+ #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
+ defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
+ defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
+ defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
+ defined(__aarch64__) || defined(__ppc64__)
+ exit(0);}
+ #else
+ BORK!
+ #endif
+
+
+
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+ CPU_BITS="-m64"
+ CPU_BIT_STR="64"
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: 64-bit" >&5
+$as_echo "64-bit" >&6; }
+
+else
+ CPU_BITS="-m32"
+ CPU_BIT_STR="32"
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: 32-bit" >&5
+$as_echo "32-bit" >&6; }
+
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext conftest.$ac_ext
+
+if test "x${CPU_BITS}" = x-m64 ; then
+
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking additional paths" >&5
+$as_echo_n "checking additional paths... " >&6; }
+ ADD_LDFLAGS=""
+ ADD_CFLAGS=""
+if test -d /usr/local/lib64; then
+ ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/local/lib64"
+fi
+if test -d /usr/lib64; then
+ ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/lib64"
+fi
+if test -d /lib64; then
+ ADD_LDFLAGS="$ADD_LDFLAGS -L/lib64"
+fi
+
+ for i in $ADD_CFLAGS; do
+ jtr_list_add_dupe=0
+ for j in $CPPFLAGS; do
+ if test "x$i" = "x$j"; then
+ jtr_list_add_dupe=1
+ break
+ fi
+ done
+ if test $jtr_list_add_dupe = 0; then
+ CPPFLAGS="$CPPFLAGS $i"
+ jtr_list_add_result="$jtr_list_add_result $i"
+ fi
+ done
+ # no typo here
+jtr_list_add_result=""
+
+ for i in $ADD_LDFLAGS; do
+ jtr_list_add_dupe=0
+ for j in $LDFLAGS; do
+ if test "x$i" = "x$j"; then
+ jtr_list_add_dupe=1
+ break
+ fi
+ done
+ if test $jtr_list_add_dupe = 0; then
+ LDFLAGS="$LDFLAGS $i"
+ jtr_list_add_result="$jtr_list_add_result $i"
+ fi
+ done
+
+
+ for i in $ADD_CFLAGS; do
+ jtr_list_add_dupe=0
+ for j in $CFLAGS; do
+ if test "x$i" = "x$j"; then
+ jtr_list_add_dupe=1
+ break
+ fi
+ done
+ if test $jtr_list_add_dupe = 0; then
+ CFLAGS="$CFLAGS $i"
+ jtr_list_add_result="$jtr_list_add_result $i"
+ fi
+ done
+
+
+ if test -z "$jtr_list_add_result"; then :
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: none" >&5
+$as_echo "none" >&6; }
+else
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $jtr_list_add_result" >&5
+$as_echo "$jtr_list_add_result" >&6; }
+fi
+ jtr_list_add_result=""
+
+
+fi
+
# Checks for programs.
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5
$as_echo_n "checking whether ln -s works... " >&6; }
@@ -7998,7 +8173,7 @@ _ACEOF
# pa-risc.h
# ppc32.h
# ppc32alt.h (-maltivec)
-# ppc64.h (-m64)
+# ppc64.h (-m64 -maltivec)
# ppc64alt.h (-maltivec -faltivec)
# sparc32.h
# sparc64.h (-m64 -mcpu=ultrasparc) (-xarch=native64)
@@ -8118,10 +8293,14 @@ CPU_BEST_FLAGS="-no-opt-prefetch $CPU_BEST_FLAGS"
pdp*) ARCH_LINK=autoconf_arch.h endian=little ;;
powerpc64le) ARCH_LINK=ppc64.h endian=little
-CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector"
+CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"
+
+ ;;
+ powerpc64*) ARCH_LINK=ppc64.h endian=big
+
+CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"
- ;;
- powerpc64*) ARCH_LINK=ppc64.h endian=big ;;
+ ;;
powerpcle) ARCH_LINK=ppc32.h endian=little ;;
powerpc*) ARCH_LINK=ppc32.h endian=big ;;
sparc64) ARCH_LINK=sparc64.h endian=big ;;
@@ -9003,42 +9182,6 @@ rm -f core conftest.err conftest.$ac_objext \
CC="$CC_BACKUP"
fi
-# Cross compile compliant 32/64 bit test code.
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for 32/64 bit" >&5
-$as_echo_n "checking for 32/64 bit... " >&6; }
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h. */
-extern void exit(int);
- int main() {
- #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
- defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
- defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
- defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
- defined(__aarch64__) || defined(__ppc64__)
- exit(0);}
- #else
- BORK!
- #endif
-
-
-
-_ACEOF
-if ac_fn_c_try_link "$LINENO"; then :
- CPU_BITS="-m64"
- CPU_BIT_STR="64"
- { $as_echo "$as_me:${as_lineno-$LINENO}: result: 64-bit" >&5
-$as_echo "64-bit" >&6; }
-
-else
- CPU_BITS="-m32"
- CPU_BIT_STR="32"
- { $as_echo "$as_me:${as_lineno-$LINENO}: result: 32-bit" >&5
-$as_echo "32-bit" >&6; }
-
-fi
-rm -f core conftest.err conftest.$ac_objext \
- conftest$ac_exeext conftest.$ac_ext
-
# At this point we know the arch and CPU width so we can pick details. Most
# "special stuff" from old fat Makefile should go here.
case "${host_cpu}_${CFLAGS}" in
@@ -9067,7 +9210,7 @@ fi
mic*)
CC_ASM_OBJS="simd-intrinsics.o"
;;
- powerpc64le*)
+ powerpc64*)
CC_ASM_OBJS="simd-intrinsics.o"
;;
arm*)
diff --git a/src/configure.ac b/src/configure.ac
index 2cbaf1c6a..2992afd3f 100644
--- a/src/configure.ac
+++ b/src/configure.ac
@@ -251,6 +251,49 @@ fi
AC_MSG_NOTICE([Unable to validate $CC command line arguments. CFLAGS may need to be passed to ./configure for proper build])
])
+# early check of powerpc64. If we find power64, add -m64 to CFLAGS, LDFLAGS and ASFLAGS
+# NOTE, we may want to perform other early checks here, and then followed with check of
+# 64/32 bit, then followed by setting extra 64 bit lib paths.
+case "$host_cpu" in
+ powerpc64*) CFLAGS_EX=""
+ JTR_FLAG_CHECK([-m64], 1)
+ if test "x${CFLAGS_EX}" != x ; then
+ LDFLAGS="-m64 $LDFLAGS"
+ CFLAGS="-m64 $CFLAGS"
+ ASFLAGS="-m64 $ASFLAGS"
+ fi
+ ;;
+esac
+
+# Cross compile compliant 32/64 bit test code.
+AC_MSG_CHECKING([for 32/64 bit])
+AC_LINK_IFELSE(
+ [AC_LANG_SOURCE(
+ [extern void exit(int);
+ int main() {
+ #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
+ defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
+ defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
+ defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
+ defined(__aarch64__) || defined(__ppc64__)
+ exit(0);}
+ #else
+ BORK!
+ #endif
+ ]
+ )]
+ ,[CPU_BITS="-m64"]
+ [CPU_BIT_STR="64"]
+ [AC_MSG_RESULT([64-bit])]
+ ,[CPU_BITS="-m32"]
+ [CPU_BIT_STR="32"]
+ [AC_MSG_RESULT([32-bit])]
+)
+
+if test "x${CPU_BITS}" = x-m64 ; then
+ JTR_SET_64_INCLUDES
+fi
+
# Checks for programs.
AC_PROG_LN_S
AC_PROG_GREP
@@ -338,7 +381,7 @@ dnl AC_CHECK_SIZEOF([int *function()]
# pa-risc.h
# ppc32.h
# ppc32alt.h (-maltivec)
-# ppc64.h (-m64)
+# ppc64.h (-m64 -maltivec)
# ppc64alt.h (-maltivec -faltivec)
# sparc32.h
# sparc64.h (-m64 -mcpu=ultrasparc) (-xarch=native64)
@@ -373,9 +416,11 @@ case "$host_cpu" in
mips*) ARCH_LINK=mips32.h endian=big ;;
pdp*) ARCH_LINK=autoconf_arch.h endian=little ;;
powerpc64le) ARCH_LINK=ppc64.h endian=little
- JTR_PPC_SPECIAL_LOGIC
- ;;
- powerpc64*) ARCH_LINK=ppc64.h endian=big ;;
+ JTR_PPC64_SPECIAL_LOGIC
+ ;;
+ powerpc64*) ARCH_LINK=ppc64.h endian=big
+ JTR_PPC64_SPECIAL_LOGIC
+ ;;
powerpcle) ARCH_LINK=ppc32.h endian=little ;;
powerpc*) ARCH_LINK=ppc32.h endian=big ;;
sparc64) ARCH_LINK=sparc64.h endian=big ;;
diff --git a/src/m4/jtr_generic_logic.m4 b/src/m4/jtr_generic_logic.m4
index d0f3c8e0f..064a969b7 100644
--- a/src/m4/jtr_generic_logic.m4
+++ b/src/m4/jtr_generic_logic.m4
@@ -51,31 +51,6 @@ if test "x$enable_native_march" != xno -a "x$osx_assembler_warn" != xyes; then
CC="$CC_BACKUP"
fi
-# Cross compile compliant 32/64 bit test code.
-AC_MSG_CHECKING([for 32/64 bit])
-AC_LINK_IFELSE(
- [AC_LANG_SOURCE(
- [extern void exit(int);
- int main() {
- #if defined(_LP64) || defined(__LP64__) || defined(_LLP64) || defined(__LLP64__) || \
- defined(__x86_64) || defined(__x86_64__) || defined(__amd64) || defined(__amd64__) || \
- defined(_M_AMD64) || defined(_M_X64) || defined(WIN64) || \
- defined(__IA64__) || defined(__ia64) || defined(_M_IA64) || \
- defined(__aarch64__) || defined(__ppc64__)
- exit(0);}
- #else
- BORK!
- #endif
- ]
- )]
- ,[CPU_BITS="-m64"]
- [CPU_BIT_STR="64"]
- [AC_MSG_RESULT([64-bit])]
- ,[CPU_BITS="-m32"]
- [CPU_BIT_STR="32"]
- [AC_MSG_RESULT([32-bit])]
-)
-
# At this point we know the arch and CPU width so we can pick details. Most
# "special stuff" from old fat Makefile should go here.
case "${host_cpu}_${CFLAGS}" in
@@ -103,7 +78,7 @@ case "${host_cpu}_${CFLAGS}" in
mic*)
[CC_ASM_OBJS="simd-intrinsics.o"]
;;
- powerpc64le*)
+ powerpc64*)
[CC_ASM_OBJS="simd-intrinsics.o"]
;;
arm*)
diff --git a/src/m4/jtr_ppc_logic.m4 b/src/m4/jtr_ppc_logic.m4
index ccf74b685..0fd79a188 100644
--- a/src/m4/jtr_ppc_logic.m4
+++ b/src/m4/jtr_ppc_logic.m4
@@ -3,6 +3,6 @@ dnl modification, are permitted.
dnl
dnl Special compiler flags for Power.
-AC_DEFUN([JTR_PPC_SPECIAL_LOGIC], [
-CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector"
+AC_DEFUN([JTR_PPC64_SPECIAL_LOGIC], [
+CPU_BEST_FLAGS="-maltivec -mvsx -mpower8-vector -m64"
])
diff --git a/src/m4/jtr_utility_macros.m4 b/src/m4/jtr_utility_macros.m4
index 74b33bf65..a7003107c 100644
--- a/src/m4/jtr_utility_macros.m4
+++ b/src/m4/jtr_utility_macros.m4
@@ -90,6 +90,31 @@ JTR_LIST_ADD(CFLAGS, [$ADD_CFLAGS])
JTR_LIST_ADD_RESULT
])
+# @synopsis SET_64_INCLUDES
+# @summary check and set some 64 bit includes
+# This might be a Bad Idea[tm] if cross compiling.
+AC_DEFUN([JTR_SET_64_INCLUDES],
+[
+ AC_MSG_CHECKING([additional paths])
+ ADD_LDFLAGS=""
+ ADD_CFLAGS=""
+if test -d /usr/local/lib64; then
+ ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/local/lib64"
+fi
+if test -d /usr/lib64; then
+ ADD_LDFLAGS="$ADD_LDFLAGS -L/usr/lib64"
+fi
+if test -d /lib64; then
+ ADD_LDFLAGS="$ADD_LDFLAGS -L/lib64"
+fi
+JTR_LIST_ADD(CPPFLAGS, [$ADD_CFLAGS]) # no typo here
+jtr_list_add_result=""
+JTR_LIST_ADD(LDFLAGS, [$ADD_LDFLAGS])
+JTR_LIST_ADD(CFLAGS, [$ADD_CFLAGS])
+JTR_LIST_ADD_RESULT
+])
+
+
# @synopsis SET_NORMAL_SSL_INCLUDES(base path)
# @summary check and set include/library paths for OpenSSL
# This might be a Bad Idea[tm] if cross compiling.
I am also looking at making this change to the above code (configure.ac). This just makes sure that on 64-bit builds, IF the compiler handles -m64, we insert that flag into the 3 needed flag variables PRIOR to the real workhorse code that runs during configure.
-# early check of powerpc64. If we find power64, add -m64 to CFLAGS, LDFLAGS and ASFLAGS
-# NOTE, we may want to perform other early checks here, and then followed with check of
-# 64/32 bit, then followed by setting extra 64 bit lib paths.
-case "$host_cpu" in
- powerpc64*) CFLAGS_EX=""
- JTR_FLAG_CHECK([-m64], 1)
- if test "x${CFLAGS_EX}" != x ; then
- LDFLAGS="-m64 $LDFLAGS"
- CFLAGS="-m64 $CFLAGS"
- ASFLAGS="-m64 $ASFLAGS"
- fi
- ;;
-esac
+# early check of 64 bit systems.
+case "$host_cpu" in
+ ia64|mips64|mips64eb|mipseb64|mips64el|mipsel64|mips64*|powerpc64*|sparc64|x86_64) CFLAGS_EX=""
+ JTR_FLAG_CHECK([-m64], 1)
+ if test "x${CFLAGS_EX}" != x ; then
+ LDFLAGS="-m64 $LDFLAGS"
+ CFLAGS="-m64 $CFLAGS"
+ ASFLAGS="-m64 $ASFLAGS"
+ fi
+ ;;
+ *)
+ AC_MSG_CHECKING([if gcc supports -m64])
+ AC_MSG_RESULT([no])
+ ;;
+esac
@jfoug Can you please push this patch to a topic branch in your repository? It seems that copy-pasting patches from the diffs posted on GitHub doesn't work all the time.
@solardiz please check out #2942 and make sure it works properly on the GCC compile 10 machine (the power64 box)
@solardiz Looking at Makefile.legacy, we do have a ppc64 and a ppc64 altivec target. They use different arch.h headers (that's OK, configure should use just one).
The question I have, is the ppc64alt.h set up PROPERLY for altivec? If so, I will adjust ppc64.h so that if built by Makefile.legacy, it will still behave the same, but I will copy the needed data into ppc64.h to make it 'appear' like the proper header file (ppc64alt.h) IF building under autoconf. I have added logic to output a HAVE_ALTIVEC along with the HAVE_Powerpc64 (if ALTIVEC is not present). I can use those defines within ppc64.h to get things right. I just need to know if ppc64alt.h is 'right'.
One note on #2942 I will work to get SIMDPARA* things right, BUT we really need to have someone with access to real ppc64 iron to validate that the right PARA values are used.
/me thinks you should set all of them to 1 until we can test on the real deal.
I have tested 1, 2, 3 (testing 4). Speeds are gradually getting higher for all hashes. I am hopeful there will be some top cap.
But there was a really big improvement on some formats (30% or so) going from 1 to 2. BUT this IS only on a VM, so we really need hardware to know if the right choices are made.
Here are the current speeds I have seen:
SIMD_COEF==1
SHA1: 95822
SHA256: 141994
SHA512: 66928
MD4: 308773
MD5: 168333
SIMD_COEF==2
SHA1: 112693
SHA256: 174625
SHA512: 87587
MD4: 359952
MD5: 210080
SIMD_COEF==3
SHA1: 121786
SHA256: 179625
SHA512: 90112
MD4: 378200
MD5: 222120
BEST_PARA (on my PPC64 QEMU vm):
md5=3
md4=3
sha1=3
sha256=3
sha512=3
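The figures posted above can be cross-checked with a few lines of C. This is only a sketch over the numbers from this thread (measured under QEMU, so they say little about real hardware); the table layout and the function names best_setting() and gain_percent() are assumptions made for the example.

```c
#define NHASH 5
#define NSETTINGS 3

/* Row order: SHA1, SHA256, SHA512, MD4, MD5.
 * Columns: the speeds posted above for settings 1, 2 and 3. */
static const long speeds[NHASH][NSETTINGS] = {
    {  95822, 112693, 121786 },
    { 141994, 174625, 179625 },
    {  66928,  87587,  90112 },
    { 308773, 359952, 378200 },
    { 168333, 210080, 222120 },
};

/* Return the 1-based setting with the highest speed for hash row h. */
static int best_setting(int h)
{
    int p, best = 0;

    for (p = 1; p < NSETTINGS; p++)
        if (speeds[h][p] > speeds[h][best])
            best = p;
    return best + 1;
}

/* Percentage gain when moving hash row h from setting 'from' to 'to'. */
static double gain_percent(int h, int from, int to)
{
    double a = (double)speeds[h][from - 1];
    double b = (double)speeds[h][to - 1];

    return 100.0 * (b - a) / a;
}
```

On this data best_setting() returns 3 for every row, matching the BEST_PARA list. For SHA512, gain_percent(2, 1, 2) is about 31% while gain_percent(2, 2, 3) is under 3%, so most of the benefit seen in the VM comes from the first step.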
I believe now that #2942 is complete. Yes, we need to get ppc64.h correct (SIMDPARA* values), but that can easily be done later. The configure (and other items) porting is done, and passed all CI testing.
> I have tested 1, 2, 3 (testing 4). Speeds are gradually getting higher for all hashes
Running in an emulator, that says absolutely nothing about how real hardware would react. Safest bet is setting it to 1.
> The question I have, is the ppc64alt.h setup PROPERLY for altivec?
Yes, core tree's ppc64alt.h works correctly on this gcc110 machine. (And this confirms it's normally 64-bit BE, despite what we just saw in msr on the confused build of your ppc64 tree.)
I am getting closer now (but further away in some circumstances, lol). I have fully ported ppc64.h and ppc32.h, so that they behave like this:
On core or legacy, ppc64.h and ppc32.h will be used on non-altivec builds. ppc64alt.h and ppc32alt.h are used for altivec builds.
Then in ./configure builds, I call special functions for the 4 ppc variants (powerpc32, powerpc32le, powerpc64 and powerpc64le). There are 2 separate autoconf macros for this (one for 32-bit, and one for 64-bit), even though the additional command line options were the same. We may in the end find that different options are better for one or the other. Then if those macros detect altivec instructions, they set a CPU flag, listing HAVE_ALTIVEC. On autoconf builds, only ppc32.h and ppc64.h are included.
So on a configure build, with the HAVE_ALTIVEC flag either set or not, ppc32.h will look either like the 'legacy' ppc32.h (for non-altivec builds), or like ppc32alt.h for HAVE_ALTIVEC builds. The same goes for the ppc64 header. Now, I may make some minor changes, and simply wrap the entire ppc32.h in a define, and if set, simply include ppc32alt.h. That might be a better solution, since then the code really is in only 1 place. NOTE, I think I will do that, right now.
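The "wrap the whole header in a define" idea could look roughly like the sketch below. JOHN_ALTIVEC is the define named later in this thread. The include guard and the SKETCH_SIMD_BUILD marker are hypothetical names used only for illustration, and the real scalar settings of ppc32.h are left out.

```c
/*
 * Sketch of a wrapped ppc32.h. Only JOHN_ALTIVEC comes from the thread;
 * SKETCH_PPC32_H and SKETCH_SIMD_BUILD are hypothetical names.
 */
#ifndef SKETCH_PPC32_H
#define SKETCH_PPC32_H

#if defined(JOHN_ALTIVEC)
/* AltiVec build: defer entirely to the existing AltiVec header, so the
 * SIMD settings live in exactly one place. */
#include "ppc32alt.h"
#define SKETCH_SIMD_BUILD 1
#else
/* Core / Makefile.legacy / non-SIMD configure build: the scalar settings
 * that ppc32.h always carried would stay in this branch. */
#define SKETCH_SIMD_BUILD 0
#endif

#endif /* SKETCH_PPC32_H */
```

The point of this layout is that nothing is copied between the two headers. The compiler only looks for ppc32alt.h when JOHN_ALTIVEC is defined, so a non-SIMD build never touches it.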
But once done, I now am getting some format test failures (I was pretty sure I had all these worked out before, darn!). I will get them fixed, then move on to the 32 bit, and make sure that with/without altivec is happy and testing everything.
I will NOT be setting the SIMDPARA* values. Someone with real iron needs to do this. I kept increasing them, and continued to get better performance on the QEMU VM I am running under, but that is almost certainly not representative of real hardware, and just a side effect of having less overhead marshalling or something within the VM doing more of the same work.
I hope to have this working fully today.
@solardiz I have put out this PR (mostly to expose these changes so you can more easily test against real iron)
With 90313f4 I believe that this port is now fully working. ppc32.h and ppc64.h are now tri purpose. On john core or Makefile.legacy build, they are non SIMD headers. On configure builds, they will be either the non-SIMD, or the SIMD build (of right machine word type), based upon JOHN_ALTIVEC define being set or not.
NOTE ALTIVEC2 is not addressed.
NOTE SIMDPARA* values are all set to 1, and need someone with real HW to tune them properly.
NOTE in configure, altivec is built, IF the compiler supports it. This would cause issues cross compiling.
NOTE there is no CPUID work done either in configure script OR in john proper. Thus the builds are 'blind', AND there is no way to build fallback CPU builds.
NOTE we handle altivec in both 32 and 64 bit builds (I had to fix the pseudo-intrinsics.h file).
NOTE code passes UBSan and ASAN (but test was on LE system). As slow as my PPC64 system is, I really do NOT want to be forced to test ASAN or UBSan. That would probably take all day per test.
TODO 1. tune SIMDPARA* values
TODO 2. look into CPUID code. This would be a good place to also handle the ALTIVEC2 presence.
TODO 3. add ALTIVEC2
@solardiz do we want to close this task (for all purposes, it really IS done, the rest is tuning) ?
Thanks, Jim. On gcc110:
configure: creating ./fmt_externs.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
configure: creating ./fmt_registers.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
Configured for building John the Ripper jumbo:
Target CPU ................................. powerpc64 ALTIVEC, 64-bit BE
AES-NI support ............................. no
Target OS .................................. linux-gnu
Cross compiling ............................ no
Legacy arch header ......................... ppc64.h
$ time make -sj60
ar: creating aes.a
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
stribog_fmt_plug.c:495:2: warning: #warning Stribog-256 and Stribog-512 formats require SSE 4.1, formats disabled [-Wcpp]
#warning Stribog-256 and Stribog-512 formats require SSE 4.1, formats disabled
^
ar: creating secp256k1.a
Make process completed.
real 0m20.164s
user 5m17.237s
sys 0m10.452s
$ OMP_NUM_THREADS=60 ../run/john -test
Will run 60 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AltiVec]... (60xOMP) DONE
Warning: "Many salts" test limited: 141/256
Many salts: 34309K c/s real, 859855 c/s virtual
Only one salt: 27252K c/s real, 794834 c/s virtual
Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES 128/128 AltiVec]... (60xOMP) DONE
Speed for cost 1 (iteration count) of 725
Warning: "Many salts" test limited: 91/256
Many salts: 1397K c/s real, 24170 c/s virtual
Only one salt: 988514 c/s real, 22644 c/s virtual
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AltiVec 4x]... (60xOMP) DONE
Raw: 492480 c/s real, 8536 c/s virtual
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64]... (60xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw: 14435 c/s real, 253 c/s virtual
Benchmarking: scrypt (16384, 8, 1) [Salsa20/8 32/64]... (60xOMP) DONE
Speed for cost 1 (N) of 16384, cost 2 (r) of 8, cost 3 (p) of 1
Raw: 421 c/s real, 7.6 c/s virtual
Benchmarking: LM [DES 128/128 AltiVec]... (60xOMP) DONE
Raw: 54312K c/s real, 3166K c/s virtual
Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... Illegal instruction
NOTE in configure, altivec is built, IF the compiler supports it. This would cause issues cross compiling.
The configure check for AltiVec based on compiler support will probably also cause issues on older or cut-down POWER architecture chips lacking AltiVec but running a compiler that supports AltiVec. This is not ideal, but OK for now. Your separate work on an option to disable SIMD could provide a workaround for this.
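To make the distinction concrete: what the compiler was told to target is visible at compile time through macros that GCC predefines when the matching -m flags are active (__ALTIVEC__, __VSX__, __POWER8_VECTOR__). The sketch below only reports that compile-time view. The function name is hypothetical, and it deliberately says nothing about what the CPU running the binary can execute, which is the gap described above.

```c
/* Report which SIMD level this translation unit was compiled for, based on
 * macros GCC predefines for -maltivec, -mvsx and -mpower8-vector.
 * This reflects compiler flags only, not the capabilities of the CPU. */
const char *compiled_simd_level(void)
{
#if defined(__POWER8_VECTOR__)
    return "power8-vector";
#elif defined(__VSX__)
    return "vsx";
#elif defined(__ALTIVEC__)
    return "altivec";
#else
    return "none";
#endif
}
```

A build that prints "power8-vector" here may still die with SIGILL on an older chip, so this is a diagnostic aid rather than a safety check.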
As slow as my PPC64 system is
BTW, if you happen to have an old Intel Mac with OS X 10.5, you could use its pre-installed Rosetta for pretty fast emulation of PPC32 with AltiVec - I used just that previously. It feels roughly like a real PPC G4, so something like 600 MHz, on a 2 GHz Core 2 Duo. Way faster than QEMU. Apple's Xcode of the time was readily capable of cross-compiling to PPC with AltiVec - JtR proper still includes make targets for that.
(gdb) b main
Breakpoint 1 at 0x102d03c8: file john.c, line 1891.
(gdb) r -test -form=afs
Starting program: /home/solar/bleeding-jumbo/src/../run/john -test -form=afs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, main (argc=3, argv=0x3ffffffff028) at john.c:1891
1891 sig_preinit(); /* Mitigate race conditions */
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.ppc64 glibc-2.17-196.el7.ppc64 gmp-6.0.0-15.el7.ppc64 keyutils-libs-1.5.8-3.el7.ppc64 krb5-libs-1.15.1-8.el7.ppc64 libcom_err-1.42.9-10.el7.ppc64 libgomp-4.8.5-16.el7.ppc64 libselinux-2.5-11.el7.ppc64 nss-softokn-freebl-3.28.3-8.el7_4.ppc64 openssl-libs-1.0.2k-8.el7.ppc64 pcre-8.32-17.el7.ppc64 zlib-1.2.7-17.el7.ppc64
(gdb) p/x $msr
$1 = 0x800000000002d032
(gdb) c
Continuing.
Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]...
Program received signal SIGILL, Illegal instruction.
crypt_all (pcount=<optimized out>, salt=<optimized out>) at AFS_fmt.c:322
322 memcpy(binary.data, AFS_long_IV_binary, sizeof(binary.data));
(gdb) disass $pc-8,$pc+12
Dump of assembler code from 0x100291e0 to 0x100291f4:
0x00000000100291e0 <crypt_all+784>: mr r3,r19
0x00000000100291e4 <crypt_all+788>: mr r4,r12
=> 0x00000000100291e8 <crypt_all+792>: stq r10,672(r31)
0x00000000100291ec <crypt_all+796>: rldicl r26,r0,61,63
0x00000000100291f0 <crypt_all+800>: std r19,688(r31)
End of assembler dump.
(gdb) p/x $msr
$2 = 0x800000000004d032
Looks like it starts and stays in 32-bit mode. Weird. Probably I misunderstand MSR. John proper has similar MSR values, but works fine (built as linux-ppc64-altivec):
(gdb) b main
Breakpoint 1 at 0x10002114
(gdb) r -test -form=afs
Starting program: /home/solar/john-1.8.0.11-ppc/run/./john -test -form=afs
Breakpoint 1, 0x0000000010002114 in .main ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.ppc64 nss-softokn-freebl-3.28.3-8.el7_4.ppc64
(gdb) p/x $msr
$1 = 0x800000000002d032
(gdb) c
Continuing.
Benchmarking: AFS, Kerberos AFS [DES 48/64 4K]... ^C
Program received signal SIGINT, Interrupt.
0x0000000010006600 in .DES_std_crypt ()
(gdb) p/x $msr
$2 = 0x800000000000d032
(gdb) c
Continuing.
DONE
Short: 82600 c/s real, 329407 c/s virtual
Long: 1110K c/s real, 1110K c/s virtual
[Inferior 1 (process 45860) exited normally]
There's not a single stq instruction (but plenty of std) in my build of John proper, even though the object files and binaries are "64-bit":
[solar@gcc1-power7 src]$ for f in *.o; do objdump -d $f | fgrep -w stq; done
[solar@gcc1-power7 src]$ pwd
/home/solar/john-1.8.0.11-ppc/src
[solar@gcc1-power7 src]$ file AFS_fmt.o ../run/john
AFS_fmt.o: ELF 64-bit MSB relocatable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), not stripped
../run/john: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=9ff908b9c32b247da6c90e7696c6b27163ad6794, not stripped
OK, stq is not such a basic instruction. In POWER architecture terminology, quadword is 128-bit, and one of the changes of ISA 2.07 (not supported on these CPUs) is:
"Allow lq/stq in Little-Endian Mode: Removes the restriction that the lq and stq instructions can only operate in Big-Endian mode."
However, at least per configure's detection we are in big-endian mode, so by that description I'd expect stq to work on a CPU from just prior to that architecture revision (POWER7 instead of POWER8). But maybe not.
So we probably need to look for and remove a gcc flag that enabled use of stq for this build.
Oh, also found: "In versions of the architecture prior to 2.07, this instruction was privileged." This explains why it's failing for us - we're not in the kernel.
There's still -mpower8-vector getting into Makefile, and the code still fails to build when I remove that. Jim, I might have misunderstood you - I thought you meant you implemented AltiVec1-only builds this time, without AltiVec2, but what we have is the opposite. While we may close this issue because the default build completes (albeit presumably only with a recent enough compiler to support AltiVec2), the resulting binary is partially non-working on anything pre-AltiVec2 and no other binary can currently be produced... so I'm not exactly comfortable closing this.
Jim, how did you manage to make AltiVec work for me for bitslice DES in this build, given that you still enable -mpower8-vector? Or is this just luck?
Regarding MSR, I read up on it more and that only confirmed my prior understanding. So I suspect it's misreporting by gdb. Many bits in MSR have to be altered on interrupt (pages 950-951 in the 2.07 manual specify exactly which and how), but there has got to be a way for the kernel to figure out the MSR value of the interrupted process. Yet maybe that info is somehow not fully available to or not used by gdb.
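For comparing the MSR values printed by gdb above, a tiny decoder may help. This is only a sketch under an assumption: that SF (64-bit mode) is the most significant bit and LE (little-endian mode) the least significant bit of the 64-bit MSR, i.e. bit 0 and bit 63 in IBM's numbering. The helper names are hypothetical, and the sketch cannot tell whether gdb reports the register faithfully.

```c
#include <stdint.h>

/* Assumed bit positions, counted from the least significant bit. */
#define SKETCH_MSR_SF_SHIFT 63  /* 64-bit mode */
#define SKETCH_MSR_LE_SHIFT 0   /* little-endian mode */

/* 1 if the SF bit is set in the given MSR value, else 0. */
int msr_sf_set(uint64_t msr)
{
    return (int)((msr >> SKETCH_MSR_SF_SHIFT) & 1);
}

/* 1 if the LE bit is set in the given MSR value, else 0. */
int msr_le_set(uint64_t msr)
{
    return (int)((msr >> SKETCH_MSR_LE_SHIFT) & 1);
}

/* Bits that differ between two MSR readings, e.g. before and after SIGILL. */
uint64_t msr_changed_bits(uint64_t a, uint64_t b)
{
    return a ^ b;
}
```

Feeding it the two readings from the failing build, 0x800000000002d032 and 0x800000000004d032, gives a difference mask of 0x60000, which narrows down which bits to look up in the manual.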
-mpower8-vector
Ok, so that is only valid on ALTIVEC2? btw, I know very little about this CPU, I have simply been taking code/switches from Makefile.legacy and merging the ppc64.h / ppc64alt.h and then fixing compile/runtime problems I have seen in the emulator. Knowing the how/why/when of what is really happening in the iron is beyond me (that is why I put out the call to get the SIMDPARA* stuff done).
Ok, I add the -mpower8-vector blindly in the makefile. This was due to it being in the target for ppc in Makefile.legacy. So, what are our options here? Do we really need to wait until we get CPUID, so that I can check compile flags, and then make sure those compile flags can actually run on the iron being used?
Jim, how did you manage to make AltiVec work for me for bitslice DES in this build, given that you still enable -mpower8-vector? Or is this just luck?
I made no change at all. Whoever ported this code at one point may have made changes, but I have touched it not at all. I did change some of the #defines (to follow ppc[32][64]alt.h) but that is it. With those build flags, it works like a champ in QEMU. NOTE, I am hosting that env on a Haswell laptop (has AVX2), so it is possible that the emulator is allowing ALTIVEC2 instructions by converting over to AVX2
Does anyone know of a good standalone CPUID tool (with source) for Power?
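One possible starting point on Linux, offered as a sketch rather than a tested answer: glibc's getauxval() exposes the kernel's AT_HWCAP / AT_HWCAP2 feature words, and the PowerPC kernel headers define feature bits for AltiVec, VSX and ISA 2.07. The bit values below are copied in as fallbacks so the sketch compiles anywhere. Treating ISA 2.07 as a stand-in for what this thread calls AltiVec2 is my assumption, and all function and enum names are hypothetical.

```c
#if defined(__linux__) && defined(__powerpc__)
#include <sys/auxv.h>
#endif

/* Fallback copies of the Linux PowerPC feature bits (asm/cputable.h). */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX     0x00000080UL
#endif
#ifndef PPC_FEATURE2_ARCH_2_07
#define PPC_FEATURE2_ARCH_2_07  0x80000000UL
#endif

enum simd_level { SIMD_NONE = 0, SIMD_ALTIVEC = 1, SIMD_VSX = 2, SIMD_POWER8 = 3 };

/* Pure classification of the two feature words, so it can be unit tested. */
enum simd_level classify_simd(unsigned long hwcap, unsigned long hwcap2)
{
    if (!(hwcap & PPC_FEATURE_HAS_ALTIVEC))
        return SIMD_NONE;
    if (hwcap2 & PPC_FEATURE2_ARCH_2_07)
        return SIMD_POWER8;   /* assumed to correspond to "AltiVec2" */
    if (hwcap & PPC_FEATURE_HAS_VSX)
        return SIMD_VSX;
    return SIMD_ALTIVEC;
}

/* Query the running system; reports SIMD_NONE on non-PowerPC hosts. */
enum simd_level detect_simd(void)
{
#if defined(__linux__) && defined(__powerpc__) && defined(AT_HWCAP2)
    return classify_simd(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#elif defined(__linux__) && defined(__powerpc__)
    return classify_simd(getauxval(AT_HWCAP), 0UL); /* older libc headers */
#else
    return SIMD_NONE;
#endif
}
```

Something like detect_simd() could be called early at startup to refuse to run (or pick a fallback) when the binary was built for a higher level than the CPU reports. It is Linux-specific, so other operating systems would need their own probe.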
Minor OT nitpick:
configure: creating ./fmt_externs.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
configure: creating ./fmt_registers.h
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
$ time make -sj60
ar: creating aes.a
rar_fmt_plug.c:442:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
Please replace these warnings so they only appear during build and not during ./configure (the Stribog one doesn't show during configure).
non-working on anything pre-AltiVec2 and no other binary can currently be produced... so I'm not exactly comfortable closing this.
I have the --enable-withoutsimd ready to go. I will finish up testing on that flag, but this one should allow a configure build on a ppc Altivec1 enabled machine (until we can figure it out). Also, the existing make -f Makefile.legacy linux-ppcp[64] targets could also be used. Yep, Makefile.legacy is deprecated, but that dinosaur simply does not die and go away ;)
That being said, I will see if there is something I can do within the QEMU environment to disable ALTIVEC2 capability, so that I can get the crashes like you are getting, and figure out proper requirements for probing for Altivec1 and for Altivec2.
But at this time, I think we hold this issue open a bit more. Yep it works fine for me. But I really hate putting it out if it can not run on things using older HW, unless we totally disable those SIMD instructions.
Still OT: @magnumripper I saw that myself, and just have not gotten around to it.
Note, I had to do a little 'extra', to get the rawSHA*_ng formats to NOT be placed into the fmt.h files when building non-simd. I have to have the arch.h files give themselves a lobotomy, and they have to be told to. So I had to add that 'do it' compile switch to the building lines for the fmt_.h. We may want to revisit how I did the new command line switch. I added a totally new one. We may want to instead drop it into CFLAGSEX or something (which would have been picked up by the gcc lines to make the fmt.h).
But as for rar, I saw the warnings also, just have not gotten to them yet. IIRC, the warnings also show up for the rawSHA*_ng.c files, so the #if logic to skip warning in fmt header builds has to be added there also.
I have the --enable-withoutsimd ready to go
Again OT but seriously, do not commit that until you changed it to --disable-simd. I swear I will pull your access rights to the repo 😉 😆
do not commit that until you changed it to --disable-simd.
Imagine that. Hmm, I got feedback I was needing, and nothing bad listed, 'cept my easter-egg ;)
Ok, I add the -mpower8-vector blindly in the makefile. This was due to it being in the target for ppc in Makefile.legacy. So, what are our options here?
-mpower8-vector isn't in Makefile of JtR proper, nor in Makefile.legacy in jumbo. I suggest you try dropping it, see the resulting build errors from missing the AltiVec2 intrinsics (SIMD 64-bit adds, I think), and rework that code not to require those intrinsics when building without that option. The result will be a build working on AltiVec1, even though it would probably run a bit slower than we otherwise could on a machine capable of AltiVec2.
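If the missing pieces really are 64-bit adds (the comment above only guesses so), the usual rework is to build each 64-bit add from two 32-bit adds plus a carry. The sketch below is a plain scalar model of that technique, not AltiVec code. The type and function names are hypothetical, and the mapping onto actual AltiVec1 intrinsics is left out because it has not been verified here.

```c
#include <stdint.h>

/* One 64-bit lane represented as two 32-bit halves. */
typedef struct { uint32_t hi, lo; } u64_halves;

/* Split a native 64-bit value into halves. */
u64_halves split_u64(uint64_t v)
{
    u64_halves r = { (uint32_t)(v >> 32), (uint32_t)v };
    return r;
}

/* Join two halves back into a native 64-bit value. */
uint64_t join_halves(u64_halves v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* 64-bit addition (modulo 2^64) using only 32-bit adds and a carry. */
u64_halves add64_via_32(u64_halves a, u64_halves b)
{
    u64_halves r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);     /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry;         /* propagate carry into high half */
    return r;
}
```

A vector version would apply the same three steps to every lane at once, which costs a few extra instructions per add. That fits the expectation that an AltiVec1 build would run a bit slower than an AltiVec2 one.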
Trying to build today's bleeding-jumbo on GCC Compile Farm's "gcc110" configures as:
That was simple ./configure with no options. Apparently, it didn't detect AltiVec? (The hardware supports AltiVec.) Anyway, the build fails with many errors like:
and many more like these, in other source files too, indicating that our source files try to use SIMD anyway.
I've tried these, to no avail:
although the ways the build failed changed. In none of these cases would ./configure explicitly say it'd use AltiVec. I think we should introduce end-user friendly configure options to enable/disable SIMD (and have those options substitute the needed platform-specific compiler flags automatically).
BTW, there are also these warnings: