openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/
Other
10.1k stars 2.08k forks source link

CPUID tests needed for 32-bit #1855

Closed magnumripper closed 8 years ago

magnumripper commented 8 years ago

We need 32-bit asm (x86.S) versions of the new code in x86-64.S, testing for AVX2 etc.

ukasz commented 8 years ago

Do we also check /proc/cpuinfo? cpuid bit might be set, but without os support extensions might now work.

magnumripper commented 8 years ago

/proc/cpuinfo is not portable. Do you mean we should check for OS support as well as the cpuid bit? I guess that is what Solar already did for AVX - checking AVX, XSAVE and OSXSAVE. The latter is "XSAVE enabled by OS". But I have no idea what I should check for AVX2 apart from the very AVX2 bit.

magnumripper commented 8 years ago

Maybe this: https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family

magnumripper commented 8 years ago

I added a lot more checks in cbe7b9c, but still only for 64-bit.

magnumripper commented 8 years ago

I upgraded this to a bug, since 32-bit binary packaging won't work properly (that is, with fallback) without this. And we definitely have to fix it prior to Jumbo-2.

ukasz commented 8 years ago

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf according to Intel's SDM section 13.3: In addition, software can execute AVX instructions only if CR4.OSXSAVE = XCR0[2] = 1. For AVX2 I think we must assure the above + AVX2 flag in "cpuid". I can give it a shoot and implement it this week if you don't mind. Once we have it we can add AVX512 similarly.

magnumripper commented 8 years ago

For 64-bit we already check that and more, but please have a look at it and verify.

For AVX we check avx + xsave + osxsave. For AVX2 we also check avx2 + bmi1 + bmi2 + lzcnt. For AVX512 we also check avx512f and avx512bw flags.

jfoug commented 8 years ago

Before we close this, I want to also make sure we FIX the AVX2 CPUID failure on my virtualbox system. I run virtualbox on other systems, (AVX and XOP) so I will make sure whether detection is working at all, or if this is just AVX2 failures.

jfoug commented 8 years ago

AVX2 on Fedora22-32 on that machine (using virtualbox again), works just fine with AVX2 (only 64 bit Fedora-22 is failing).

On my fedora-64 bit VM (on XOP machine), it fails also :(

[ghost@localhost src]$ ../run/john -test=0
Sorry, XOP is required for this build
magnumripper commented 8 years ago

I know it works on real XOP and AVX2 systems (and btw the XOP tests are Solar's and has been there for years). I wrote the AVX2 tests according to Intel's guidelines. Perhaps something is fishy with that VM? But regardless, we should find out exactly what fails.

Since the XOP tests fail, perhaps try a "john proper" XOP build and see if that works? If it does, there must be a logical bug (maybe even at CPP level) that make my tests end up different than intended.

magnumripper commented 8 years ago

Diff against master

-#if defined(CPU_REQ_AVX) || defined(CPU_REQ_XOP)
+#if CPU_REQ
 /*
  * CPU detection.
  */

+#define C7_AVX512BW                    $0x40010128
+#define C7_AVX512F                     $0x00010128
+#define C7_AVX2                                $0x00000128
+#define CX_AVX2                                $0x00000020
+#define CX_XOP                         $0x00000800
 #define CF_XSAVE_OSXSAVE_AVX           $0x1C000000
-#define CF_XOP                         $0x00000800
+#define CF_SSSE3                       $0x00000200
+#define CF_SSE4_1                      $0x00080200

 .text

@@ -1363,8 +1658,40 @@ DES_bs_crypt_LM_loop:
 .globl CPU_detect
 CPU_detect:
        pushq %rbx
+#if CPU_REQ_AVX2 || CPU_REQ_AVX512F || CPU_REQ_AVX512BW
+       xorl %eax,%eax
+       cpuid
+       movl $7,%edx
+       cmpl %edx,%eax
+       jl CPU_detect_fail
+       xchgl %edx,%eax
+       xorl %ecx,%ecx
+       cpuid
+#if CPU_REQ_AVX512BW
+       andl C7_AVX512BW,%ebx
+       cmpl C7_AVX512BW,%ebx
+       jne CPU_detect_fail
+#elif CPU_REQ_AVX512F
+       andl C7_AVX512F,%ebx
+       cmpl C7_AVX512F,%ebx
+       jne CPU_detect_fail
+#else
+       andl C7_AVX2,%ebx
+       cmpl C7_AVX2,%ebx
+       jne CPU_detect_fail
+#endif
+#endif
        movl $1,%eax
        cpuid
+#if CPU_REQ_SSSE3
+       andl CF_SSSE3,%ecx
+       cmpl CF_SSSE3,%ecx
+       jne CPU_detect_fail
+#elif CPU_REQ_SSE4_1
+       andl CF_SSE4_1,%ecx
+       cmpl CF_SSE4_1,%ecx
+       jne CPU_detect_fail
+#else
        andl CF_XSAVE_OSXSAVE_AVX,%ecx
        cmpl CF_XSAVE_OSXSAVE_AVX,%ecx
        jne CPU_detect_fail
@@ -1373,7 +1700,7 @@ CPU_detect:
        andb $0x6,%al
        cmpb $0x6,%al
        jne CPU_detect_fail
-#ifdef CPU_REQ_XOP
+#if CPU_REQ_AVX2 || CPU_REQ_XOP
        movl $0x80000000,%eax
        cpuid
        movl $0x80000001,%edx
@@ -1381,9 +1708,14 @@ CPU_detect:
        jl CPU_detect_fail
        xchgl %edx,%eax
        cpuid
-       testl CF_XOP,%ecx
+#if CPU_REQ_AVX2
+       testl CX_AVX2,%ecx
+#elif CPU_REQ_XOP
+       testl CX_XOP,%ecx
+#endif
        jz CPU_detect_fail
 #endif
+#endif
        movl $1,%eax
        popq %rbx
        ret
jfoug commented 8 years ago

All I know is both of these VM's worked fine, before (with proper max CPU instructions). But that may have been before the runtime checks.

ukasz commented 8 years ago

With AVX512 there is a slightly different story if we check AVX512F AND AVX512BW then such check will fail on Knights Landing, since BW is not supported there. I think we must treat each AVX512 flag separately.

ukasz commented 8 years ago

jfoug could you use cpuid tool to dump cpuid state registers on both machines, just to get some inside of failing configuration?

jfoug commented 8 years ago

On the AVX2 VM, the BMI and BMI2 are both false.

These are false: BMI BMI2 FMA

These are true AVX2 MOVBE OS - XSAVE/XSTOR XMM/YMM support in XCR0 LZCNT

jfoug commented 8 years ago

linux-x86-64-xop fails also for JP

jfoug commented 8 years ago

going to check JP on the real machine (using cygwin-64). That works fine.

jfoug commented 8 years ago

Ok, I found this page: https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family

Compiled on the VM, and I get 'This CPU does not support all ISA extensions introduced in Haswell' But on the same machine, using cygwin, I get 'This CPU supports ISA exgtensions introduced in Haswell' So Virtualbox is not virtualizing all of the extensions.

BUT as I said before, AVX2 runs fine as we have coded under JtR, and until these new runtime checks, it has been used quite extensively.

jfoug commented 8 years ago

I believe there is an update to virtualbox. I will see if installing the update provides these CPU 'features'. It was an update about a year ago that initially enabled the AVX2.

By reading changelogs, no they have not fixed this yet. It looks like in 5.0.2 (I am running 5.0.0), they removed AVX2 and AVX passthru, due to crashes hosting windows under linux, and none of the maintenance releases since then (they are on 5.0.10) have fixed this.

magnumripper commented 8 years ago

With AVX512 there is a slightly different story if we check AVX512F AND AVX512BW then such check will fail on Knights Landing, since BW is not supported there. I think we must treat each AVX512 flag separately.

The checks aren't enabled for a MIC build. Also, we do treat them separately.

magnumripper commented 8 years ago

Virtualbox is not virtualizing all of the extensions.

BUT as I said before, AVX2 runs fine as we have coded under JtR, and until these new runtime checks, it has been used quite extensively.

I know there are ways to establish "we are running virtualized". Perhaps that is also a cpuid flag? That would be easy them: we could simply check for ((BMI & BMI2) || virtual).

Edit: A hypervisor (that doesn't try to hide) will set 'hypervisor' (bit 31 in ECX) with fn 1.

magnumripper commented 8 years ago

The XOP test doesn't look for BMI or BMI2 so there must be more that Virtualbox misses. The XOP tests look for xsave, osxsave, avx and xop plus it does something with xgetbv comparing %al to $6.

jfoug commented 8 years ago

I am not sure that ignoring BMI/2 is right, if running virtual. That just happens to be what is passed through and what is not passed through on the version of VB I am running (older 5.0.0). The newer version has not 'fixed' the problem, it has simply disabled AVX2 totally.

NOTE, and AVX build runs just fine, btw.

The big 'deal', is that we have an AVX2 build which runs just fine, but we are checking for 100% compliance with all Haswell CPU features, even though we do not use all of them.

It is an ugly situation, where there are numerous independent CPU extensions. It is certainly a shortcoming of virtualbox, as implemented, where it does not have FULL Haswell support.

Possibly, we should add a 'DoNotCPUCheck' flag to the john.configure settings (so that john-local.conf can override it). This would allow a build to be made which 'may' run, but which the full CPU detection logic says is not 100% compatible. If the CPU/VM is not compliant, it will simply core with an invalid instruction somewhere within the run. But at least it tries.

jfoug commented 8 years ago

I will simply put code into a 'DoNotCPUCheck' flag. @magnumripper, what is the exact flag ID you want used?

ukasz commented 8 years ago

Just note that in some cases if os does not support given extension you will not be able to use it even if cpu supports it since it might end up in invalid instruction exception, but if someone is willing to try anyway then yeah, why not.

jfoug commented 8 years ago

b9b466c is code to 'ignore' the CPUID runtime checks. WIth that, and with a john-local.conf (with [Local:Options] section), and my VM's are back to working fine with XOP/AVX2 intrinsics which we have, and the speed is back again.

magnumripper commented 8 years ago

I have a patch already, hold on

magnumripper commented 8 years ago

we are checking for 100% compliance with all Haswell CPU features, even though we do not use all of them.

AFAIK we check what we HAVE to check for things compiled with -mavx2. Even if a --test=0 run does not crash for you right now, some future change in some code may cause the compiler to emit any of the instruction we test for (including things from BMI1, BMI2 or ABM).

magnumripper commented 8 years ago

I have a patch already, hold on

OK too late so I'm ditching my patch. It was a compile-time option to disable only the BMI tests. But your patch is probably better.

Anyway I made some other fixes in 5771322, not really related to this.

jfoug commented 8 years ago

My patch will not 'change' any of the tests. Testing for full compliance is 'probably' the best way to go. Like you mention, the -mavx2 will allow ANY of those extensions to be emitted, and they MAY be at some point. Right now they are not, since we have not put them into our intrinsics. But the exe image itself 'could' get them at some time.

So the simply patch I have simply goes back to older behavior taking no precaution, and simply allowing the ill-instruction core to happen IF a certain extension is not there.

jfoug commented 8 years ago

The only question I had was in moving the fallback check to lower in the john_init() function. Please look over that change, to make sure I have not f-'cked something up royally in doing that.

magnumripper commented 8 years ago

Yeah unfortunately it needs to be reverted.

jfoug commented 8 years ago

I had a feeling that might be some catch-22.

Ok, do you have any idea on how we can read the config file for this, at the early time in the john_init ?

magnumripper commented 8 years ago

You need to make it a compile-time option. I'll send you my patch, let's see if it works at all.

jfoug commented 8 years ago

As mentioned in another patch (wrong place to make that post), a get_env() call may be a viable work around. It does not use any jtr code that might core from ill-inst exceptions. And it would allow ONLY systems that are known to false fail to be set to ignore the checks.

I really think adding 'special' logic to ignore these checks, but only on hypervisor is NOT the right way. This is just an issue on virtualbox. It may not be a problem at all on hydra, vmware, virtual-pc etc, etc.

I really think a system specific 'ignore' method would be much better.

magnumripper commented 8 years ago

Ok, with this patch, it works on my XOP VM. I made sure the other code was not there, actually REMOVING the cfg_get_bool() call and if.

So this works with the change YOU made (i.e. if hypervisor && !BMI then fail

This was on XOP, so likely the same extension addition is causing the XOP to also fail, since this patch has fixed the XOP VM. I do not have remote access to the machine with AVX2 VM, so can not test at this time, but I would bet it also works correctly now.

Hmm but XOP builds never tested the BMI flags (it's only tested on AVX2 and above), so XOP was probably fixed by 5771322d somehow. We need to test my patch on Virtualbox with AVX2.

magnumripper commented 8 years ago

As mentioned in another patch (wrong place to make that post), a get_env() call may be a viable work around.

Definitely. I'll fix that right away!

jfoug commented 8 years ago

Hmm but XOP builds never tested the BMI flags (it's only tested on AVX2 and above), so XOP was probably fixed by 5771322 somehow.

I agree. I will test AVX2 later today.

magnumripper commented 8 years ago

Done.

$ ../run/john -test=0
Sorry, AVX2 is required for this build

$ CPUID_DISABLE= ../run/john -test=0
Will run 8 OpenMP threads
Testing: descrypt, traditional crypt(3) [DES 256/256 AVX2-16]... (8xOMP)
Illegal instruction: 4

Note that it doesn't matter what you define it to, including empty.

jfoug commented 8 years ago

That is fine. All I planned was if an env was there then skip checks. Thus, it will ONLY be set on a system where someone knows (or hopes) the build works.

magnumripper commented 8 years ago

Some more tweaks in 5275f2a, fixes a bug with SSE4.1 detection, and possibly more - the code was reworked to be easier to follow.

magnumripper commented 8 years ago

I no longer have any means to build 32-bit on my Macbook. Somebody please try out f9338cc with 32-bit with AVX2 and see that it works, and also forcing a 32-bit AVX2 build on a non-AVX2 machine and verify that it bails out cleanly with "Sorry, AVX2 is required". If all is OK, we can close this.

jfoug commented 8 years ago

Tested CPUID_DISABLE on my AVX2 VirtualBox VM. Works correctly. Fails with AVX2 required without the env var, but runs like a champ with it.

jfoug commented 8 years ago

Ok, 32 bit runs fine on native (cygwin). I am not sure moving that exe to other system. I had dll issues (GMP-omp). But I also tested 32 bit VM, and it fails, but adding the env var makes it work just fine, SO I would assume, if the 32 bit exe ran properly, that it would also fail with AVX2 required on a non AVX2 box.

magnumripper commented 8 years ago

Minor fixes for 32-bit in 386e1b7. I tested it and stepped various builds (AVX2, SSE4.1, SSSE3, SSE2) in the debugger, all seems to be OK.