Closed magnumripper closed 8 years ago
Do we also check /proc/cpuinfo? cpuid bit might be set, but without os support extensions might now work.
/proc/cpuinfo is not portable. Do you mean we should check for OS support as well as the cpuid bit? I guess that is what Solar already did for AVX - checking AVX, XSAVE and OSXSAVE. The latter is "XSAVE enabled by OS". But I have no idea what I should check for AVX2 apart from the very AVX2 bit.
I added a lot more checks in cbe7b9c, but still only for 64-bit.
I upgraded this to a bug, since 32-bit binary packaging won't work properly (that is, with fallback) without this. And we definitely have to fix it prior to Jumbo-2.
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf according to Intel's SDM section 13.3: In addition, software can execute AVX instructions only if CR4.OSXSAVE = XCR0[2] = 1. For AVX2 I think we must assure the above + AVX2 flag in "cpuid". I can give it a shoot and implement it this week if you don't mind. Once we have it we can add AVX512 similarly.
For 64-bit we already check that and more, but please have a look at it and verify.
For AVX we check avx + xsave + osxsave. For AVX2 we also check avx2 + bmi1 + bmi2 + lzcnt. For AVX512 we also check avx512f and avx512bw flags.
Before we close this, I want to also make sure we FIX the AVX2 CPUID failure on my virtualbox system. I run virtualbox on other systems, (AVX and XOP) so I will make sure whether detection is working at all, or if this is just AVX2 failures.
AVX2 on Fedora22-32 on that machine (using virtualbox again), works just fine with AVX2 (only 64 bit Fedora-22 is failing).
On my fedora-64 bit VM (on XOP machine), it fails also :(
[ghost@localhost src]$ ../run/john -test=0
Sorry, XOP is required for this build
I know it works on real XOP and AVX2 systems (and btw the XOP tests are Solar's and has been there for years). I wrote the AVX2 tests according to Intel's guidelines. Perhaps something is fishy with that VM? But regardless, we should find out exactly what fails.
Since the XOP tests fail, perhaps try a "john proper" XOP build and see if that works? If it does, there must be a logical bug (maybe even at CPP level) that make my tests end up different than intended.
Diff against master
-#if defined(CPU_REQ_AVX) || defined(CPU_REQ_XOP)
+#if CPU_REQ
/*
* CPU detection.
*/
+#define C7_AVX512BW $0x40010128
+#define C7_AVX512F $0x00010128
+#define C7_AVX2 $0x00000128
+#define CX_AVX2 $0x00000020
+#define CX_XOP $0x00000800
#define CF_XSAVE_OSXSAVE_AVX $0x1C000000
-#define CF_XOP $0x00000800
+#define CF_SSSE3 $0x00000200
+#define CF_SSE4_1 $0x00080200
.text
@@ -1363,8 +1658,40 @@ DES_bs_crypt_LM_loop:
.globl CPU_detect
CPU_detect:
pushq %rbx
+#if CPU_REQ_AVX2 || CPU_REQ_AVX512F || CPU_REQ_AVX512BW
+ xorl %eax,%eax
+ cpuid
+ movl $7,%edx
+ cmpl %edx,%eax
+ jl CPU_detect_fail
+ xchgl %edx,%eax
+ xorl %ecx,%ecx
+ cpuid
+#if CPU_REQ_AVX512BW
+ andl C7_AVX512BW,%ebx
+ cmpl C7_AVX512BW,%ebx
+ jne CPU_detect_fail
+#elif CPU_REQ_AVX512F
+ andl C7_AVX512F,%ebx
+ cmpl C7_AVX512F,%ebx
+ jne CPU_detect_fail
+#else
+ andl C7_AVX2,%ebx
+ cmpl C7_AVX2,%ebx
+ jne CPU_detect_fail
+#endif
+#endif
movl $1,%eax
cpuid
+#if CPU_REQ_SSSE3
+ andl CF_SSSE3,%ecx
+ cmpl CF_SSSE3,%ecx
+ jne CPU_detect_fail
+#elif CPU_REQ_SSE4_1
+ andl CF_SSE4_1,%ecx
+ cmpl CF_SSE4_1,%ecx
+ jne CPU_detect_fail
+#else
andl CF_XSAVE_OSXSAVE_AVX,%ecx
cmpl CF_XSAVE_OSXSAVE_AVX,%ecx
jne CPU_detect_fail
@@ -1373,7 +1700,7 @@ CPU_detect:
andb $0x6,%al
cmpb $0x6,%al
jne CPU_detect_fail
-#ifdef CPU_REQ_XOP
+#if CPU_REQ_AVX2 || CPU_REQ_XOP
movl $0x80000000,%eax
cpuid
movl $0x80000001,%edx
@@ -1381,9 +1708,14 @@ CPU_detect:
jl CPU_detect_fail
xchgl %edx,%eax
cpuid
- testl CF_XOP,%ecx
+#if CPU_REQ_AVX2
+ testl CX_AVX2,%ecx
+#elif CPU_REQ_XOP
+ testl CX_XOP,%ecx
+#endif
jz CPU_detect_fail
#endif
+#endif
movl $1,%eax
popq %rbx
ret
All I know is both of these VM's worked fine, before (with proper max CPU instructions). But that may have been before the runtime checks.
With AVX512 there is a slightly different story if we check AVX512F AND AVX512BW then such check will fail on Knights Landing, since BW is not supported there. I think we must treat each AVX512 flag separately.
jfoug could you use cpuid tool to dump cpuid state registers on both machines, just to get some inside of failing configuration?
On the AVX2 VM, the BMI and BMI2 are both false.
These are false: BMI BMI2 FMA
These are true AVX2 MOVBE OS - XSAVE/XSTOR XMM/YMM support in XCR0 LZCNT
linux-x86-64-xop fails also for JP
going to check JP on the real machine (using cygwin-64). That works fine.
Ok, I found this page: https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
Compiled on the VM, and I get 'This CPU does not support all ISA extensions introduced in Haswell' But on the same machine, using cygwin, I get 'This CPU supports ISA exgtensions introduced in Haswell' So Virtualbox is not virtualizing all of the extensions.
BUT as I said before, AVX2 runs fine as we have coded under JtR, and until these new runtime checks, it has been used quite extensively.
I believe there is an update to virtualbox. I will see if installing the update provides these CPU 'features'. It was an update about a year ago that initially enabled the AVX2.
By reading changelogs, no they have not fixed this yet. It looks like in 5.0.2 (I am running 5.0.0), they removed AVX2 and AVX passthru, due to crashes hosting windows under linux, and none of the maintenance releases since then (they are on 5.0.10) have fixed this.
With AVX512 there is a slightly different story if we check AVX512F AND AVX512BW then such check will fail on Knights Landing, since BW is not supported there. I think we must treat each AVX512 flag separately.
The checks aren't enabled for a MIC build. Also, we do treat them separately.
Virtualbox is not virtualizing all of the extensions.
BUT as I said before, AVX2 runs fine as we have coded under JtR, and until these new runtime checks, it has been used quite extensively.
I know there are ways to establish "we are running virtualized". Perhaps that is also a cpuid flag? That would be easy them: we could simply check for ((BMI & BMI2) || virtual)
.
Edit: A hypervisor (that doesn't try to hide) will set 'hypervisor' (bit 31 in ECX) with fn 1.
The XOP test doesn't look for BMI or BMI2 so there must be more that Virtualbox misses. The XOP tests look for xsave, osxsave, avx and xop plus it does something with xgetbv comparing %al to $6.
I am not sure that ignoring BMI/2 is right, if running virtual. That just happens to be what is passed through and what is not passed through on the version of VB I am running (older 5.0.0). The newer version has not 'fixed' the problem, it has simply disabled AVX2 totally.
NOTE, and AVX build runs just fine, btw.
The big 'deal', is that we have an AVX2 build which runs just fine, but we are checking for 100% compliance with all Haswell CPU features, even though we do not use all of them.
It is an ugly situation, where there are numerous independent CPU extensions. It is certainly a shortcoming of virtualbox, as implemented, where it does not have FULL Haswell support.
Possibly, we should add a 'DoNotCPUCheck' flag to the john.configure settings (so that john-local.conf can override it). This would allow a build to be made which 'may' run, but which the full CPU detection logic says is not 100% compatible. If the CPU/VM is not compliant, it will simply core with an invalid instruction somewhere within the run. But at least it tries.
I will simply put code into a 'DoNotCPUCheck' flag. @magnumripper, what is the exact flag ID you want used?
Just note that in some cases if os does not support given extension you will not be able to use it even if cpu supports it since it might end up in invalid instruction exception, but if someone is willing to try anyway then yeah, why not.
b9b466c is code to 'ignore' the CPUID runtime checks. WIth that, and with a john-local.conf (with [Local:Options] section), and my VM's are back to working fine with XOP/AVX2 intrinsics which we have, and the speed is back again.
I have a patch already, hold on
we are checking for 100% compliance with all Haswell CPU features, even though we do not use all of them.
AFAIK we check what we HAVE to check for things compiled with -mavx2
. Even if a --test=0
run does not crash for you right now, some future change in some code may cause the compiler to emit any of the instruction we test for (including things from BMI1, BMI2 or ABM).
I have a patch already, hold on
OK too late so I'm ditching my patch. It was a compile-time option to disable only the BMI tests. But your patch is probably better.
Anyway I made some other fixes in 5771322, not really related to this.
My patch will not 'change' any of the tests. Testing for full compliance is 'probably' the best way to go. Like you mention, the -mavx2
will allow ANY of those extensions to be emitted, and they MAY be at some point. Right now they are not, since we have not put them into our intrinsics. But the exe image itself 'could' get them at some time.
So the simply patch I have simply goes back to older behavior taking no precaution, and simply allowing the ill-instruction core to happen IF a certain extension is not there.
The only question I had was in moving the fallback check to lower in the john_init() function. Please look over that change, to make sure I have not f-'cked something up royally in doing that.
Yeah unfortunately it needs to be reverted.
I had a feeling that might be some catch-22.
Ok, do you have any idea on how we can read the config file for this, at the early time in the john_init ?
You need to make it a compile-time option. I'll send you my patch, let's see if it works at all.
As mentioned in another patch (wrong place to make that post), a get_env() call may be a viable work around. It does not use any jtr code that might core from ill-inst exceptions. And it would allow ONLY systems that are known to false fail to be set to ignore the checks.
I really think adding 'special' logic to ignore these checks, but only on hypervisor is NOT the right way. This is just an issue on virtualbox. It may not be a problem at all on hydra, vmware, virtual-pc etc, etc.
I really think a system specific 'ignore' method would be much better.
Ok, with this patch, it works on my XOP VM. I made sure the other code was not there, actually REMOVING the cfg_get_bool() call and if.
So this works with the change YOU made (i.e. if hypervisor && !BMI then fail
This was on XOP, so likely the same extension addition is causing the XOP to also fail, since this patch has fixed the XOP VM. I do not have remote access to the machine with AVX2 VM, so can not test at this time, but I would bet it also works correctly now.
Hmm but XOP builds never tested the BMI flags (it's only tested on AVX2 and above), so XOP was probably fixed by 5771322d somehow. We need to test my patch on Virtualbox with AVX2.
As mentioned in another patch (wrong place to make that post), a get_env() call may be a viable work around.
Definitely. I'll fix that right away!
Hmm but XOP builds never tested the BMI flags (it's only tested on AVX2 and above), so XOP was probably fixed by 5771322 somehow.
I agree. I will test AVX2 later today.
Done.
$ ../run/john -test=0
Sorry, AVX2 is required for this build
$ CPUID_DISABLE= ../run/john -test=0
Will run 8 OpenMP threads
Testing: descrypt, traditional crypt(3) [DES 256/256 AVX2-16]... (8xOMP)
Illegal instruction: 4
Note that it doesn't matter what you define it to, including empty.
That is fine. All I planned was if an env was there then skip checks. Thus, it will ONLY be set on a system where someone knows (or hopes) the build works.
Some more tweaks in 5275f2a, fixes a bug with SSE4.1 detection, and possibly more - the code was reworked to be easier to follow.
I no longer have any means to build 32-bit on my Macbook. Somebody please try out f9338cc with 32-bit with AVX2 and see that it works, and also forcing a 32-bit AVX2 build on a non-AVX2 machine and verify that it bails out cleanly with "Sorry, AVX2 is required". If all is OK, we can close this.
Tested CPUID_DISABLE on my AVX2 VirtualBox VM. Works correctly. Fails with AVX2 required without the env var, but runs like a champ with it.
Ok, 32 bit runs fine on native (cygwin). I am not sure moving that exe to other system. I had dll issues (GMP-omp). But I also tested 32 bit VM, and it fails, but adding the env var makes it work just fine, SO I would assume, if the 32 bit exe ran properly, that it would also fail with AVX2 required on a non AVX2 box.
Minor fixes for 32-bit in 386e1b7. I tested it and stepped various builds (AVX2, SSE4.1, SSSE3, SSE2) in the debugger, all seems to be OK.
We need 32-bit asm (x86.S) versions of the new code in x86-64.S, testing for AVX2 etc.