Closed solardiz closed 6 years ago
hmm, If i did not get -mpower8-vector from the makefile/makefile.legacy, i wonder where the F i got it from ???
I will do some additional work with and without it, and hopefully we can get some CPUID stuff addressed also.
I have found this code, BUT it only tests for the existence of ALTIVEC. It does not show how to test for extension 2.
http://freevec.org/function/altivec_runtime_detection_linux http://www.freehackers.org/thomas/2011/05/13/how-to-detect-altivec-availability-on-linuxppc-at-runtime/
If we use code like this, how should we attribute it?
@solardiz Please test this program (I wrote it from the code above) and run this on the gcc #10 and post the results.
// updated code a bit
#include <stdio.h>
#include <string.h>     // strchr
#include <fcntl.h>
#include <unistd.h>     // read, close
#include <linux/auxvec.h>
#include <asm/cputable.h>
// from http://freevec.org/function/altivec_runtime_detection_linux
// returns the AT_HWCAP word if caps2 is 0, else the AT_HWCAP2 word (0 if not found)
long get_caps_ppc(int caps2)
{
    long result = 0;
    unsigned long buf[64];
    ssize_t count;      // signed, so that a read() error (-1) can be detected
    size_t i;
    int fd;
    fd = open("/proc/self/auxv", O_RDONLY);
    if (fd < 0) { return 0; }
    // loop on reading
    do {
        count = read(fd, buf, sizeof(buf));
        if (count < 0)
            break;
        for (i = 0; i + 1 < (size_t)count / sizeof(unsigned long); i += 2) {
            if (!caps2 && buf[i] == AT_HWCAP) {
                result = buf[i+1];
                goto out_close;
            } else if (caps2 && buf[i] == AT_HWCAP2) {
                result = buf[i+1];
                goto out_close;
            } else if (buf[i] == AT_NULL)
                goto out_close;
        }
    } while (count == (ssize_t)sizeof(buf));
out_close:
    close(fd);
    return result;
}
// prints the feature name padded with dots to 36 columns, then Present/Absent
void print_cap(long cap, long feature_bit, const char *feature) {
    char buf[128], *cp;
    snprintf (buf, sizeof(buf), "%-36s", feature);
    cp = strchr(buf, ' ');
    while (cp && *cp == ' ')
        *cp++ = '.';
    if (cap & feature_bit)
        printf ("%s Present\n", buf);
    else
        printf ("%s Absent\n", buf);
}
int main() {
    long caps = get_caps_ppc(0);
    long caps2 = get_caps_ppc(2);
    printf ("\nAT_HWCAP features\n");
    print_cap (caps, PPC_FEATURE_32, "PPC_FEATURE_32");
    print_cap (caps, PPC_FEATURE_64, "PPC_FEATURE_64");
    print_cap (caps, PPC_FEATURE_601_INSTR, "PPC_FEATURE_601_INSTR");
    print_cap (caps, PPC_FEATURE_HAS_ALTIVEC, "PPC_FEATURE_HAS_ALTIVEC");
    print_cap (caps, PPC_FEATURE_HAS_FPU, "PPC_FEATURE_HAS_FPU");
    print_cap (caps, PPC_FEATURE_HAS_MMU, "PPC_FEATURE_HAS_MMU");
    print_cap (caps, PPC_FEATURE_HAS_4xxMAC, "PPC_FEATURE_HAS_4xxMAC");
    print_cap (caps, PPC_FEATURE_UNIFIED_CACHE, "PPC_FEATURE_UNIFIED_CACHE");
    print_cap (caps, PPC_FEATURE_HAS_SPE, "PPC_FEATURE_HAS_SPE");
    print_cap (caps, PPC_FEATURE_HAS_EFP_SINGLE, "PPC_FEATURE_HAS_EFP_SINGLE");
    print_cap (caps, PPC_FEATURE_HAS_EFP_DOUBLE, "PPC_FEATURE_HAS_EFP_DOUBLE");
    print_cap (caps, PPC_FEATURE_NO_TB, "PPC_FEATURE_NO_TB");
    print_cap (caps, PPC_FEATURE_POWER4, "PPC_FEATURE_POWER4");
    print_cap (caps, PPC_FEATURE_POWER5, "PPC_FEATURE_POWER5");
    print_cap (caps, PPC_FEATURE_POWER5_PLUS, "PPC_FEATURE_POWER5_PLUS");
    print_cap (caps, PPC_FEATURE_CELL, "PPC_FEATURE_CELL");
    print_cap (caps, PPC_FEATURE_BOOKE, "PPC_FEATURE_BOOKE");
    print_cap (caps, PPC_FEATURE_SMT, "PPC_FEATURE_SMT");
    print_cap (caps, PPC_FEATURE_ICACHE_SNOOP, "PPC_FEATURE_ICACHE_SNOOP");
    print_cap (caps, PPC_FEATURE_ARCH_2_05, "PPC_FEATURE_ARCH_2_05");
    print_cap (caps, PPC_FEATURE_PA6T, "PPC_FEATURE_PA6T");
    print_cap (caps, PPC_FEATURE_HAS_DFP, "PPC_FEATURE_HAS_DFP");
    print_cap (caps, PPC_FEATURE_POWER6_EXT, "PPC_FEATURE_POWER6_EXT");
    print_cap (caps, PPC_FEATURE_ARCH_2_06, "PPC_FEATURE_ARCH_2_06");
    print_cap (caps, PPC_FEATURE_HAS_VSX, "PPC_FEATURE_HAS_VSX");
    print_cap (caps, PPC_FEATURE_PSERIES_PERFMON_COMPAT, "PPC_FEATURE_PSERIES_PERFMON_COMPAT");
    print_cap (caps, PPC_FEATURE_TRUE_LE, "PPC_FEATURE_TRUE_LE");
    print_cap (caps, PPC_FEATURE_PPC_LE, "PPC_FEATURE_PPC_LE");
    printf ("\nAT_HWCAP2 features\n");
    print_cap (caps2, PPC_FEATURE2_ARCH_2_07, "PPC_FEATURE2_ARCH_2_07");
    print_cap (caps2, PPC_FEATURE2_HTM, "PPC_FEATURE2_HTM");
    print_cap (caps2, PPC_FEATURE2_DSCR, "PPC_FEATURE2_DSCR");
    print_cap (caps2, PPC_FEATURE2_EBB, "PPC_FEATURE2_EBB");
    print_cap (caps2, PPC_FEATURE2_ISEL, "PPC_FEATURE2_ISEL");
    print_cap (caps2, PPC_FEATURE2_TAR, "PPC_FEATURE2_TAR");
    print_cap (caps2, PPC_FEATURE2_VEC_CRYPTO, "PPC_FEATURE2_VEC_CRYPTO");
    printf("\n");
    return 0;
}
Running on my QEMU, I get this:
AT_HWCAP features
PPC_FEATURE_32...................... Present
PPC_FEATURE_64...................... Present
PPC_FEATURE_601_INSTR............... Absent
PPC_FEATURE_HAS_ALTIVEC............. Present
PPC_FEATURE_HAS_FPU................. Present
PPC_FEATURE_HAS_MMU................. Present
PPC_FEATURE_HAS_4xxMAC.............. Absent
PPC_FEATURE_UNIFIED_CACHE........... Absent
PPC_FEATURE_HAS_SPE................. Absent
PPC_FEATURE_HAS_EFP_SINGLE.......... Absent
PPC_FEATURE_HAS_EFP_DOUBLE.......... Absent
PPC_FEATURE_NO_TB................... Absent
PPC_FEATURE_POWER4.................. Absent
PPC_FEATURE_POWER5.................. Absent
PPC_FEATURE_POWER5_PLUS............. Absent
PPC_FEATURE_CELL.................... Absent
PPC_FEATURE_BOOKE................... Absent
PPC_FEATURE_SMT..................... Present
PPC_FEATURE_ICACHE_SNOOP............ Present
PPC_FEATURE_ARCH_2_05............... Absent
PPC_FEATURE_PA6T.................... Absent
PPC_FEATURE_HAS_DFP................. Present
PPC_FEATURE_POWER6_EXT.............. Absent
PPC_FEATURE_ARCH_2_06............... Present
PPC_FEATURE_HAS_VSX................. Present
PPC_FEATURE_PSERIES_PERFMON_COMPAT.. Present
PPC_FEATURE_TRUE_LE................. Present
PPC_FEATURE_PPC_LE.................. Absent
AT_HWCAP2 features
PPC_FEATURE2_ARCH_2_07.............. Present
PPC_FEATURE2_HTM.................... Present
PPC_FEATURE2_DSCR................... Present
PPC_FEATURE2_EBB.................... Present
PPC_FEATURE2_ISEL................... Present
PPC_FEATURE2_TAR.................... Present
PPC_FEATURE2_VEC_CRYPTO............. Present
This is probably as close to a CPUID we will be able to get for ppc
I was not specifically looking for it, but per my reading of the 2.07 arch manual earlier today it appears that some bits in MSR reflect whether the current thread can use VSX or not, etc. So maybe look in there and try that? It'd be one inline asm instruction to extract MSR into a variable.
[solar@gcc1-power7 ~]$ ./hwcap
AT_HWCAP features
PPC_FEATURE_32...................... Present
PPC_FEATURE_64...................... Present
PPC_FEATURE_601_INSTR............... Absent
PPC_FEATURE_HAS_ALTIVEC............. Present
PPC_FEATURE_HAS_FPU................. Present
PPC_FEATURE_HAS_MMU................. Present
PPC_FEATURE_HAS_4xxMAC.............. Absent
PPC_FEATURE_UNIFIED_CACHE........... Absent
PPC_FEATURE_HAS_SPE................. Absent
PPC_FEATURE_HAS_EFP_SINGLE.......... Absent
PPC_FEATURE_HAS_EFP_DOUBLE.......... Absent
PPC_FEATURE_NO_TB................... Absent
PPC_FEATURE_POWER4.................. Absent
PPC_FEATURE_POWER5.................. Absent
PPC_FEATURE_POWER5_PLUS............. Absent
PPC_FEATURE_CELL.................... Absent
PPC_FEATURE_BOOKE................... Absent
PPC_FEATURE_SMT..................... Present
PPC_FEATURE_ICACHE_SNOOP............ Present
PPC_FEATURE_ARCH_2_05............... Absent
PPC_FEATURE_PA6T.................... Absent
PPC_FEATURE_HAS_DFP................. Present
PPC_FEATURE_POWER6_EXT.............. Absent
PPC_FEATURE_ARCH_2_06............... Present
PPC_FEATURE_HAS_VSX................. Present
PPC_FEATURE_PSERIES_PERFMON_COMPAT.. Present
PPC_FEATURE_TRUE_LE................. Present
PPC_FEATURE_PPC_LE.................. Present
AT_HWCAP2 features
PPC_FEATURE2_ARCH_2_07.............. Absent
PPC_FEATURE2_HTM.................... Absent
PPC_FEATURE2_DSCR................... Present
PPC_FEATURE2_EBB.................... Absent
PPC_FEATURE2_ISEL................... Absent
PPC_FEATURE2_TAR.................... Absent
PPC_FEATURE2_VEC_CRYPTO............. Absent
Ok, I think this likely is what i need to look for:
PPC_FEATURE_HAS_VSX................. Present
I can write a simple probe function in configure that checks for that bit. If it is not set, then no ALTIVEC is used. I actually do not know all of the changes needed in pseudo_intrinsics.h to get this to compile/work for ALTIVEC1. Whoever ported it only did so for AltiVec2 instruction capable machines. So for now, I will limit to that usage, and that is the best I can do.
Damn, I see your reply now, and it has VSX :( I will look for other items . I NOTE that a LOT of the HWCAP2 features are missing. Could this be VEC_CRYPTO ? Or ARCH_2_07 ?
I think we don't need nor use VSX ("vector-scalar floating-point instructions" per Wikipedia). We need and use PPC_FEATURE_HAS_ALTIVEC and with the current code also PPC_FEATURE2_ARCH_2_07 (implies AltiVec2). If you can't (conditionally) remove the dependency on AltiVec2, then perhaps just check for PPC_FEATURE2_ARCH_2_07. You can drop the -mvsx, although it probably doesn't matter (its support is implied with 2.07 anyway).
We absolutely MUST use VSX with the current written code:
In file included from simd-intrinsics.h:18:0,
from MD5_fmt.c:17:
pseudo_intrinsics.h:109:1: error: use of ‘long’ in AltiVec types is invalid for 64-bit code without -mvsx
typedef vector unsigned long vtype64;
^
Makefile:1462: recipe for target 'MD5_fmt.o' failed
Without VSX, we can not do any 64 bit variable vectors.
From https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/PowerPC-AltiVec_002fVSX-Built_002din-Functions.html
If -mvsx is used the following additional vector types are implemented.
vector unsigned long
vector signed long
vector double
The code as currently written in pseudo_intrinsics.h will only compile properly if -mvsx and -mpower8-vector are both set. I do not know how to make changes yet for any other mode. So what I need to do is find the minimal CPUID-type bits required for the existing logic, and only enable it when they are present (at configure time, and we likely should put the same check into john proper). If those flags are not set, do not use altivec.
Then someone with more knowledge can later add altivec for more situations.
Ok, is the ARCH_2_07 the flag I need to look for that is lacking on the gcc build box in question? If so, then I can check for altivec bit, and the arch_207 bit. Then later if we get changed code, and can deal with v1 altivec, we can go with just altivec bit and still run if the 207 bit is lacking.
As for VSX, I mean the -mvsx If that flag is not there, then the 64 bit vectors fail. I think we need all 3 compiler flags and the code as it is today.
I have a simple C cpuid that I will compile within configure. It will only add ALTIVEC code (the code we have today) if all 3 of those bits are seen in the feature[2] flags. Then someone can figure out how to get the AltiVec code running without the VSX/ALTIVEC2 extensions, and once we have that, we can fall back to using that code, IF the ALTIVEC feature flag is set. If that flag is not set, we simply fall back to using -m64/-m32 only and build a non-SIMD product.
@solardiz Please try 56016ced8 (and 77fa28b26 since I forgot to autoconf prior to checking it in) on gcc-10 when you get a chance. It should build, but not include the AltiVec codes, since as written, it will not work on that machine.
It works fine on my QEMU, since it can handle the ALTIVEC2 stuff.
Once we are happy here, someone can look at what it would take to get pseudo_intrinsics.h working for ALTIVEC1. When we are there, I can make changes to configure, so that it will inform the build on which flavor of SIMD to use, so that it works faster, BUT does not cause crashes.
It may be that things like x86.S are already written to work this way (or wherever the DES crap is for SIMD). However, our intrinsics currently do NOT work. They only handle code which can deal with -mvsx and -mpower8-vector code (i.e. ALTIVEC2 I believe).
But the patch here reads from /proc/self/auxv and will only allow ALTIVEC builds if all required attributes are found.
@solardiz hold up a bit. i found logic problems in the script. part of the fun of working on the same section of code for 2 different tasks (ppc port and no-simd configure logic)
heres a fun way to view the HW on the PPC ;)
ghost@local:~/JtR/bleed64/src$ LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE: 0x80
AT_ICACHEBSIZE: 0x80
AT_UCACHEBSIZE: 0x0
AT_SYSINFO_EHDR: 0x100000
AT_HWCAP: vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_PAGESZ: 65536
AT_CLKTCK: 100
AT_PHDR: 0x10000034
AT_PHENT: 32
AT_PHNUM: 8
AT_BASE: 0xf7900000
AT_FLAGS: 0x0
AT_ENTRY: 0x10000c5c
AT_UID: 1000
AT_EUID: 1000
AT_GID: 1000
AT_EGID: 1000
AT_SECURE: 0
AT_RANDOM: 0xfff4a6e2
AT_HWCAP2: tar isel ebb dscr htm arch_2_07
AT_EXECFN: /bin/true
AT_PLATFORM: power8
AT_BASE_PLATFORM:power8
@solardiz Please try the latest bleeding-jumbo. This should build fine, and run fine on the gcc machine.
ALTIVEC should be disabled on that box.
Once we know that it works properly there, we can close this topic. We should re-open a topic to port to ALTIVEC1 SIMD, if that is desired. That is past my current skill set.
@solardiz
Please run this on the GCC box-10, and list results:
$ gcc -mpower8-vector -mvsx -dM -E - < /dev/null | grep -E "POWER8|VSX"
#define __VSX__ 1
#define __POWER8_VECTOR__ 1
I believe we have to code for these situations.
If ./ppc_cpuid PPC_FEATURE2_ARCH_2_07 returns 1, set gcc flags -maltivec -mvsx -mpower8-vector -DJOHN_WITH_ALTIVEC2
If ./ppc_cpuid PPC_FEATURE2_ARCH_2_07 returns 0, but both PPC_FEATURE_HAS_ALTIVEC and PPC_FEATURE_HAS_VSX return 1 (i.e. altivec v1 with VSX; vsx is required for the long/long long vector data types), set gcc flags -maltivec -mvsx
Then we need to get the code working without -mpower8-vector, and only enable the current code we have today when the JOHN_WITH_ALTIVEC2 define is set. We will use that switch to choose the code.
I can easily code the plumbing for this, with the exception of getting the code working 'without' power8-vector. I will give it a shot, but I am not sure I am capable of getting that part done.
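The selection rules above can be sketched as a small C function. This is only an illustration, not the actual configure logic: the `SK_`-prefixed constants, the `simd_mode` enum, and both function names are hypothetical, and the bit values are copied from the Linux PowerPC <asm/cputable.h> so the sketch builds on any host. Unlike the first rule as worded, it requires all three bits before picking the AltiVec2 flags, which matches the "all 3 bits" check discussed earlier in the thread.

```c
/* Bit values as defined in the Linux <asm/cputable.h> for PowerPC,
 * copied under hypothetical SK_ names so this builds on any host. */
#define SK_PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#define SK_PPC_FEATURE_HAS_VSX     0x00000080UL
#define SK_PPC_FEATURE2_ARCH_2_07  0x80000000UL

/* Hypothetical build modes, one per rule in the comment above. */
enum simd_mode { SIMD_NONE, SIMD_ALTIVEC_VSX, SIMD_ALTIVEC2 };

/* Map the AT_HWCAP / AT_HWCAP2 words to a build mode. */
enum simd_mode pick_simd_mode(unsigned long hwcap, unsigned long hwcap2)
{
    int altivec = (hwcap & SK_PPC_FEATURE_HAS_ALTIVEC) != 0;
    int vsx     = (hwcap & SK_PPC_FEATURE_HAS_VSX) != 0;
    int isa207  = (hwcap2 & SK_PPC_FEATURE2_ARCH_2_07) != 0;

    if (altivec && vsx && isa207)
        return SIMD_ALTIVEC2;       /* the code that exists today */
    if (altivec && vsx)
        return SIMD_ALTIVEC_VSX;    /* needs code that avoids power8-vector */
    return SIMD_NONE;               /* plain non-SIMD build */
}

/* Compiler flags proposed in the thread for each mode. */
const char *simd_cflags(enum simd_mode mode)
{
    switch (mode) {
    case SIMD_ALTIVEC2:
        return "-maltivec -mvsx -mpower8-vector -DJOHN_WITH_ALTIVEC2";
    case SIMD_ALTIVEC_VSX:
        return "-maltivec -mvsx";
    default:
        return "";
    }
}
```

With the capability words implied by the two listings above, the POWER8 QEMU guest would land in SIMD_ALTIVEC2 and the POWER7 box (AltiVec and VSX present, ARCH_2_07 absent) in SIMD_ALTIVEC_VSX.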
I have found this type of problem on other warez also. It looks like the *add_epi64() type is NOT compatible with AltiVec prior to POWER8 level, and if the iron does not have the 2_07 level (POWER8), you may be able to compile the code, BUT it is going to crash :(
https://github.com/IvantheDugtrio/veclib/blob/master/vec128intlib.c
If someone can show how to write a compatible vadd_epi64() for us, I will certainly use it. Until then, john will only be ALTIVEC SIMD compatible with POWER8 or above.
I am going to see if the SIMD code (minus the simd-intrinsics logic) is compatible with <POWER8 hardware. I am not sure I can easily get a build to add SIMD to core john, but have the simd-intrinsics.c code NOT be used. It may simply be better to scrap work to get SIMD for altivec on <POWER8, unless someone is able to provide a replacement for vadd_epi64().
Somehow the machine is unreachable now. As I recall, I was able to build your previous revision of bleeding-jumbo (which I commented on in here) without -mvsx, but with -mpower8-vector. Probably the latter implies the former? That would make sense to me as -mpower8-vector means ISA 2.07, which includes VSX too.
As to builds for AltiVec1, when we're able to make those again, perhaps they wouldn't require VSX because we'd have to avoid 64-bit integer types within SIMD when we're on pre-2.07 (pre-POWER8) anyway.
And yes, you're right - we need to implement SIMD 64-bit integer add with plain AltiVec intrinsics when we don't have that as an intrinsic. I think it shouldn't be that hard: do an equivalent 32-bit add and then add 1 to the odd-numbered 32-bit elements if the corresponding even-numbered element is smaller than any of the two operands. There's probably a SIMD comparison intrinsic which could be used to get 32-bit all 0's or all 1's, then we shift that by 32 bits to the left and 32-bit SIMD add it again. That's probably ~4 intrinsics (and instructions) total to replace this missing SIMD 64-bit add.
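To make the carry idea concrete, here is a scalar C model of one 64-bit lane built from two 32-bit elements. It is a sketch, not AltiVec code: the `lane64` type and function names are hypothetical, and which element (odd or even) holds the high half depends on endianness. One detail to watch: an all-1's compare result is -1, so the sketch subtracts the mask from the high half rather than adding it.

```c
#include <stdint.h>

/* Scalar model of one 64-bit SIMD lane split into two 32-bit elements.
 * Real AltiVec code would apply the same steps to every element of a
 * vector at once. */
typedef struct { uint32_t lo, hi; } lane64;

static lane64 add64_via_32(lane64 a, lane64 b)
{
    lane64 r;
    uint32_t carry_mask;

    /* Step 1: independent 32-bit adds on both halves (wrap on overflow). */
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi;

    /* Step 2: unsigned compare; all 1's if the low half wrapped around. */
    carry_mask = (r.lo < a.lo) ? 0xFFFFFFFFu : 0u;

    /* Step 3: all 1's is -1, so subtracting the mask adds the carry. */
    r.hi -= carry_mask;
    return r;
}

/* Helper for checking the model against a native 64-bit add. */
static uint64_t add64_model(uint64_t x, uint64_t y)
{
    lane64 a = { (uint32_t)x, (uint32_t)(x >> 32) };
    lane64 b = { (uint32_t)y, (uint32_t)(y >> 32) };
    lane64 r = add64_via_32(a, b);

    return ((uint64_t)r.hi << 32) | r.lo;
}
```

In a vector version, step 2 would be a 32-bit unsigned compare and the mask would first have to be moved from the low element to the high element of each pair, which is where the estimate of roughly four operations comes from.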
The problem with the approach at hardware feature detection you've taken so far is that it's Linux-specific. Ideally, we'd also support AIX, older OS X, and maybe *BSD's.
Do those not use ELF ?
That was really the ONLY code I was able to find on ALTIVEC detection short of catching exceptions.
Now we 'could' perform the exception checking like I had done in the past for SSE* logic. Simply write a tiny stub program that exits with a (0) return code, then build and run it. If the return is 0, the code executed without exception. Even though I found some 'do not do it that way' comments about exception probing, I think that doing this with totally stand-alone processes within the configure testing would be an option. It would not be ideal for a cpuid INSIDE running john. The comments made about hidden gotchas were about code that would use the exception model but not clean things up well, which could cause problems later on in the run. BUT within configure, I do not see that problem. The stub process will run or not. If not, we know that THIS machine can not handle the code. We would still need some way to do runtime cpuid checks inside john.
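A minimal sketch of the "run it in a throwaway process and look at the exit status" idea follows. It swaps the separately built stub program for a fork() inside one POSIX program, which is an assumption made only to keep the example self-contained. The function names are hypothetical, and the two probes are stand-ins: a real probe would execute the instruction under test.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run 'probe' in a child process. Returns 1 if the child exited with
 * status 0, 0 if it exited non-zero or was killed by a signal (such as
 * SIGILL from an unsupported instruction), and -1 if fork/wait failed. */
int probe_in_child(void (*probe)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        probe();    /* would execute the instruction under test */
        _exit(0);   /* reached only if nothing trapped */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Stand-in probes so the sketch runs on any POSIX host. */
void probe_ok(void) { }
void probe_traps(void) { raise(SIGILL); }
```

Because the trap happens in the child, the parent never has to install or clean up a signal handler, which is the property that makes this approach tolerable inside configure.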
Do those not use ELF ?
This varies, but I was referring to the dependency on /proc/self/auxv, which is part of Linux procfs and is unlikely to be available on other systems except possibly through Linux "emulation" (or compatibility layer) where it might be available (I'd guess on some of the *BSD's, but I don't really know).
Have you looked into my idea of checking MSR bits?
I no longer use /proc/self/auxv. I walk the env pointer to the end, and then walk a dual-long array from there.
Same data as in /proc/self/auxv, but it is now ELF layout dependent.
This is the new code. done inside main() It is the same code as before, just gets the data from the running process, not from a file in the /proc/self/* part of the OS.
unsigned long caps=0, caps2=0, *auxv;
int i;
// skip past ENV, to the auxv array of longs.
// NOTE, using 'long' data type works properly on either 32 or 64 bit builds.
while (*envp++ != NULL);
for (i = 0, auxv = (long*)envp; auxv[i] != AT_NULL ; i += 2) {
    /* find data for AT_HWCAP or AT_HWCAP2 depending upon how called. */
    if (auxv[i] == AT_HWCAP)
        caps = auxv[i+1];
    else if (auxv[i] == AT_HWCAP2)
        caps2 = auxv[i+1];
}
NOTE, this code DOES require these headers, however:
/* magic constants for auxillary table id's (the AT_HWCAP type numbers) */
#include <linux/auxvec.h>
/* magic constants PowerPC specific! for CPU CAPACITY bits (the PPC_FEATURE_HAS_ALTIVEC bits) */
#include <asm/cputable.h>
So from the directory name, AT_HWCAP (and the other AT_* flags) 'may' be Linux-specific logic. The <asm/cputable.h> appears to be ppc specific. I do not see that file on other systems. NOTE, the program DOES have this to start with, so it will ONLY build on a PPC system:
#if !defined(PPC) && !defined(powerpc)
#error program specifically written to deal with CPU information on the PowerPC
#endif
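The auxv walk described above can also be written as a standalone lookup that takes the array as a parameter, so it can be exercised with synthetic data on any host. This is a sketch under assumptions: the `SKETCH_` constants mirror the values listed in <linux/auxvec.h> but are redefined locally, both function names are hypothetical, and `auxv_from_envp` assumes the Linux/ELF stack layout with an unmodified envp.

```c
#include <stddef.h>

/* Same numbers as in <linux/auxvec.h>; redefined under hypothetical
 * names only so the sketch compiles where that header is missing. */
#define SKETCH_AT_NULL   0
#define SKETCH_AT_HWCAP  16
#define SKETCH_AT_HWCAP2 26

/* Walk an auxv-style array of (type, value) pairs ending in AT_NULL.
 * Returns 1 and stores the value if 'type' is found, otherwise 0. */
int auxv_lookup(const unsigned long *auxv, unsigned long type,
                unsigned long *value)
{
    size_t i;

    if (auxv == NULL)
        return 0;
    for (i = 0; auxv[i] != SKETCH_AT_NULL; i += 2) {
        if (auxv[i] == type) {
            *value = auxv[i + 1];
            return 1;
        }
    }
    return 0;
}

/* Skip the environment block to reach the auxiliary vector.
 * Assumes the Linux/ELF process layout; not portable. */
const unsigned long *auxv_from_envp(char **envp)
{
    while (*envp != NULL)
        envp++;
    return (const unsigned long *)(envp + 1);
}
```

On Linux with glibc 2.16 or later, getauxval(AT_HWCAP) from <sys/auxv.h> should return the same word without walking the stack, though that is just as Linux-specific as the headers above.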
Have you looked into my idea of checking MSR bits?
No I have not done so yet.
Here is what I am using to 'find' this data. As for processing, the defines for the type of data ARE in a <linux/*> header, so if it is not Linux, we may have to find them in other ways. BUT I think this appears to be how an ELF[64] process is laid out in memory.
This was a page I used to change the logic.
I've been using https://static2.rpteng.com/TALOS/documentation/PowerISA_V2.07_PUBLIC.pdf (very long and detailed), but I can't find the table with MSR bits in there anymore. ;-(
BTW, there's also a document on ISA 3.0 from Nov 2015, and somehow it's down to ~1200 pages from 2.07's ~1500 pages. Overall, it looks similar. And I also can't find that table in there, although I recall seeing it in the 2.07 manual previously.
From this file: #include <linux/auxvec.h>
I only use these values, and they were the same on Fedora Linux (running in a VM) and on the Debian running PPC64 in QEMU. But again, Linux vs Linux. I think I can get to an OpenBSD machine and test (simply dumping the data, not trying to interpret it).
On all systems, the first 17 were set (actually, there were more than 17 now). then there are > 31. I saw a few of them, like 32, but i am not using them right now.
AT_NULL is 0 (duh) AT_HWCAP is 16 AT_HWCAP2 is 26
Here is what was on PPC64 (only the defines are left). There are actually 2 files, <linux/auxvec.h> and <asm/auxvec.h>. The same was seen on the Intel side. NOTE, the lower numbers (in linux/auxvec.h) are the same without regard to the OS. It looks like each OS gets its own unique enumerations put in <asm/auxvec.h>, and those should not be used without KNOWING we are building for that OS.
$ cat /usr/include/linux/auxvec.h
#include <asm/auxvec.h>
#define AT_NULL 0 /* end of vector */
#define AT_IGNORE 1 /* entry should be ignored */
#define AT_EXECFD 2 /* file descriptor of program */
#define AT_PHDR 3 /* program headers for program */
#define AT_PHENT 4 /* size of program header entry */
#define AT_PHNUM 5 /* number of program headers */
#define AT_PAGESZ 6 /* system page size */
#define AT_BASE 7 /* base address of interpreter */
#define AT_FLAGS 8 /* flags */
#define AT_ENTRY 9 /* entry point of program */
#define AT_NOTELF 10 /* program is not ELF */
#define AT_UID 11 /* real uid */
#define AT_EUID 12 /* effective uid */
#define AT_GID 13 /* real gid */
#define AT_EGID 14 /* effective gid */
#define AT_PLATFORM 15 /* string identifying CPU for optimizations */
#define AT_HWCAP 16 /* arch dependent hints at CPU capabilities */
#define AT_CLKTCK 17 /* frequency at which times() increments */
/* AT_* values 18 through 22 are reserved */
#define AT_SECURE 23 /* secure mode boolean */
#define AT_BASE_PLATFORM 24 /* string identifying real platform, may
* differ from AT_PLATFORM. */
#define AT_RANDOM 25 /* address of 16 random bytes */
#define AT_HWCAP2 26 /* extension of AT_HWCAP */
#define AT_EXECFN 31 /* filename of program */
$ cat /usr/include/asm/auxvec.h
/*
* We need to put in some extra aux table entries to tell glibc what
* the cache block size is, so it can use the dcbz instruction safely.
*/
#define AT_DCACHEBSIZE 19
#define AT_ICACHEBSIZE 20
#define AT_UCACHEBSIZE 21
/* A special ignored type value for PPC, for glibc compatibility. */
#define AT_IGNOREPPC 22
/* The vDSO location. We have to use the same value as x86 for glibc's
* sake :-)
*/
#define AT_SYSINFO_EHDR 33
I will dig into the link you listed. NOTE, this is all still somewhat of a POC, until we find a better way to do it ;)
ISA 3.0 is not in HW yet, correct? Isn't that for POWER9?
Yes, ISA 3.0 is POWER9, but I think POWER9 is already starting to appear in hardware.
Ok as for POWER9, all I had found was 'future hardware', but that does not mean it is not already out.
Btw, isn't reading from MSR a privileged operation?
ISA 2.07 page 857-858 "3.2.1 Machine State Register" bits 38 (VEC), 40 (VSX). I don't know if reading from MSR is privileged or not.
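Since the Power ISA numbers register bits from the most significant end, "bit 38" and "bit 40" need converting before they can be used as C masks. The sketch below shows only that conversion. The function and macro names are hypothetical, and how the MSR value would be obtained is deliberately left out: as far as I know `mfmsr` is a privileged instruction, so user code may well not be able to read it at all.

```c
#include <stdint.h>

/* Power ISA bit numbering: bit 0 is the most significant bit of a
 * 64-bit register and bit 63 is the least significant. */
uint64_t ibm_bit_mask64(unsigned bit)
{
    return (uint64_t)1 << (63u - bit);
}

/* MSR bit positions quoted above from ISA 2.07, section 3.2.1. */
#define SK_MSR_VEC_BIT 38u
#define SK_MSR_VSX_BIT 40u

/* Test a caller-supplied MSR value; obtaining it is out of scope here. */
int msr_allows_vec(uint64_t msr)
{
    return (msr & ibm_bit_mask64(SK_MSR_VEC_BIT)) != 0;
}

int msr_allows_vsx(uint64_t msr)
{
    return (msr & ibm_bit_mask64(SK_MSR_VSX_BIT)) != 0;
}
```

Under this numbering, bit 38 corresponds to mask 0x02000000 and bit 40 to mask 0x00800000.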
There is a PVR register. It contains version of the processor (2 16 bit values). It can be read, but requires privilege.
www.csit-sun.pub.ro/~cpop/Documentatie_SMP/...PowerPC/PowerPc/.../pemch2.pdf
Just taking notes. After the recent changes
From:
checking special compiler flags... PPC64le
checking special compiler flags... PowerPC64
checking if gcc supports -finline-functions... yes
checking if gcc supports -finline-limit=4000... yes
checking if gcc supports -fno-strict-aliasing... yes
checking if gcc supports -maltivec... yes
checking if gcc supports -mvsx... yes
checking if gcc supports -mpower8-vector... yes
checking for arch.h alternative... ppc64.h
checking for extra ASFLAGS... None needed
To:
checking special compiler flags... PPC64le
checking special compiler flags... PowerPC64
checking if gcc supports -finline-functions... yes
checking if gcc supports -finline-limit=4000... yes
checking if gcc supports -fno-strict-aliasing... yes
ppc_cpuid.c:31:2: error: #error program specifically written to deal with CPU information on the PowerPC
#error program specifically written to deal with CPU information on the PowerPC
^
ppc_cpuid.c: In function 'main':
ppc_cpuid.c:107:19: warning: pointer targets in assignment differ in signedness [-Wpointer-sign]
for (i = 0, auxv = (long*)envp; auxv[i] != AT_NULL ; i += 2) {
^
ppc_cpuid.c:115:28: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
if (argc < 2 || argc == 2 && !strcmp(argv[1], "?"))
^
./configure: line 8526: ./test_cpuid: No such file or directory
./configure: line 8526: test: =: unary operator expected
rm: cannot remove 'test_cpuid': No such file or directory
checking for extra ASFLAGS... None needed
Jim, have you found out if the 64-bit MSR can be read by user code or not?
While your parsing of the auxv vector might work on some non-Linux, this won't even build on non-Linux:
/* magic constants for auxillary table id's (the AT_HWCAP type numbers) */
#include <linux/auxvec.h>
/* magic constants PowerPC specific! for CPU CAPACITY bits (the PPC_FEATURE_HAS_ALTIVEC bits) */
#include <asm/cputable.h>
yes @solardiz I do understand the implications, and it is very likely that reading the auxvec data is not going to be compatible outside of Linux.
@claudioandre please try again, after 2de66d38d. In that change to ppc_cpuid.c, I use a define, looking for __gnu_linux__. If it finds __gnu_linux__, then the code should compile and be able to walk the auxv array. Otherwise, it simply returns '1' for each of the flags.
So on non-Linux systems it will use AltiVec if the compiler supports it. If that build fails, the user will have to do a configure forcing SIMD usage off, using --disable-simd. I do not see any other way, until we find a CPUID that works fully and properly and is portable to all PPC chips and OS's.
@claudioandre I see there was a bug in original submission. But with this change, how is this working on your PPC OpenBSD box, which had problems after this ppc_cpuid.c file was added to the configure script?
Not yet fully tested (latest patch SIMD problems). Anyway:
Target CPU ................................. powerpc64le ALTIVEC, 64-bit LE
AES-NI support ............................. no
Target OS .................................. linux-gnu
Cross compiling ............................ no
Legacy arch header ......................... ppc64.h
Optional libraries/features found:
OpenMPI support (default disabled) ......... no
Fork support ............................... yes
OpenMP support ............................. yes (not for fast formats)
OpenCL support ............................. no
Generic crypt(3) format .................... yes
librexgen (regex cracking mode) ............ yes
libgmp (PRINCE mode and faster SRP formats) yes
libpcap (vncpcap2john and SIPdump) ......... yes
libz (pkzip format, gpg2john) .............. yes
libbz2 (gpg2john extra decompression logic) yes
128-bit integer (faster PRINCE mode) ....... yes
Memory map (share/page large files) ........ yes
ZTEX USB-FPGA module 1.15y support ......... no
Install missing libraries to get any needed features that were omitted.
BTW: are you guys sure NEON configure detection is working?
BTW: are you guys sure NEON configure detection is working?
I am pretty sure the NEON code worked on my RPI3 last winter
Testing current bleeding-jumbo on gcc110:
ppc_cpuid.c: In function 'main':
ppc_cpuid.c:126:19: warning: pointer targets in assignment differ in signedness [-Wpointer-sign]
for (i = 0, auxv = (long*)envp; auxv[i] != AT_NULL ; i += 2) {
^
ppc_cpuid.c:134:28: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
if (argc < 2 || argc == 2 && !strcmp(argv[1], "?"))
^
./configure: line 8835: test: too many arguments
but the configure nevertheless succeeds:
Target CPU ................................. powerpc64, 64-bit BE
AES-NI support ............................. no
Target OS .................................. linux-gnu
Cross compiling ............................ no
Legacy arch header ......................... ppc64.h
Build is OK, too:
[solar@gcc1-power7 src]$ time make -sj60
ar: creating aes.a
rar_fmt_plug.c:443:2: warning: #warning ": target system requires aligned memory access, rar format disabled:" [-Wcpp]
#warning ": target system requires aligned memory access, rar format disabled:"
^
stribog_fmt_plug.c:495:2: warning: #warning Stribog-256 and Stribog-512 formats require SSE 4.1, formats disabled [-Wcpp]
#warning Stribog-256 and Stribog-512 formats require SSE 4.1, formats disabled
^
ar: creating secp256k1.a
Make process completed.
real 0m19.058s
user 4m41.144s
sys 0m11.354s
Now running tests.
Testing: BKS [PKCS12 PBE 32/64]... (60xOMP) FAILED (cmp_all(1))
Testing: HMAC-MD5 [password is key, MD5 32/64]... (60xOMP) FAILED (cmp_all(1))
Testing: MSCHAPv2, C/R [MD4 DES (ESS MD5) 32/64]... (60xOMP) FAILED (cmp_all(1))
Testing: mssql12, MS SQL 2012/2014 [SHA512 64/64 OpenSSL]... (60xOMP) FAILED (get_hash[0](0) 5c!=76)
Testing: netntlm, NTLMv1 C/R [MD4 DES (ESS MD5) 32/64]... (60xOMP) FAILED (cmp_all(1))
Testing: SSHA512, LDAP [SHA512 64/64 OpenSSL]... (60xOMP) FAILED (get_hash[0](0) 2e!=8c)
Testing: xsha512, Mac OS X 10.7 [SHA512 64/64 OpenSSL]... (60xOMP) FAILED (get_hash[0](0) 71!=3f)
Testing: ZipMonster, MD5(ZipMonster) [MD5-32/64 x 50000]... (60xOMP) FAILED (cmp_all(1))
8 out of 384 tests have FAILED
The same 8 fail with OMP_NUM_THREADS=1
./configure: line 8835: test: too many arguments
That is a bug in the shell if logic, and likely the version or which actual shell is being used. I may simply have to split up the if block (testing 4 items), into 4 ifs.
But the failures concern me. I was starting down the path of porting wpapsk to BE, but I will put that aside for a bit and check this out. NOTE, @claudioandre fixed the 2 warnings already in the ppc_cpuid.c file, so those are handled.
OMP_NUM_THREADS=1 ./john -test-full=0 doesn't give more failures, but it gives different info:
Testing: BKS [PKCS12 PBE 32/64]... FAILED (cmp_all(1))
Testing: HMAC-MD5 [password is key, MD5 32/64]... FAILED (cmp_all(2048))
Testing: MSCHAPv2, C/R [MD4 DES (ESS MD5) 32/64]... FAILED (cmp_all(8192))
Testing: mssql12, MS SQL 2012/2014 [SHA512 64/64 OpenSSL]... FAILED (get_hash[0](1023) 5c!=76)
Testing: netntlm, NTLMv1 C/R [MD4 DES (ESS MD5) 32/64]... FAILED (cmp_all(8192))
Testing: SSHA512, LDAP [SHA512 64/64 OpenSSL]... FAILED (get_hash[0](2047) 2e!=8c)
Testing: xsha512, Mac OS X 10.7 [SHA512 64/64 OpenSSL]... FAILED (get_hash[0](8191) 71!=3f)
Testing: ZipMonster, MD5(ZipMonster) [MD5-32/64 x 50000]... FAILED (cmp_all(1))
8 out of 384 tests have FAILED
The line is:
if test `./test_cpuid PPC_FEATURE_HAS_ALTIVEC` = "1" && test `./test_cpuid PPC_FEATURE_HAS_VSX` = "1" && test `./test_cpuid PPC_FEATURE2_ARCH_2_07` = "1" ; then
I don't see it as necessarily problematic - it's just weird. But it should probably be:
if test `./test_cpuid PPC_FEATURE_HAS_ALTIVEC` = "1" -a `./test_cpuid PPC_FEATURE_HAS_VSX` = "1" -a `./test_cpuid PPC_FEATURE2_ARCH_2_07` = "1"; then
So we invoke test just once, not 3 times.
The test-full results for some things, gives results which do not translate back to the exact failure string like the -test=x does. This probably should be an issue on its own.
It may be that within -test-full mode, we first perform a -test=0 call (emulate) prior to doing the extra work of test-full. That would flush out the errors more properly (easier to associate back to the offending hash), and it would do the -test-full So then there would be no reason to do -test= and -test-full= The -test-full should simply cover all bases.
OMP_NUM_THREADS=60 ./john -test-full=0:
Testing: BKS [PKCS12 PBE 32/64]... (60xOMP) FAILED (cmp_all(60))
Testing: HMAC-MD5 [password is key, MD5 32/64]... (60xOMP) FAILED (cmp_all(122880))
Testing: MSCHAPv2, C/R [MD4 DES (ESS MD5) 32/64]... (60xOMP) FAILED (cmp_all(491520))
Testing: mssql12, MS SQL 2012/2014 [SHA512 64/64 OpenSSL]... (60xOMP) FAILED (get_hash[0](61439) 5c!=76)
Testing: netntlm, NTLMv1 C/R [MD4 DES (ESS MD5) 32/64]... (60xOMP) FAILED (cmp_all(491520))
Testing: SSHA512, LDAP [SHA512 64/64 OpenSSL]... (60xOMP) FAILED (get_hash[0](122879) 2e!=8c)
Testing: xsha512, Mac OS X 10.7 [SHA512 64/64 OpenSSL]... (60xOMP) FAILED (get_hash[0](491519) 71!=3f)
Testing: ZipMonster, MD5(ZipMonster) [MD5-32/64 x 50000]... (60xOMP) FAILED (cmp_all(60))
8 out of 384 tests have FAILED
So we invoke test just once, not 3 times.
Understood, but when I did that, I was getting the error you are listing (test: too many arguments). That is why I reverted back to 4 independent test statements and shell booleans
In the non-Linux portion of ppc_cpuid.c, you assume that argv[1] is non-NULL, and will segfault if it is. Intentional?
No, not intentional. The program is never really written to be used, on non-linux, outside of the controlled usage in configure. It gives ZERO information. But yes, that crash is not what was expected.
All that I wanted to do, is to NOT return a failure, or have a compiler failure on a non-linux system. It simply says that everything is there, thus canceling out the cpuid logic, and building SIMD if the compiler supports doing so. Then on those non-linux systems if the build fails (i.e. SIMD support is not new enough) the user will have to run configure adding the --disable-simd flag. That flag totally avoids all the SIMD checking within the m4/jtr_ppc.m4 macros.
Trying to build today's bleeding-jumbo on GCC Compile Farm's "gcc110" configures as:
That was simple ./configure with no options. Apparently, it didn't detect AltiVec? (The hardware supports AltiVec.) Anyway, the build fails with many errors like:
and many more like these, in other source files too, indicating that our source files try to use SIMD anyway.
I've tried these, to no avail:
although the ways the build failed changed. In none of these cases would ./configure explicitly say it'd use AltiVec. I think we should introduce end-user friendly configure options to enable/disable SIMD (and have those options substitute the needed platform-specific compiler flags automatically).
BTW, there are also these warnings: