Closed clinew closed 8 years ago
You could try forcing a lower GWS and see if it helps. Run with `--verb=5` and see what GWS it ended up with. Then re-run, prefixing the whole command line with `GWS=<half of that>`, e.g.:
GWS=512 ./john --wordlist=mywordlist --rules=mycustomrules mygpgkey.asc --format=gpg-opencl
If it still crashes, halve it again (and so on).
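The halving procedure above could be sketched roughly as follows. This is only an illustration: `find_stable_gws`, `gws_probe_fn` and `probe_below_limit` are hypothetical names, and in practice each "probe" is a manual `GWS=<n> ./john ...` run that either crashes or does not.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if a run with this GWS was stable. */
typedef int (*gws_probe_fn)(size_t gws, void *ctx);

/*
 * Halve the GWS until the probe reports a stable run or the value drops
 * below min_gws. Returns the first stable GWS found, or 0 if none was.
 */
size_t find_stable_gws(size_t start_gws, size_t min_gws,
                       gws_probe_fn probe, void *ctx)
{
    size_t gws;

    if (min_gws == 0)
        min_gws = 1;                 /* guarantees the loop terminates */
    for (gws = start_gws; gws >= min_gws; gws /= 2) {
        if (probe(gws, ctx))
            return gws;
    }
    return 0;
}

/* Illustration-only probe: anything at or below *ctx counts as stable. */
int probe_below_limit(size_t gws, void *ctx)
{
    const size_t *limit = (const size_t *)ctx;
    return gws <= *limit;
}
```

Starting from the autotuned value and halving keeps the number of trial runs small, at the cost of possibly settling on a GWS somewhat lower than the largest one that would have worked.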
On a side note: Is `inline` not supported by MESA at all, or did removing it just happen to make the build error go away?
seneca run # lspci -k | grep -A3 VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X]
Subsystem: PC Partner Limited / Sapphire Technology R9 290X Tri-X OC
Kernel driver in use: radeon
Kernel modules: radeon, amdgpu
Interesting. I was expecting to see amdgpu (xf86-video-amdgpu) loaded. Have you tried to blacklist radeon or uninstall xf86-video-ati?
@magnumripper:
Setting `GWS` to `131072` seems to have fixed it! (It was coming up with `225280` by default.)
Regarding `inline`, I don't know the root cause, only that removing it seems to have fixed the problem. I'd guess that the symbols are being stripped when the functions are inlined, causing linkage problems down the road.
@claudioandre:
As I recall from when I was desperately trying to get the `amdgpu` driver working after giving up on `fglrx` so that I could play video games again, I got mixed answers as to whether `amdgpu` or `radeon` would work with my card. According to the Arch Wiki, it looks like `amdgpu` only works "experimentally" on this card, so it kind-of-is-kind-of-isn't supported, I suppose (https://wiki.archlinux.org/index.php/AMDGPU#Enable_amdgpu_for_Sea_Islands_Cards).
Anyways, I'm glad to have gotten it working. Thank you both for the help!
gpg-opencl is also failing in our TravisCI builds.
Testing: gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL]... FAILED (cmp_all(49))
So we should probably commit this change:
diff --git a/src/opencl_gpg_fmt_plug.c b/src/opencl_gpg_fmt_plug.c
index 6db9c4b..bbffee6 100644
--- a/src/opencl_gpg_fmt_plug.c
+++ b/src/opencl_gpg_fmt_plug.c
@@ -174,7 +174,7 @@ static void reset(struct db_main *db)
sizeof(gpg_password), 0, db);
// Auto tune execution from shared/included code.
- autotune_run(self, 1, 0, 1000);
+ autotune_run(self, 1, 0, 300);
}
}
A full second is a very extreme kernel duration.
Fixed in #2211
The (benign) warning about cl_amd_media_ops should be gone after 6a63375
Not sure what to do with the `inline` issue. I think you should report it to MESA as a bug. Some OpenCL drivers require either `static` or `inline` for non-kernel OpenCL functions while others only support `inline` or nothing. So to be compatible with all of them we always use `inline`.
OTOH we could get away with something like
#ifndef MESA
inline
#endif
void S2KItSaltedSHA1Generator(...)
I can't remember if we have something like that available. I'll check it out.
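One way to express that idea is a small qualifier macro. The sketch below is a guess at the shape, not the committed workaround: `MAYBE_INLINE` and `rotl32` are hypothetical names, and it assumes the build passes something like `-D__MESA__` when targeting Mesa. The macro is defined locally only so the Mesa path can be exercised with an ordinary host C compiler.

```c
#include <stdint.h>

/*
 * Assumption: normally supplied on the compiler command line as -D__MESA__.
 * Defined here only so this sketch takes the Mesa path on a host compiler.
 */
#ifndef __MESA__
#define __MESA__ 1
#endif

#ifdef __MESA__
#define MAYBE_INLINE            /* Mesa: plain function, no qualifier */
#else
#define MAYBE_INLINE inline     /* other drivers: keep 'inline' */
#endif

/* Stand-in helper for illustration, not the real S2KItSaltedSHA1Generator. */
MAYBE_INLINE uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}
```

Note that on a host C99 compiler a bare `inline` definition with no matching `extern` declaration may fail to link, so the non-Mesa branch here is meant for OpenCL C rather than for host code.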
Found it, `__MESA__` can be used. I committed a workaround in 6c60b8f. @clinew can you build and run without any manual edits after that?
@magnumripper Nope, it still fails to build:
frostsnow@seneca ~/software/Jill/run $ ./john --verb=5 --wordlist=mywordlist --rules=myrules mykey.asc --format=gpg-opencl
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: AMD HAWAII (DRM 2.45.0 / 4.7.0, LLVM 3.8.1)
Using default input encoding: UTF-8
Loaded 1 password hash (gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL])
Cost 1 (s2k-count) is 3538944 for all loaded hashes
Cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) is 2 for all loaded hashes
Cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256]) is 3 for all loaded hashes
Will run 8 OpenMP threads
Loaded 16 hashes with 16 different salts to test db from test vectors
Options used: -I /home/frostsnow/software/Jill/run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=10 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=12 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=125 -DSALT_LENGTH=8 $JOHN/kernels/gpg_kernel.cl
Build log: unsupported call to function S2KItSaltedSHA1Generator in gpg
Error -11 building kernel $JOHN/kernels/gpg_kernel.cl. DEVICE_INFO=10
OpenCL CL_BUILD_PROGRAM_FAILURE error in common-opencl.c:1053 - clBuildProgram failed.
frostsnow@seneca ~/software/Jill/run $ git log --oneline | head -n 1
6c60b8f OpenCL: Avoid using 'inline' with MESA in some cases. See #2204
I'll see about filing a bug with MESA sometime later this week, next Wednesday at the latest.
Options used: -I /home/frostsnow/software/Jill/run/kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=10 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=12 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=125 -DSALT_LENGTH=8 $JOHN/kernels/gpg_kernel.cl
Oh, we didn't get the `__MESA__` macro because they now write it as Mesa while we were checking only for MESA... Please try 9f8e940.
I ran 9f8e940 but it didn't build. Using `printf` showed me that the string being used was in fact `Clover`, not `Mesa`, so I made the following change:
diff --git a/src/common-opencl.c b/src/common-opencl.c
index f62cb66..88f7163 100644
--- a/src/common-opencl.c
+++ b/src/common-opencl.c
@@ -2312,7 +2312,8 @@ int get_platform_vendor_id(int platform_id)
strstr(dname, "AMD") != NULL || strstr(dname, "ATI") != NULL)
return DEV_AMD;
- if ((strstr(dname, "MESA") != NULL) || (strstr(dname, "Mesa") != NULL))
+ if ((strstr(dname, "MESA") != NULL) || (strstr(dname, "Mesa") != NULL) || (strstr(dname, "Clover") != NULL))
return DEV_MESA;
return DEV_UNKNOWN;
at which point things built, but didn't run because the auto-tuning seems to be off still. Below is a run without manual tuning:
frostsnow@seneca ~/software/Jill/run $ ./john --verb=5 --wordlist=mylist --rules=myrules --format=gpg-opencl mykey.asc
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: AMD HAWAII (DRM 2.45.0 / 4.7.0, LLVM 3.8.1)
Using default input encoding: UTF-8
Loaded 1 password hash (gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL])
Cost 1 (s2k-count) is 3538944 for all loaded hashes
Cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) is 2 for all loaded hashes
Cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256]) is 3 for all loaded hashes
Will run 8 OpenMP threads
Loaded 16 hashes with 16 different salts to test db from test vectors
Calculating best GWS for LWS=64; max. 150ms single kernel invocation.
Raw speed figures including buffer transfers:
xfer: 481.630us*, crypt: 10ms, xfer: 132.593us
gws: 256 24118c/s 24118 rounds/s 10.614ms per crypt_all()!
xfer: 181.037us*, crypt: 8.147ms, xfer: 129.333us
gws: 512 60532c/s 60532 rounds/s 8.458ms per crypt_all()!
xfer: 210.667us*, crypt: 8.146ms, xfer: 197.185us
gws: 1024 119702c/s 119702 rounds/s 8.554ms per crypt_all()+
xfer: 413.333us*, crypt: 8.149ms, xfer: 207.111us
gws: 2048 233529c/s 233529 rounds/s 8.769ms per crypt_all()+
xfer: 700us*, crypt: 16.122ms, xfer: 233.777us
gws: 4096 240145c/s 240145 rounds/s 17.056ms per crypt_all()+
xfer: 1.061ms*, crypt: 24.096ms, xfer: 305.630us
gws: 8192 321722c/s 321722 rounds/s 25.462ms per crypt_all()+
xfer: 1.751ms*, crypt: 48.018ms, xfer: 294.667us
gws: 16384 327258c/s 327258 rounds/s 50.064ms per crypt_all()+
xfer: 2.241ms*, crypt: 95.747ms, xfer: 449.037us
gws: 32768 332879c/s 332879 rounds/s 98.438ms per crypt_all()+
xfer: 4.391ms*, crypt: 192.420ms (exceeds 150ms)
Calculating best LWS for GWS=32768
Testing LWS=64 GWS=32768 .../^H-^H 191.186ms+
Testing LWS=128 GWS=32768 ...\^H|^H 108.161ms+
Testing LWS=256 GWS=32768 .../^H-^H 75.855ms+
Calculating best GWS for LWS=256; max. 300ms single kernel invocation.
Raw speed figures including buffer transfers:
xfer: 1.433ms*, crypt: 20.622ms, xfer: 344us
gws: 14080 628558c/s 628558 rounds/s 22.400ms per crypt_all()!
xfer: 897.186us*, crypt: 31.106ms, xfer: 454.519us
gws: 28160 867576c/s 867576 rounds/s 32.458ms per crypt_all()+
xfer: 3.896ms*, crypt: 53.957ms, xfer: 485.778us
gws: 56320 965370c/s 965370 rounds/s 58.340ms per crypt_all()+
xfer: 4.579ms*, crypt: 105.173ms, xfer: 1.565ms
gws: 112640 1011Kc/s 1011869 rounds/s 111.318ms per crypt_all()+
xfer: 7.741ms*, crypt: 203.369ms, xfer: 3.117ms
gws: 225280 1051Kc/s 1051586 rounds/s 214.228ms per crypt_all()+
xfer: 15.230ms*, crypt: 400.027ms (exceeds 300ms)
Local worksize (LWS) 256, global worksize (GWS) 225280
\^H|^H/^H-^H\^H|^H/^H-^H\^H|^H/^H-^H\^H|^H/^H-^H\^H|^H ^HPress 'q' or Ctrl-C to abort, almost any other key for status
radeon: The kernel rejected CS, see dmesg for more information.
radeon: The kernel rejected CS, see dmesg for more information.
[repeats indefinitely]
and one with manual tuning:
frostsnow@seneca ~/software/Jill/run $ GWS=131072 ./john --verb=5 --wordlist=mylist --rules=myrules --format=gpg-opencl mykey.asc
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: AMD HAWAII (DRM 2.45.0 / 4.7.0, LLVM 3.8.1)
Using default input encoding: UTF-8
Loaded 1 password hash (gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL])
Cost 1 (s2k-count) is 3538944 for all loaded hashes
Cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) is 2 for all loaded hashes
Cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256]) is 3 for all loaded hashes
Will run 8 OpenMP threads
Loaded 16 hashes with 16 different salts to test db from test vectors
Calculating best LWS for GWS=131072
Testing LWS=64 GWS=131072 ... 373.741ms+
Testing LWS=128 GWS=131072 ... 208.193ms+
Testing LWS=256 GWS=131072 ... 134.698ms+
Local worksize (LWS) 256, global worksize (GWS) 131072
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:25 0.26% (ETA: 20:45:17) 0g/s 20172p/s 20172c/s 20172C/s [REDACTED]
Session aborted
(Some information redacted from logs).
Regarding MESA, it looks like someone else has run into the "unsupported call" issue before but didn't narrow it down to the `inline` declaration (see: https://bugs.freedesktop.org/show_bug.cgi?id=87071); I've updated the issue with what's been found and mentioned that a workaround exists.
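For reference, the platform-name check from the `get_platform_vendor_id` change earlier in this comment can be written as a small table-driven helper. This is a standalone sketch, not the code in `common-opencl.c`; `is_mesa_platform` is a hypothetical name and the marker list just mirrors the three strings from the diff.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the platform name contains any of the substrings that the
 * diff above treats as Mesa ("MESA", "Mesa", "Clover"), 0 otherwise.
 * The match is case-sensitive, like strstr() in the original check.
 */
int is_mesa_platform(const char *name)
{
    static const char *const markers[] = { "MESA", "Mesa", "Clover" };
    size_t i;

    if (name == NULL)
        return 0;
    for (i = 0; i < sizeof markers / sizeof markers[0]; i++) {
        if (strstr(name, markers[i]) != NULL)
            return 1;
    }
    return 0;
}
```

Keeping the markers in one array makes it cheap to add another spelling if a future Mesa release changes the platform string again.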
radeon: The kernel rejected CS, see dmesg for more information.
Can you post the corresponding info from dmesg, please?
The `dmesg` output is the same as what I posted earlier. I could get another copy if you really want.
Also, I was able to build 22ad196 without errors.
You mean the crash backtraces? The "kernel rejected CS, see dmesg" message had me thinking there was some other info now.
Oh, no, there isn't any new info, sorry. The program originally errored with many lines of `kernel rejected CS, see dmesg` and then the long `dmesg` backtrace that I originally posted. I didn't post the program output at first because the crash brought down the entire X window system, taking the ptty with it.
So if you peg GWS low enough it works fine, but not if you let it autotune? What if you manually make this change, does it work better?
diff --git a/src/opencl_gpg_fmt_plug.c b/src/opencl_gpg_fmt_plug.c
--- a/src/opencl_gpg_fmt_plug.c
+++ b/src/opencl_gpg_fmt_plug.c
@@ -174,7 +174,7 @@ static void reset(struct db_main *db)
sizeof(gpg_password), 0, db);
// Auto tune execution from shared/included code.
- autotune_run(self, 1, 0, 300);
+ autotune_run(self, 1, 0, 200);
}
}
That manual change seems to have worked. Presumably this is related to the warning in `README-OPENCL` about AMD drivers crashing after repeated runs from durations as short as 200ms. The log is as follows:
initUnicode(UNICODE, UTF-8/ISO-8859-1)
UTF-8 -> UTF-8 -> UTF-8
Device 0: AMD HAWAII (DRM 2.45.0 / 4.7.0, LLVM 3.8.1)
Using default input encoding: UTF-8
Loaded 1 password hash (gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL])
Cost 1 (s2k-count) is 3538944 for all loaded hashes
Cost 2 (hash algorithm [1:MD5 2:SHA1 3:RIPEMD160 8:SHA256 9:SHA384 10:SHA512 11:SHA224]) is 2 for all loaded hashes
Cost 3 (cipher algorithm [1:IDEA 2:3DES 3:CAST5 4:Blowfish 7:AES128 8:AES192 9:AES256]) is 3 for all loaded hashes
Will run 8 OpenMP threads
Loaded 16 hashes with 16 different salts to test db from test vectors
Options used: -I /home/frostsnow/software/Jill/run/kernels -cl-mad-enable -D__MESA__-D__GPU__ -DDEVICE_INFO=10 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=12 -DDEV_VER_MINOR=0 -D_OPENCL_COMPILER -DPLAINTEXT_LENGTH=125 -DSALT_LENGTH=8 $JOHN/kernels/gpg_kernel.cl
Build log: In file included from <built-in>:308:
<command line>:1:17: warning: ISO C99 requires whitespace after the macro name
binary size 111367
Calculating best GWS for LWS=64; max. 100ms single kernel invocation.
Raw speed figures including buffer transfers:
xfer: 442.666us*, crypt: 9.674ms, xfer: 56.148us
gws: 256 25164c/s 25164 rounds/s 10.172ms per crypt_all()!
xfer: 89.333us*, crypt: 8.161ms, xfer: 51.852us
gws: 512 61665c/s 61665 rounds/s 8.302ms per crypt_all()!
xfer: 93.926us*, crypt: 8.149ms, xfer: 59.407us
gws: 1024 123338c/s 123338 rounds/s 8.302ms per crypt_all()!
xfer: 127.852us*, crypt: 8.149ms, xfer: 70.371us
gws: 2048 245328c/s 245328 rounds/s 8.348ms per crypt_all()+
xfer: 192.297us*, crypt: 16.131ms, xfer: 83.555us
gws: 4096 249650c/s 249650 rounds/s 16.406ms per crypt_all()+
xfer: 330.666us*, crypt: 24.085ms, xfer: 108.445us
gws: 8192 334036c/s 334036 rounds/s 24.524ms per crypt_all()+
xfer: 596.148us*, crypt: 47.934ms, xfer: 155.111us
gws: 16384 336528c/s 336528 rounds/s 48.685ms per crypt_all()
xfer: 1.147ms*, crypt: 95.595ms, xfer: 265.333us
gws: 32768 337785c/s 337785 rounds/s 97.008ms per crypt_all()+
xfer: 2.286ms*, crypt: 191.018ms (exceeds 100ms)
Calculating best LWS for GWS=32768
Testing LWS=64 GWS=32768 .../^H-^H 191.172ms+
Testing LWS=128 GWS=32768 ...\^H|^H 108.497ms+
Testing LWS=256 GWS=32768 .../^H-^H 69.434ms+
Calculating best GWS for LWS=256; max. 200ms single kernel invocation.
Raw speed figures including buffer transfers:
xfer: 1.467ms*, crypt: 19.149ms, xfer: 239.704us
gws: 14080 675086c/s 675086 rounds/s 20.856ms per crypt_all()!
xfer: 746.666us*, crypt: 31.203ms, xfer: 340us
gws: 28160 872102c/s 872102 rounds/s 32.289ms per crypt_all()+
xfer: 2.636ms*, crypt: 53.199ms, xfer: 734.518us
gws: 56320 995571c/s 995571 rounds/s 56.570ms per crypt_all()+
xfer: 3.961ms*, crypt: 104.060ms, xfer: 1.813ms
gws: 112640 1025Kc/s 1025541 rounds/s 109.834ms per crypt_all()+
xfer: 7.735ms*, crypt: 205.146ms (exceeds 200ms)
Local worksize (LWS) 256, global worksize (GWS) 112640
\^H|^H/^H-^H\^H|^H/^H-^H\^H|^H/^H-^H\^H|^H/^H-^H\^H|^H ^HPress 'q' or Ctrl-C to abort, almost any other key for status
Wait...^M0g 0:00:01:31 0.82% (ETA: 21:29:57) 0g/s 20856p/s 20856c/s 20856C/s [REDACTED]
Session aborted
I'm just not sure we want to lower our default that much though. Ideally the kernel should be re-written and this problem will go away.
Come to think of it, there's an option `Global_MaxDuration = 200` in john.conf, commented out by default. I think that will have the same effect without modifying source and recompiling. And even better, it will affect all formats. So you might want to uncomment that one.
NOTE though that I just checked whether this functionality still worked, and it had bugged out. Fixed now (2a71bfb), so you need the latest code.
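To see why the duration cap changes the chosen GWS, here is a simplified model using the LWS=256 crypt times from the autotune log earlier in this thread. It is not John's actual autotune code: `pick_gws` is a hypothetical helper, and the last GWS value (450560) is inferred from the doubling pattern since the log only prints its duration.

```c
#include <stddef.h>

/* Crypt times (ms) from the LWS=256 autotune log posted in this thread. */
struct sample { size_t gws; double crypt_ms; };

static const struct sample logged[] = {
    {  14080,  20.622 }, {  28160,  31.106 }, {  56320,  53.957 },
    { 112640, 105.173 }, { 225280, 203.369 },
    { 450560, 400.027 },   /* GWS inferred by doubling; not printed in the log */
};

/* Return the largest logged GWS whose kernel time fits within max_ms, or 0. */
size_t pick_gws(double max_ms)
{
    size_t best = 0, i;

    for (i = 0; i < sizeof logged / sizeof logged[0]; i++) {
        if (logged[i].crypt_ms > max_ms)
            break;         /* stop at the first size that exceeds the cap */
        best = logged[i].gws;
    }
    return best;
}
```

With a 300 ms cap this model lands on 225280, the value that triggered the rejected-CS errors, while a 200 ms cap lands on 112640, matching the run that worked.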
That also worked for me (on 2a71bfb).
The first issue that I ran into was the following error:
...which was fixed by going into `run/kernels/gpg_kernel.cl` and removing the `inline` attribute. The same goes for functions in `run/kernels/opencl_sha1_ctx.h`.
After that, running `./john -test --format=gpg-opencl` passes with:
...but then doing an actual run causes the GPU to lock up and crash the driver. I'm not sure where the source of the problem is. Any idea where I can start looking to debug?
The command I'm using is:
./john --wordlist=mywordlist --rules=mycustomrules mygpgkey.asc --format=gpg-opencl
Various system information is as follows:
Though the `README-OPENCL` warns against using the "X11 opensource" drivers, I'm not sure how that warning fits into the new `amdgpu` integration. I also ran `piglit` to sanity-check OpenCL and it went pretty well with 927 passes, 14 fails, and 12 skipped tests.
Finally, the monstrous kernel crash:
At this point I have no idea if the issue is in the OpenCL code itself, the kernel driver, or LLVM/clang.