magnumripper commented 5 years ago

Let's list / discuss here what we need to do before a release. Or rather, a list of NEEDs, one of NICEs and one of DON'Ts.

See also #1879

solardiz commented 5 years ago

@magnumripper I've just committed the LUT3 for AVX-512 stuff into core, please merge this into jumbo. As to speeds, it turned out that on the Xeon Phi 7210 the best descrypt speed was at 128 threads, not 256. For the previous revision, it was ~209M. For this new revision, it's ~224M. Increasing DES_bs_cpt from 32 to 64 or 128 provides some speedup to ~230M, but has drawbacks. Frankly, I expected greater speedup due to LUT3, especially given that single-thread speed improves with the move to LUT3 by almost 30% (relative to only using LUT3 to implement vsel; the speedup relative to non-vsel was even greater). I guess this means we're somehow memory-bound in the multi-threaded case.

magnumripper commented 5 years ago

This appears to have improved the 256x OMP speed on a KNL from ~156M in jumbo to ~189M in core.

Wow, how did I miss that? I recall I looked for opportunities in DES too.

After you cvsimport this, I might (...)

You obviously don't need to wait for merges between your commits - I can opt to merge one commit (or less) at a time anyway if/when needed. I'll probably do scattered bursts of work at cryptographically random intervals over the weekend.

BTW what s-boxes are you using for DES with LUT3? Is the nvidia LUT3 work by that JohnDoe character of any use for the CPU case?

solardiz commented 5 years ago

You obviously don't need to wait for merges between your commits

As we've just found out, when I make multiple commits changing one source file and you cvsimport them at once, then they get squashed and only the latest commit message is seen. So I thought as a workaround I'd wait for you to merge my previous set of changes before making more changes. But I already didn't wait long enough today. ;-(

BTW what s-boxes are you using for DES with LUT3? Is the nvidia LUT3 work by that JohnDoe character of any use for the CPU case?

Exactly that. At first, I only enabled use of LUT3 to implement vsel, using Roman's sboxes-s.c. This provided the speedup from ~156M to ~189M that I mentioned. Then I used JohnDoe's S1-S3 and S5-S8, along with Roman's S4, which use LUT3 directly. This provided the ~209M to ~224M improvement I mentioned.
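For readers who haven't met LUT3 before: on AVX-512 it is the VPTERNLOGD/Q instruction, which computes any Boolean function of three inputs per bit, chosen by an 8-bit truth-table immediate. The portable C sketch below only emulates that behavior (the name `lut3` and the macros are made up for illustration, not taken from John's sources). It assumes bit i of the immediate is the output for inputs with i = (a << 2) | (b << 1) | c, and that vsel(a, b, c) means "take b where c is 1, otherwise a"; under those assumptions 0xD8 is a select and 0x96 a three-way XOR, so each replaces two or more ordinary logic ops.

```c
#include <stdint.h>

/* Portable emulation of a 3-input lookup table (what VPTERNLOG does per bit).
 * Bit i of imm is the output for inputs (a,b,c) where i = (a<<2)|(b<<1)|c. */
static uint64_t lut3(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (!((imm >> i) & 1))
            continue;                 /* this minterm maps to 0 */
        uint64_t ma = (i & 4) ? a : ~a;
        uint64_t mb = (i & 2) ? b : ~b;
        uint64_t mc = (i & 1) ? c : ~c;
        r |= ma & mb & mc;            /* bits where (a,b,c) equals minterm i */
    }
    return r;
}

/* Reference bitslice select: per bit, take b where c is 1, otherwise a. */
static uint64_t vsel_ref(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & ~c) | (b & c);
}

#define LUT3_VSEL 0xD8 /* c ? b : a, one op instead of three */
#define LUT3_XOR3 0x96 /* a ^ b ^ c, one op instead of two */
```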

solardiz commented 5 years ago

Actual cracking with 128 OpenMP threads gives only ~167M max (tuned with OMP_NUM_THREADS=128 GOMP_CPU_AFFINITY=0-255 GOMP_SPINCOUNT=10000), whereas non-OpenMP build with 128 forks gives ~2800K*128 = ~358M. So our OpenMP code just doesn't scale that well. 16 processes with 16 threads each gives a total of ~270M.
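Worked through, the figures above put 128-thread OpenMP at a bit under half of the 128-fork throughput (all numbers are the approximate c/s quoted above):

```latex
\begin{aligned}
\text{128 forks:}\quad & 2800\,\text{K} \times 128 = 358.4\,\text{M c/s} \\
\text{128 OpenMP threads vs. forks:}\quad & \frac{167\,\text{M}}{358.4\,\text{M}} \approx 0.47 \\
\text{16 processes} \times \text{16 threads vs. forks:}\quad & \frac{270\,\text{M}}{358.4\,\text{M}} \approx 0.75
\end{aligned}
```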

solardiz commented 5 years ago

I've just added AVX2 and AVX-512 runtime CPU detection in core, loosely based on @magnumripper's work in jumbo (thanks!)
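To illustrate what runtime detection buys (this is not the code that went into core or jumbo), here is a minimal sketch using the GCC/Clang builtin `__builtin_cpu_supports()`. It is x86-only; with other compilers or on other architectures it simply reports "none".

```c
/* Return the name of the widest SIMD level this CPU reports at runtime.
 * Relies on the GCC/Clang x86 builtin; elsewhere it returns "none". */
static const char *best_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* only strictly needed if called before constructors run */
    if (__builtin_cpu_supports("avx512bw"))
        return "AVX512BW";
    if (__builtin_cpu_supports("avx512f"))
        return "AVX512F";
    if (__builtin_cpu_supports("avx2"))
        return "AVX2";
    if (__builtin_cpu_supports("avx"))
        return "AVX";
    if (__builtin_cpu_supports("sse4.1"))
        return "SSE4.1";
    if (__builtin_cpu_supports("sse2"))
        return "SSE2";
#endif
    return "none";
}
```

A launcher could print `best_simd_level()` or use it to decide which build to run.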

magnumripper commented 5 years ago

You obviously don't need to wait for merges between your commits

As we've just found out, when I make multiple commits changing one source file and you cvsimport them at once, then they get squashed and only the latest commit message is seen. So I thought as a workaround I'd wait for you to merge my previous set of changes before making more changes. But I already didn't wait long enough today. ;-(

I had less time than I expected this weekend, but I'll catch up. Next time I import, I'll see if I can simply import a little at a time - hopefully that's possible. Or, even better: as I'm pretty sure this squashing didn't happen in the past, I might find the cause and a fix, or at least a better workaround.

magnumripper commented 5 years ago

I have now imported using the -L 1 option to git-cvsimport, repeating until there was nothing new, and it looks like it ended up correct.

https://github.com/magnumripper/JohnTheRipper/commits/master

magnumripper commented 5 years ago

I decided to just battle the const race until I get a build. I've been on it all day but will take a break now. 😵

claudioandre-br commented 5 years ago

I've just added AVX2 and AVX-512 runtime CPU detection in core, loosely based on @magnumripper's work in jumbo (thanks!)

I've just added AVX-512 CI tests. Everything seems ok.

checking for AVX2... yes
checking for AVX512F... yes
checking for AVX512BW... yes
[...]
Target CPU ................................. x86_64 AVX512BW, 64-bit LE
AES-NI support ............................. depends on OpenSSL
Target OS .................................. linux-gnu
Cross compiling ............................ no
Legacy arch header ......................... x86-64.h
[...]
-- Build Info --
Version: 1.8.0.14-jumbo-1-bleeding-d06b472a8 2019-03-25 13:32:52 -0300
Build: linux-gnu 64-bit x86_64 AVX512BW AC OMP
SIMD: AVX512BW, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
CPU tests: AVX512BW
$JOHN is ../run/

solardiz commented 5 years ago

Our CI machine has AVX-512, really? Because it's a VM on some recent Xeon Scalable, perhaps?

claudioandre-br commented 5 years ago

Our CI machine has AVX-512, really?

Yes

Because it's a VM on some recent Xeon Scalable, perhaps?

The hardware may vary. It's some Xeon, for sure, but I really can't remember the model. Maybe not all machines in the pool have AVX-512 (after all, the "we are moving to newer machines ..." message is less than 6 months old).

magnumripper commented 5 years ago

/me needs sleep, or even hibernation, now. C U tomorrow. Unless someone points me to something more important, I'll continue with #3620 then.

claudioandre-br commented 5 years ago

Our CI machine has AVX-512, really?

The correct answer is: one of our CI providers has. The CPU is:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 85
model name  : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
stepping    : 4
microcode   : 0x200005a
cpu MHz     : 3000.000
cache size  : 25344 KB
physical id : 0
siblings    : 36
core id     : 0
cpu cores   : 18
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips    : 6000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
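Checking a flags line like the one above for a given feature needs whole-token matching; a plain substring search for `avx` would also match `avx2`. A minimal sketch, with a hypothetical helper name. Note that this particular flags line lists `avx512f` and `avx512cd` but not `avx512bw`.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole space-separated token in a
 * /proc/cpuinfo "flags" line (the "flags : " label is skipped if present). */
static bool cpuinfo_has_flag(const char *line, const char *flag)
{
    const char *p = strchr(line, ':');
    size_t n = strlen(flag);

    p = p ? p + 1 : line;
    while (*p) {
        while (*p == ' ' || *p == '\t')
            p++;                      /* skip separators */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;                      /* advance to the end of this token */
        if (n > 0 && (size_t)(p - start) == n && strncmp(start, flag, n) == 0)
            return true;
        if (*p == '\n')
            break;                    /* end of the line */
    }
    return false;
}
```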

solardiz commented 5 years ago

@magnumripper Since you closed #3688 without merging, I expect you'll create a replacement PR for it soon. And yes, #3620 is also important. Thank you!

magnumripper commented 5 years ago

I merged #3688 manually, in 843638ba6. I didn't do it in the web GUI because there were newer commits in the main tree, so I wanted to check that there were no new conflicts.

claudioandre-br commented 5 years ago

Hey, if you john-users guys are thinking about creating a testing team or a bug squashing day, or just have some idle hardware lying around, I'd appreciate it if you ran some tests (--test=0 or --test-full=0) on:

Windows
- 32bit + OpenCL
- 64bit + OpenCL
Linux x86_64
- Updated Fedora (flatpak)
- Ubuntu 16 LTS + OpenCL (snap app)
- Ubuntu 18 LTS + OpenCL (snap app)

solardiz commented 5 years ago

For the upcoming release announcement, I'd appreciate help on preparing a release highlights list. Way too much has changed since 1.8.0-jumbo-1 for us to create and post a changelog, but we do need to identify major user-visible changes and list those. So please post in here your lists of things to mention, and we'll merge those into a single release highlights list. Thanks!

solardiz commented 5 years ago

For the release number, I'll probably go with a 1.9.0 core release and a corresponding 1.9.0-jumbo-1. Sounds appropriate?

Fist0urs commented 5 years ago

Hmm, maybe I should remove my tool "kerberom" from the run/ folder before the release, as it is now deprecated (cf. https://github.com/magnumripper/JohnTheRipper/issues/2809#issuecomment-468924629)? As stated in the aforementioned thread, I could (this evening) add a Kerberos TGS "HOWTO" explaining how to retrieve tickets and which tools can do so (as I don't maintain kerberom anymore), if you are OK with this?

I'll implement Kerberos etype 17 and etype 18 later, after the release.

magnumripper commented 5 years ago

@Fist0urs that would be great, thanks!

magnumripper commented 5 years ago

For the upcoming release announcement, I'd appreciate help on preparing a release highlights list. Way too much has changed since 1.8.0-jumbo-1 for us to create and post a changelog, but we do need to identify major user-visible changes and list those. So please post in here your lists of things to mention, and we'll merge those into a single release highlights list. Thanks!

I'll update the files generated by git, for starters.

For the release number, I'll probably go with a 1.9.0 core release and a corresponding 1.9.0-jumbo-1. Sounds appropriate?

Yes, I think so.

AlbertVeli commented 5 years ago

Hi, I tried the binaries from https://rebrand.ly/JtRWin64 on a dual-boot iMac running Windows 10, Device 1: Ellesmere [Radeon Pro 570]. First, there is no john-opencl.exe in the run/ directory, but the regular john.exe seems to handle OpenCL. The output of john.exe --test=0 is attached.

TL;DR:

...
Testing: TrueCrypt-opencl [RIPEMD160 AES256_XTS OpenCL]... FAILED (cmp_all(1))
...
1 out of 500 tests have FAILED

johntest0.txt

And the output of --list=opencl-devices: opencl-devices.txt

claudioandre-br commented 5 years ago

Hi, I tried the binaries from: https://rebrand.ly/JtRWin64

Thanks. I need to replace the image referencing john-opencl.exe.

RIPEMD160 OpenCL is a known problem: https://github.com/magnumripper/JohnTheRipper/issues/3610. Thanks for your time.

BTW: I have these random notes in the README. Any thoughts?

- sse4.1 detection has problems on 32 bits; (FIXED)
- fallback chain is AVX2 -> XOP -> AVX -> SSE4.1 -> SSE2 (is SSE4.1 needed at all?);
- only one binary to run CPU and OpenCL formats (does anyone need different binaries?).
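The fallback chain in these notes boils down to "use the first build in the list whose CPU requirements are met". A hypothetical sketch of that selection over a feature bitmask; the bit layout and the `john-*` names are illustrative only, and a real launcher would fill `have` from CPUID:

```c
#include <stddef.h>

/* Hypothetical feature bits. */
enum {
    F_SSE2  = 1u << 0,
    F_SSE41 = 1u << 1,
    F_AVX   = 1u << 2,
    F_XOP   = 1u << 3,
    F_AVX2  = 1u << 4,
};

struct build {
    const char *name;   /* illustrative binary name */
    unsigned need;      /* features this build requires */
};

/* Ordered best-first: AVX2 -> XOP -> AVX -> SSE4.1 -> SSE2 */
static const struct build chain[] = {
    { "john-avx2",  F_AVX2  },
    { "john-xop",   F_XOP   },
    { "john-avx",   F_AVX   },
    { "john-sse41", F_SSE41 },
    { "john-sse2",  F_SSE2  },
};

/* Return the first build whose requirements are all present, or NULL. */
static const char *pick_build(unsigned have)
{
    for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
        if ((have & chain[i].need) == chain[i].need)
            return chain[i].name;
    return NULL;
}
```

Adding another step to the chain is then just one more row in the table.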

claudioandre-br commented 5 years ago

Last-minute notes

we are not going to clear the J2 issues list https://github.com/magnumripper/JohnTheRipper/milestone/2
even so, we can deliver with confidence (we have many tests nowadays)

General news:

added a lot of automated testing;

OpenCL news:

improved error handling;
added pocl as a valid OpenCL target;
GPUs are sorted by their "computation power";
OpenCL device numbering is now 1-based;
shared code between CPU and OpenCL formats;
tuned OpenCL formats for newer boards;
added mask mode to (almost) all fast OpenCL formats.
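The "sort GPUs by their computation power" item could work roughly as sketched below. The score (compute units times maximum clock, values a host program could read via `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY`) is an assumed metric, not necessarily the one jumbo uses, and the device list is plain C data so the sketch runs without an OpenCL SDK:

```c
#include <stdlib.h>

struct gpu {
    const char *name;
    unsigned compute_units;   /* e.g. from CL_DEVICE_MAX_COMPUTE_UNITS */
    unsigned clock_mhz;       /* e.g. from CL_DEVICE_MAX_CLOCK_FREQUENCY */
};

/* Rough "computation power" score (assumed metric). */
static unsigned long long gpu_score(const struct gpu *g)
{
    return (unsigned long long)g->compute_units * g->clock_mhz;
}

/* qsort comparator: highest score first. */
static int by_power_desc(const void *pa, const void *pb)
{
    unsigned long long a = gpu_score(pa), b = gpu_score(pb);
    return (a < b) - (a > b);
}

static void sort_gpus(struct gpu *g, size_t n)
{
    qsort(g, n, sizeof(*g), by_power_desc);
}
```

With 1-based numbering, device 1 would then be `g[0]`, the highest-scoring board.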

solardiz commented 5 years ago

fallback chain is AVX2 -> XOP -> AVX -> SSE4.1 -> SSE2 (is SSE4.1 needed at all?)

Maybe SSSE3 is a more important step (it's available on Core 2 and provides PSHUFB, which can be used for some rotate counts). I don't recall how much use of SSE4.1 we make over SSSE3. But it's fine to have that. Some recent Intel Atom CPUs lack AVX, but have SSE4.1, and indeed there are older non-Atom CPUs with only SSE* too.
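To make the PSHUFB point concrete: rotating each 32-bit lane by a multiple of 8 bits is just a byte permutation, so a single PSHUFB with the right control mask does it. The sketch below emulates PSHUFB's byte-shuffle semantics in portable C and builds that mask (helper names are made up; real code would call `_mm_shuffle_epi8` on SSSE3):

```c
#include <stdint.h>

/* Emulation of SSSE3 PSHUFB on one 16-byte register:
 * a set high bit in the mask byte zeroes the output byte. */
static void pshufb_emul(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Control mask that rotates every little-endian 32-bit lane left by 8*k
 * bits (k = 1, 2 or 3): output byte i of a lane comes from byte (i - k) mod 4. */
static void rotl_mask(uint8_t mask[16], int k)
{
    for (int lane = 0; lane < 4; lane++)
        for (int i = 0; i < 4; i++)
            mask[4 * lane + i] = (uint8_t)(4 * lane + ((i - k + 4) % 4));
}

/* Pack/unpack four words as little-endian bytes, independent of host byte order. */
static void pack_le(uint8_t b[16], const uint32_t w[4])
{
    for (int i = 0; i < 16; i++)
        b[i] = (uint8_t)(w[i / 4] >> (8 * (i % 4)));
}

static void unpack_le(uint32_t w[4], const uint8_t b[16])
{
    for (int j = 0; j < 4; j++)
        w[j] = (uint32_t)b[4 * j] | ((uint32_t)b[4 * j + 1] << 8) |
               ((uint32_t)b[4 * j + 2] << 16) | ((uint32_t)b[4 * j + 3] << 24);
}

/* Scalar reference rotate, 0 < n < 32. */
static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}
```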

solardiz commented 5 years ago

* we are not going to clear J2 issues list https://github.com/magnumripper/JohnTheRipper/milestone/2

@magnumripper Should we possibly create a new milestone for whatever stuff was marked J2 but didn't fit? I'd move some of the J2 issues to there, so that we'd know to focus on the remaining few now.

claudioandre-br commented 5 years ago

Should we possibly create a new milestone for whatever stuff was marked J2 but didn't fit?

You can do it. I did at https://github.com/magnumripper/JohnTheRipper/milestone/6

AlbertVeli commented 5 years ago

@claudioandre-br I ran --test=0 on 64-bit Linux on Intel with NVIDIA OpenCL: all 500 formats passed self-tests! Strange that it works on Linux but not on Windows. I built the Linux version from current source (last commit: a073eeb81). I didn't try snap or flatpak because I'm on Gentoo, where enabling snap/flatpak is a mess.

claudioandre-br commented 5 years ago

Should we possibly create a new milestone for whatever stuff was marked J2 but didn't fit?

You can do it. I did at https://github.com/magnumripper/JohnTheRipper/milestone/6

We have now:

1.9.0-jumbo-1 Due by March 31, 2019
Planned release (1.9.0-jumbo-2) No due date
and 1.8.0-post-jumbo-2 for stuff that was marked J2 but didn't fit

claudioandre-br commented 5 years ago

Strange that it works on Linux but not on Windows.

You are, in fact, comparing AMD versus NVIDIA.

claudioandre-br commented 5 years ago

Solar, I changed the Wiki https://openwall.info/wiki/john/custom-builds?do=diff&rev2%5B0%5D=1537129862&rev2%5B1%5D=&difftype=sidebyside

You can keep a link to the stable 1.9J1 Jumbo package if you wish.

in fact, not exactly a link;
you need to download the files from AppVeyor and upload them to the Wiki.
AppVeyor packages will disappear in six months.
I can't do it myself (since there is a size limit for file uploads).

solardiz commented 5 years ago

@claudioandre-br I'm confused about what exactly you suggest I do and when. Also, judging by your edits, do you suggest we keep 4-component release numbers like 1.9.0.0? I thought of going with simply 1.9.0 for the release. Whether we'll have a 4th component there later or not is not decided yet - maybe for development snapshots only. Our releases have used 3-component version numbers so far.
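If 3-component releases and 4-component development snapshots end up coexisting (as in the "1.8.0.14-jumbo-1-bleeding" build string earlier in this thread), comparing component by component with missing components treated as 0 keeps them ordered sensibly, and 1.9.0 equals 1.9.0.0. A hypothetical sketch that ignores suffixes such as "-jumbo-1":

```c
#include <ctype.h>

/* Parse one numeric component and step over a following '.'.
 * Returns 0 without advancing once the numeric part is exhausted. */
static unsigned long next_component(const char **p)
{
    unsigned long v = 0;
    while (isdigit((unsigned char)**p)) {
        v = v * 10 + (unsigned long)(**p - '0');
        (*p)++;
    }
    if (**p == '.')
        (*p)++;
    return v;
}

static int has_more(const char *p)
{
    return isdigit((unsigned char)*p);
}

/* Compare dotted versions; returns <0, 0 or >0 like strcmp(). */
static int version_cmp(const char *a, const char *b)
{
    while (has_more(a) || has_more(b)) {
        unsigned long x = next_component(&a);  /* 0 once a is exhausted */
        unsigned long y = next_component(&b);
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}
```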

solardiz commented 5 years ago

My guess is you suggest I download builds of 1.9.0-jumbo-1, once we have it, from AppVeyor and upload to the wiki or to download.openwall.net using my admin powers. Then we'll keep links to that. Right?

claudioandre-br commented 5 years ago

I'm confused about what exactly you suggest I do

Download the files from AppVeyor and upload them to the Wiki.
You or someone else created the Wiki entry; I would like to keep it updated.

and when

When you can, or when magnum calls some version "final".

do you suggest we keep 4-component release numbers like 1.9.0.0?

It is a TODO reminder; I don't know the full final version number yet.
I reverted it, adding a note that the Wiki links to a development version.

magnumripper commented 5 years ago

I'm still stumped by #3712 and possibly another, unrelated bug. Both are blockers, or they may be one and the same bug. At least one of them was caused by 6ba42b308, but reverting that is certainly not an option. I'm sure it's very simple, like a one-line fix - the trick is to find it 🤣

Unfortunately, I can't do much more for ~12-14 hours. Hopefully I'll wake up tomorrow and just know exactly what is wrong (that has happened many times before; I have some bug/feature in my low-power sleep states).

solardiz commented 5 years ago

Previously, I wrote:

Then I'll target "by March 30" or failing that "April 7 or April 8", also considering my own availability.

I am now targeting "April 7 or April 8" for the release. We should try to be almost ready before then.

Are we done with MKPC and OMP_SCALE re-tuning or do we still have performance regressions compared to 1.8.0-jumbo-1? I wish we could just run relbench, but for that we need a way to revert to old-style benchmarks.

Please continue to nominate items for release highlights. For example, I know there's magnum's work on character encodings support, but I don't know much about it, so I'd appreciate magnum providing the proper wording, etc. Thanks!

solardiz commented 5 years ago

I've just cleaned up the issue list for the 1.9.0-jumbo-1 milestone, moving many issues to the "didn't fit" milestone that Claudio created. And I moved one issue under the 1.9.0-jumbo-1 milestone. There are 17 issues left (not counting this general "Jumbo 2" issue). Not all of them are strictly required to be worked on before the release, but all seem worth looking into at this time.

I consider these most important (almost blockers): #3249 #3697 #3265 #2914 #2738 #2871. I'd appreciate the community's work on those.

I'd also like to replace my old unreleased escrypt that got stuck in this tree with yescrypt, as part of work on issue #2871.

magnumripper commented 5 years ago

Hopefully I fixed the 32-bit CPU detection issue now (cacd3c9eb). I will concentrate on #3249/#3697 from now on.

magnumripper commented 5 years ago

Time is running out, though. Hopefully I can work an hour or two a day on April 1st-3rd. And perhaps a little tomorrow; I will have to sleep soon.

magnumripper commented 5 years ago

Are we done with MKPC and OMP_SCALE re-tuning

No. #3091 should ideally be fixed & closed. It's very trivial, just a LOT of formats to review...
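For context on what that review involves: in jumbo formats, the MKPC (max keys per crypt) value and OMP_SCALE together set how many candidates are batched per crypt_all() call. A simplified sketch of the arithmetic, assuming the scaling pattern older jumbo formats used in their init() (minimum scaled by thread count, maximum by thread count times OMP_SCALE); the function and struct names are hypothetical:

```c
/* Hypothetical helper mirroring the usual scaling pattern:
 *   min_kpc *= threads;  max_kpc *= threads * omp_scale; */
struct kpc {
    unsigned min_kpc;
    unsigned max_kpc;
};

static struct kpc scale_kpc(unsigned min_kpc, unsigned max_kpc,
                            unsigned threads, unsigned omp_scale)
{
    struct kpc r;
    r.min_kpc = min_kpc * threads;              /* at least one batch per thread */
    r.max_kpc = max_kpc * threads * omp_scale;  /* bigger batches amortize overhead */
    return r;
}
```

Too small a product lets OpenMP overhead dominate; too large a one costs memory and responsiveness, which is why each format needs its own tuning.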

magnumripper commented 5 years ago

Please continue to nominate items for release highlights. For example, I know there's magnum's work on character encodings support, but I don't know much about it

I had no idea, so I checked: apparently the gist of it is that all OpenCL formats with GPU-side mask support now handle Unicode correctly, and that the option name -internal-encoding changed to -internal-codepage (with a similar change for the corresponding john.conf label), for clarity (UTF-8 is an encoding but not a codepage), although the old names still work (we'll drop them after the release).

New cracking mode "subsets" - very similar to the older external mode but much faster, supports resume and has full Unicode support (it actually uses UTF-32 internally so has no "internal codepage" limitations).
Nearly all OpenCL formats that had CPU post-processing now do everything on GPU, sometimes with a friggin' awesome boost.
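As an aside on why UTF-32 avoids codepage limits: every Unicode code point fits in a single 32-bit unit, so characters can be counted and indexed directly. A minimal UTF-8 to UTF-32 decoding sketch (not jumbo's converter; it rejects malformed input instead of substituting replacement characters):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode NUL-terminated UTF-8 into UTF-32. Returns the number of code
 * points written, or -1 on malformed input or if out[] is too small. */
static long utf8_to_utf32(const unsigned char *s, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (*s) {
        uint32_t c = *s++, min;
        int extra;
        if (c < 0x80)                { extra = 0; min = 0; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; c &= 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; c &= 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; c &= 0x07; min = 0x10000; }
        else
            return -1;                    /* stray continuation or invalid lead byte */
        while (extra--) {
            if ((*s & 0xC0) != 0x80)
                return -1;                /* truncated sequence (also catches NUL) */
            c = (c << 6) | (*s++ & 0x3F);
        }
        if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return -1;                    /* overlong, out of range, or surrogate */
        if (n == max)
            return -1;
        out[n++] = c;
    }
    return (long)n;
}
```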

There is probably much more, but I totally lost track of it; I can't even remember Jumbo-1. About a million lines of source code were added or changed ("1994 files changed, 827557 insertions(+), 201694 deletions(-)") in well over 7,000 commits. I'll add highlights as they come to me.

magnumripper commented 5 years ago

Hmm, since Jumbo-1 we apparently added support not only for AVX-512 but also for AVX2 and even AVX!? And also NEON32/64 and other stuff. We (as in royal we) added an abstraction layer for SIMD that has proven very effective and clean; I think I got inspired by something in john proper's DES stuff.

solardiz commented 5 years ago

@magnumripper I'm afraid I won't be able to document the encoding-related enhancements correctly, as I don't use them... because I don't really understand what they are nor how to use them. So I'd appreciate you providing specific wordings for each one of them that you think should be in the release highlights.

We did have up to AVX and XOP in the previous release, at least for DES. I think AVX2, AVX-512, MIC, NEON are new since the previous release. I think we also started using AltiVec for more than just DES after the previous release. Yes, jumbo's pseudo-intrinsics are a similar concept to core's v* macros in DES_bs_b.c. (Maybe we should switch the DES code to those same pseudo-intrinsics further down the road. Maybe when I stop maintaining core separately.)
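A toy illustration of the pseudo-intrinsics idea: format code is written once against generic names and each backend maps them to real intrinsics. The names below imitate the style of jumbo's layer but are a sketch, not its actual header; only an SSE2 mapping and a scalar fallback are shown:

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vtype;
#define vload(p)          _mm_loadu_si128((const __m128i *)(p))
#define vstore(p, v)      _mm_storeu_si128((__m128i *)(p), (v))
#define vxor(a, b)        _mm_xor_si128((a), (b))
#define vadd_epi32(a, b)  _mm_add_epi32((a), (b))
#define vroti_epi32(a, n) _mm_or_si128(_mm_slli_epi32((a), (n)), \
                                       _mm_srli_epi32((a), 32 - (n)))
#else
/* Scalar fallback: same names, four 32-bit lanes in a struct. */
typedef struct { uint32_t w[4]; } vtype;
static inline vtype vload(const uint32_t *p) { vtype v; memcpy(v.w, p, sizeof v.w); return v; }
static inline void vstore(uint32_t *p, vtype v) { memcpy(p, v.w, sizeof v.w); }
static inline vtype vxor(vtype a, vtype b)
{ for (int i = 0; i < 4; i++) a.w[i] ^= b.w[i]; return a; }
static inline vtype vadd_epi32(vtype a, vtype b)
{ for (int i = 0; i < 4; i++) a.w[i] += b.w[i]; return a; }
static inline vtype vroti_epi32(vtype a, int n) /* 0 < n < 32 */
{ for (int i = 0; i < 4; i++) a.w[i] = (a.w[i] << n) | (a.w[i] >> (32 - n)); return a; }
#endif

/* "Format" code written only against the pseudo-intrinsics:
 * x = x + y; x ^= rotl(x, 7), four lanes at a time. */
static void mix4(uint32_t x[4], const uint32_t y[4])
{
    vtype a = vadd_epi32(vload(x), vload(y));
    a = vxor(a, vroti_epi32(a, 7));
    vstore(x, a);
}

/* Scalar reference for a single lane. */
static uint32_t mix1(uint32_t x, uint32_t y)
{
    uint32_t a = x + y;
    return a ^ ((a << 7) | (a >> 25));
}
```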

I've just added #3091 #1972 to the 1.9.0-jumbo-1 milestone. I think both of these are almost blockers.

solardiz commented 5 years ago

* Nearly all OpenCL formats that had CPU post-processing now do everything on GPU.

Two examples to the contrary: rar-opencl and tezos-opencl. Perhaps there are more. It'd be useful to have a list of them on some GitHub issue.

solardiz commented 5 years ago

Are we testing with --enable-openmp-for-fast-formats anywhere? When did we last test this, and on what platforms?

When did we last build with --enable-fuzz and run the builtin fuzzer, and on what platforms?

Edit: perhaps should include --enable-asan and maybe --enable-ubsan in those test builds.

claudioandre-br commented 5 years ago

When did we last build with --enable-fuzz and run the builtin fuzzer, and on what platforms?

Today, Ubuntu 17.10 on Intel Xeon https://travis-ci.org/claudioandre-br/JohnTheRipper/jobs/513761719#L998

But since I target OpenCL, only a few formats are stressed, and testing OpenCL inside CI is not easy.

magnumripper commented 5 years ago

Two examples to the contrary: rar-opencl and tezos-opencl. Perhaps there are more. It'd be useful to have a list of them on some GitHub issue.

#3242 was up to date at the time it was opened (I think), and #3216 is related.

solardiz commented 5 years ago

@frank-dittrich On your Fedora 29, can you please check if --format=crypt handles scrypt and yescrypt hashes OK? Those are supposed to be supported in Fedora 29+ (or 28+ with updates) in libxcrypt, which I think has replaced glibc's libcrypt in that distro. Thanks!

frank-dittrich commented 5 years ago

@solardiz

Default hash algorithm is sha512crypt.

I tested 9 out of 10 scrypt hashes, skipping the one with linefeed in plaintext.

All were cracked both by --format=crypt and --format=scrypt.

$ mkpasswd --method=yescrypt
Passwort: 
$y$j9T$U8S08m4uUdwRxdPpoC5aO/$2iT3AcXFK1j6f52IXY/cJV47pAAzc/dAPMV2SCoa731

(I used 123456 as the password.)

$ ./john yescrypt.hash --wordlist=yescrypt.pw --format=crypt
Using default input encoding: UTF-8
Loaded 1 password hash (crypt, generic crypt(3) [?/64])
Cost 1 (algorithm [1:descrypt 2:md5crypt 3:sunmd5 4:bcrypt 5:sha256crypt 6:sha512crypt]) is 0 for all loaded hashes
Cost 2 (algorithm specific iterations) is 1 for all loaded hashes
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
Warning: Only 1 candidate left, minimum 96 needed for performance.
123456           (test)
1g 0:00:00:00 DONE (2019-04-03 15:29) 50.00g/s 50.00p/s 50.00c/s 50.00C/s 123456
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Should I check anything else?

solardiz commented 5 years ago

@frank-dittrich Thanks! I think no need to check anything else, but we might want to list "0:unknown" as a possibility for "Cost 1".
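Solar's "0:unknown" suggestion, sketched: a hypothetical classifier that maps a crypt(3) hash prefix to the algorithm numbers printed on the "Cost 1" line above and returns 0 for anything else, such as the `$y$` yescrypt hash from this test. This is not how the crypt format actually computes that cost; the prefixes are the commonly used ones.

```c
#include <string.h>

/* Characters of the crypt(3) base-64 alphabet: ./0-9A-Za-z */
static int is_crypt64(char c)
{
    return c == '.' || c == '/' || (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* 1:descrypt 2:md5crypt 3:sunmd5 4:bcrypt 5:sha256crypt 6:sha512crypt,
 * or 0:unknown (e.g. $y$ yescrypt, $7$ scrypt). */
static int crypt_algorithm_id(const char *h)
{
    if (!strncmp(h, "$1$", 3))
        return 2;
    if (!strncmp(h, "$md5", 4))
        return 3;
    if (!strncmp(h, "$2", 2) && h[2] && strchr("abxy", h[2]) && h[3] == '$')
        return 4;
    if (!strncmp(h, "$5$", 3))
        return 5;
    if (!strncmp(h, "$6$", 3))
        return 6;
    if (strlen(h) == 13) {              /* traditional DES-based crypt */
        for (int i = 0; i < 13; i++)
            if (!is_crypt64(h[i]))
                return 0;
        return 1;
    }
    return 0;
}
```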

openwall / john

1.9.0-jumbo-1 #3513
