1.9.0-jumbo-1 - Githubissues

magnumripper commented 5 years ago

Lets's list / discuss here what we need to do before a release. Or rather, a list of NEED, one of NICE and one of DON'Ts.

See also #1879

magnumripper commented 5 years ago

Hey @jfoug we miss you! So much have happened since last time you were here, you should have a look! 😉

claudioandre-br commented 5 years ago

Users (such as @blshkv, see #2355) should upvote the issue to put pressure on Solar.

BTW, users, please, avoid to add comments like "me too".

solardiz commented 5 years ago

I think we need to re-do #2914 before any release, and for that we need consistent benchmarks or at least ability to temporarily re-enable the kind of benchmarking we had as of previous release (even if it was slightly wrong), as we determined in discussion on #3289. We also need to complete tuning of MKPC and OMP_SCALE before and as a result of the relbench runs.

solardiz commented 5 years ago

Once we're in formats freeze (no "unnecessary" changes to existing formats), we'll also need to run the built-in --test-full=1 and --fuzz on different kinds of machines. And at least try to actually fix bugs that this might uncover. This is good to do before the freeze as well, so that we'll have less fixing to do during the freeze.

This should be part of our release process all the time.

In fact, one of our tasks may be to prepare a reusable pre-release task list / checklist.

claudioandre-br commented 5 years ago

--test-full=0 is run frequently;
--fuzz is very slow (I only fuzz "my" formats)

Something fails in --test-full=1 but it is a "can't fix" situation (by memory)

claudioandre-br commented 5 years ago

There are some patches I would like discussed/accepted upstream. The first one:

diff --git a/src/idle.c b/src/idle.c
index a17d1b2d8..0f9651bf9 100644
--- a/src/idle.c
+++ b/src/idle.c
@@ -93,7 +93,7 @@ void idle_init(struct fmt_main *format)
  * started with a non-negative nice value (so no need to increment it by more
  * than 20).
  */
-   if (nice(20) == -1)
+   if (nice(5) == -1)
        perror("nice");
 #endif

diff --git a/src/john_mpi.c b/src/john_mpi.c
index 8420fb74c..314016561 100644
--- a/src/john_mpi.c
+++ b/src/john_mpi.c
@@ -29,7 +29,7 @@ void mpi_teardown(void)

    if (mpi_p > 1) {
        /* Some MPI platforms hang on 100% CPU while waiting */
-       if (nice(20) == -1)
+       if (nice(5) == -1)
            perror("nice");
        MPI_Barrier(MPI_COMM_WORLD);
    }

seccomp does not accept the invalid value (20);
it used to crash on Ubuntu. After the flood of complaints, it now prints a system error message;
the routine is called more than once, right? So, I avoid the overflow (but I need to test again);

solardiz commented 5 years ago

Per https://github.com/magnumripper/JohnTheRipper/issues/2005#issuecomment-447601505 I think we should very explicitly state in the documentation, in some place like a text file called INSECURITY, that JtR isn't to be used on untrusted input and should be run with the lowest privileges necessary.

solardiz commented 5 years ago

Thank you for reporting the nice issue to me, Claudio! I was aware that 20 is indeed beyond the officially supported range. The code uses it because it actually worked, and resulted in a slightly lower priority than 19, on Linux/Alpha back then (but not on Linux/x86, where it was effectively capped to 19). While nice is specified to cap this value to the maximum, if there's a common seccomp policy not allowing that then indeed we should make adjustments now.

However, I dislike setting it as low as 5. Can we use 19, in both jumbo and core? I think there shouldn't be any issue from multiple attempts at a nice(19), especially as we don't know the initial nice value of the process anyway.

claudioandre-br commented 5 years ago

Per #2005 (comment) I think we should very explicitly state in the documentation, in some place like a text file called INSECURITY, that JtR isn't to be used on untrusted input and should be run with the lowest privileges necessary.

This applies to a few formats. We should PUNISH these formats ONLY.

On the other hand, after the "punishment" we could get more resources to fix them

I do punish some formats (I remove them before build JtR for CI). I don't remove them on packages because I call what I do JtR, otherwise, I would.

solardiz commented 5 years ago

JtR jumbo being unsuitable for handling of untrusted input is a consequence of how many formats and different parsers we have. It is unrealistic to have zero bugs there. We should document that.

Separately, we should also work, resources permitting, on making the code more robust - such as through more consistent implementations of valid that would invoke shared sanity-checkers. We already have those for numbers embedded in hash encodings. We could also have e.g. regexp-alikes covering the whole strings, where a match is mandatory before we do any further parsing. But this isn't a release preparation task.

claudioandre-br commented 5 years ago

There are some patches I would like discussed/accepted upstream. The second one:

I reverted to regex 1.4 while packaging

I found no release for 2.0+ (that works);
some open issues frighten me

magnumripper commented 5 years ago

I agree about that INSECURITY file. It can be completely general about the whole package, no real need to single out any specific format. This is all best-effort (or worse).

MPI could be singled out, although the problem is mainly with MPI itself and not with JtR. I already added warnings to our docs.

magnumripper commented 5 years ago

There are some patches I would like discussed/accepted upstream. The second one:

I reverted to regex 1.4 while packaging

I found no release for 2.0+ (that works);

some open issues frighten me

You mean Rexgen issues?

I'm thinking we might want to ditch rexgen support (at least #ifdef it out as #if USE_EXPERIMENTAL) because there's too many problems with the library.

claudioandre-br commented 5 years ago

There are some patches I would like discussed/accepted upstream. The last one is:

From 2892057c0797eb2cdfa751cdc2e0ccf96a4c5025 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <claudioandre.br@gmail.com>
Date: Thu, 20 Apr 2017 22:36:12 -0300
Subject: [PATCH 2/2] maintenance: fix the expected data type size

---
 src/memdbg.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/memdbg.c b/src/memdbg.c
index f7551f5e1..2cadca801 100644
--- a/src/memdbg.c
+++ b/src/memdbg.c
@@ -545,7 +545,8 @@ void * MEMDBG_calloc(size_t count, size_t size, char *file, int line)
  */
 void * MEMDBG_alloc(size_t size, char *file, int line)
 {
-   MEMDBG_HDR      *p, *p2;
+   MEMDBG_HDR      *p;
+   void *p2;

    if ( ((signed long long)mem_size) < 0)
        fprintf(stderr, "MEMDBG_alloc "LLd" %s:%d  mem:"LLd"\n", (unsigned long long)size, file, line, (unsigned long long)mem_size);
@@ -614,7 +615,8 @@ void * MEMDBG_alloc(size_t size, char *file, int line)
  */
 void * MEMDBG_alloc_align(size_t size, int align, char *file, int line)
 {
-   MEMDBG_HDR      *p, *p2;
+   MEMDBG_HDR      *p;
+   void *p2;
    char *p3;

    if ( ((signed long long)mem_size) < 0)
-- 
2.11.0

claudioandre-br commented 5 years ago

I'm thinking we might want to ditch rexgen support (at least #ifdef it out as #if USE_EXPERIMENTAL) because there's too many problems with the library.

Ok, let me know. Move it to Experimental code ........ yes/no is a good solution.

magnumripper commented 5 years ago

Unfortunately we might end up defining out the dynamic ad-hoc (compiler) format as experimental/opt-in as well. That would be a real shame, it's a fantastic format. But it needs some love and Jim seems to enjoy his retirement too much...

magnumripper commented 5 years ago

The last one is: (...)

- MEMDBG_HDR      *p, *p2;
+ MEMDBG_HDR      *p;
+ void *p2;

#define CLIENT_2_HDR_PTR(a) ((MEMDBG_HDR *) (((char *) ((ARCH_WORD)(((char *)a)-4-sizeof(MEMDBG_HDR*)) & ~0xF))))

    p2 = CLIENT_2_HDR_PTR(p3);
    memcpy(p2, &p, sizeof(p));

Not sure I understand why it should be changed to void* but OTOH I see no reason it wouldn't be fine with your change. What was the reason for this? Warnings?

magnumripper commented 5 years ago

As for --test-full=1 it works fine on my laptop and I think I recall some of the bots does such test as well.

claudioandre-br commented 5 years ago

It is a warning: p versus &p

memcpy(p2, p, sizeof(p)); versus memcpy(p2, &p, sizeof(p));

claudioandre-br commented 5 years ago

As for --test-full=1 [...] I think I recall some of the bots does such test as well.

I found only --test-test=0

claudioandre-br commented 5 years ago

However, I dislike setting it as low as 5. Can we use 19, in both jumbo and core? I think there shouldn't be any issue from multiple attempts at a nice(19), especially as we don't know the initial nice value of the process anyway.

Testing now. I can't trigger an error using 19. I will re-test 20. Not far in the past, I was able to crash a test program like this:

for (int i=0; i < 25; i++)
   nice(1);

solardiz commented 5 years ago

We definitely need to mention MPI's risk of having a listening port in the INSECURITY file.

I've never used Rexgen, and I don't know what advantages it possibly still has over our mask mode now. I'm OK with excluding Rexgen by default.

"Unfortunately we might end up defining out the dynamic ad-hoc (compiler) format as experimental/opt-in as well. That would be a real shame" - definitely a shame, and let's keep this as default-enabled functionality at least in the very next release. I think we can keep it mostly working at least for one release.

I'm surprised to see memdbg discussed on this issue. As far as I'm aware, this was Jim's custom code for jumbo, and it has no upstream. It's not in my JtR tree.

Great news re: running --test-full. I was thinking that we probably don't run it as often, or at all, against non-CPU formats. @Apingis and I ran --test-full=1 against ZTEX formats a few months ago for the first time, and this uncovered issues, which @Apingis then fixed. We probably need to be re-doing that manually once in a while, and for OpenCL too.

Regarding nice, that loop example suggests there's something wrong with the system where this crashes, and not so much in our code. I doubt we want to adapt to that any more than by reverting to nice(19).

jfoug commented 5 years ago

Hey @jfoug we miss you! So much have happened since last time you were here, you should have a look! 😉

Ok, i'll take the bait ;) But my 'retirement' got old. I bought a biz, and have been working my arse off a lot. Life is just so busy, but I will put some time in here.

claudioandre-br commented 5 years ago

Regarding nice, that loop example suggests there's something wrong with the system where this crashes, and not so much in our code. I doubt we want to adapt to that any more than by reverting to nice(19).

Nice (setpriority) was huge a problem. https://forum.snapcraft.io/t/setpriority-bad-system-call/1157

How it is working now (batch mode - 3 idle_init() calls):

nice(5) -> no error message.
nice(20) -> nice: Operation not permitted printed 3 times. Nice value does not change
nice(19) -> nice: Operation not permitted printed twice.

magnumripper commented 5 years ago

Welcome back @jfoug! The only new thing (I think) is you are now required to make a PR as opposed to push directly (everybody is, well except me but I should mostly do so too).

claudioandre-br commented 5 years ago

I updated the nice post above. Please, read it again. An example is:

$ OMP_NUM_THREADS=1 john-the-ripper -form=raw-SHA512 ~/alltests.in
Using default input encoding: UTF-8
Loaded 35 password hashes with no different salts (Raw-SHA512 [SHA512 128/128 SSE2 2x])
Remaining 27 password hashes with no different salts
Proceeding with single, rules:Wordlist
Press 'q' or Ctrl-C to abort, almost any other key for status
Almost done: Processing the remaining buffered candidate passwords, if any
Proceeding with wordlist:/snap/john-the-ripper/x1/run/password.lst, rules:Wordlist
nice: Operation not permitted
Proceeding with incremental:ASCII
nice: Operation not permitted

solardiz commented 5 years ago

Thanks, @claudioandre-br! This is known to affect only Ubuntu Snaps, right?

I think we should use nice(19) and maybe suppress the repeated calls (in idle_init, set/check a flag that we're already initialized).

nice(5) and allowing the repeated calls is clearly wrong because it results in different priority across the 3 passes of batch mode, which isn't intended nor expected.

claudioandre-br commented 5 years ago

This is known to affect only Ubuntu Snaps, right?

Yes.

jfoug commented 5 years ago

I really wonder if the mem_dbg stuff should simply be removed? It was done a LONG time ago (pre bounds checker days), if anyone knows about that tool. When inserted into John, it was before any of the ASAN additions. With ASAN, is this code really needed, or is it just a hindrance?

Note, it is not a small task to remove it (since the include is in ALL files). But the real 'hooks' into john is just the include (and associated .c files). Stripping those out should not be too difficult to do. NOTE, removing mem_dbg should speed up things like ASAN (if someone was building with ASAN and mem-dbg together. The memdbg does increase the allocation chain quite a bit.

magnumripper commented 5 years ago

There are times when memdbg is the easiest way to find a memory leak but it doesn't happen very often. Dropping the #include from all files is easy. Something like git grep -l '#include "memdbg.h"' | xargs perl -pi -e 'BEGIN { $\ = undef; } s/#include\s+"memdbg.h"\s*\n//mg'

magnumripper commented 5 years ago

The build bots are sometimes extremely slow. I now turned off the requirement for them to finish before merging - we should still wait for them when possible and applicable (eg. there's no need to wait for the bots after a doc change!).

jfoug commented 5 years ago

There are times when memdbg

Are you saying you want it left? I can certainly leave it in, and 'yes', finding mem leaks was why the tool was why I originally wrote it (about 25 years ago). I updated it a couple times over the next few years (adding things like over/under flow catching, etc). But it really has not been touched for about 20 years, except when I put it into jtr, I had to dumb it down from the nice C++ class it was in, to silly C ;) It really sux as C But if you think it has benefit, I can certainly leave it in. I do think there are much better ways (now) to find memory leaks.

jfoug commented 5 years ago

@magnumripper I do think there is some problem for me and circle ci. I may have gotten things connected again. I went to my account (in circle), went to the organization, the VCS and turned on 'Manage Github Checks'. That was not set before. That and I also attached some new keys. I am hopeful that it simply works now.

The travis would simply kick off, whenever I did a PR. But circle just sat and hung, giving no indication of working at all :(

magnumripper commented 5 years ago

Feel free to remove memdbg if you'd like to. It's sometimes pretty hard to follow the non-memdbg code path when we have macros that point to functions that call other functions 😵 And yes, there are very powerful external tools now.

jfoug commented 5 years ago

I have it about removed. There was one 'feature' I had forgotten about. I wonder if it may be worthwhile keeping for that feature.

That is the ability to store a memory snapshot at runtime, and then later check to make sure the snapshot is the same. We actually USE this capability. We set a snapshot before running a test, and then compare after a test (check out bench.c) It is the call to MEMDBG_getSnapshot(0); and then later to MEMDBG_checkSnapshot_possible_exit_on_error(memHand, 0);

Now, I am not sure keeping memdbg, with all of the obfuscation it carries (in C form), is helpful, just for this simple behavior. But if we want to keep it, we can, and I can simply scrap my work.

magnumripper commented 5 years ago

So are we changing to nice(19)? I just don't want it forgotten. If/when we do, there's an instance in john_mpi.c as well.

solardiz commented 5 years ago

Yes, I intend to change to nice(19) and to add logic to avoid repeated calls in batch mode in core, and I've added that to my to-do list, to be taken care of along with other minor changes. Jumbo doesn't have to wait on me.

magnumripper commented 5 years ago

Done in b2a419d78 and 68085b8eed

jfoug commented 5 years ago

Have we made any determination on librexgen? Should it simply be moved over to unused (or removed totally)?

magnumripper commented 5 years ago

In #3385 @teeshop said he will look at it. And as long as it works (although with memory leak) we can keep it - we might just add some "experimental" tag to configure output and docs.

janstarke commented 5 years ago

I've added some fuzzing capability and an address sanitizer.

Take a look at feature/libFuzzer ...

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ Am Freitag, 18. Januar 2019 07:41 schrieb magnum notifications@github.com:

@teeshop said he will look at it. And as long as it works (although with memory leak) we can keep it - we might just add some "experimental" tag to configure output and docs.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

gituser commented 5 years ago

Any ETA on the release?

Thank you for all the work, is there a donation address?

solardiz commented 5 years ago

@gituser No real ETA, but more like a moving "maybe in a month from now". ;-(

Also, no donation address currently. We had a "donations" we page years ago, but most donations were not genuine and the donors wanted links to irrelevant websites. When we introduced a requirement that the linked-to websites be at least slightly relevant (as in "also in IT"), donations almost stopped. So we eventually removed that web page altogether, and stopped accepting donations, except for tiny ones made along with purchase of Openwall and JtR t-shirts, and we use those only to send free t-shirts to project contributors. (BTW, this reminds me - we haven't done that in a long while - maybe it's time to ask what balance we have there and identify which recent contributors to send free t-shirts to.) https://www.zerodayclothing.com/openwall_store.php

Now that cryptocurrencies have significant value and some people "got rich", maybe accepting donations would make sense again. If you don't mind, how much were you thinking of donating? Personal or business funds? If we'd find corporate sponsors (who wouldn't impose unacceptable conditions), that would help. Please feel free to e-mail me privately (solar at openwall.com) if you prefer, as well as because this is off-topic for this issue.

gituser commented 5 years ago

Now that cryptocurrencies have significant value and some people "got rich", maybe accepting donations would make sense again. If you don't mind, how much were you thinking of donating? Personal or business funds? If we'd find corporate sponsors (who wouldn't impose unacceptable conditions), that would help. Please feel free to e-mail me privately (solar at openwall.com) if you prefer, as well as because this is off-topic for this issue.

Personal.
Bitcoin / Ethereum or any other cryptocurrency method would be nice
10$ or more
No obligations for you from my donation

Sorry for the offtopic. Looking forward for the new release!

Thanks.

solardiz commented 5 years ago

I'd like to make the release around April 1st or so. (And if this exact date, the joke would be us not making releases for years yet finally making one.) I am serious about this.

@magnumripper Please either hold off on your proposed FMT_BLOB addition or make it in a branch, for it to be post-release stuff. Thanks!

magnumripper commented 5 years ago

Cool. I'll be of as much help I can but I will have very little time from March 31 and a few days on

solardiz commented 5 years ago

Then I'll target "by March 30" or failing that "April 7 or April 8", also considering my own availability.

solardiz commented 5 years ago

For the nice issue, I'm trying a different fix in core:

+++ Owl/packages/john/john/src/idle.c   Thu Mar 21 21:25:02 2019
@@ -11,6 +11,7 @@
 #define _XOPEN_SOURCE /* for nice(2) */
 #include <unistd.h>
 #include <stdio.h>
+#include <errno.h>

 #ifdef _POSIX_PRIORITY_SCHEDULING
 #include <sched.h>
@@ -54,6 +55,9 @@

 void idle_init(struct fmt_main *format)
 {
+#ifndef __BEOS__
+       int old_nice;
+#endif
 #if defined(_POSIX_PRIORITY_SCHEDULING) && defined(SCHED_IDLE)
        struct sched_param param = {0};
 #endif
@@ -64,14 +68,15 @@
        clk_tck_init();

 #ifndef __BEOS__
-/*
- * Normally, the range is -20 to 19, but some systems can do 20 as well (at
- * least some versions of Linux on Alpha), so we try 20.  We assume that we're
- * started with a non-negative nice value (so no need to increment it by more
- * than 20).
- */
-       if (nice(20) == -1)
+       errno = 0;
+       old_nice = nice(0);
+       if (old_nice == -1 && errno) {
                perror("nice");
+       } else {
+               errno = 0;
+               if (nice(19 - old_nice) == -1 && errno)
+                       perror("nice");
+       }
 #else
        set_thread_priority(getpid(), 1);
 #endif

The errno hack to distinguish the -1 returns is officially suggested by POSIX.

solardiz commented 5 years ago

@magnumripper I've made many minor changes to core today - please merge those into jumbo. Then there will likely be another batch. Thanks!

solardiz commented 5 years ago

@magnumripper Thank you for your work merging these changes! I'll comment on that PR separately.

Meanwhile, I've pushed another batch of core changes for you to merge - this time, adding AVX2 and AVX-512 support to core, and dropping some related legacy stuff not to over-complicate the resulting compile-time logic. While at it, I noticed that jumbo's AVX-512 support for bitslice DES lacked use of _mm512_ternarylogic_epi32. In the implementation now in core I've added limited use of that intrinsic, through vsel. This appears to have improved the 256x OMP speed on a KNL from ~156M in jumbo to ~189M in core.

After you cvsimport this, I might add AVX2 and AVX-512 runtime CPU detection perhaps based on your code from jumbo, and/or I might revise the use of _mm512_ternarylogic_epi32 to be more complete, importing the extra S-box expressions file that we currently only have in jumbo. I'd expect to exceed 200M on that KNL with that change.

openwall / john

1.9.0-jumbo-1 #3513