more control over single mode: need options to disable word extraction, to disable pairs and/or to feed exact candidates

AlekseyCherepanov commented 5 years ago

CMIYC 2019 created interest to feed candidates into single mode with high precision because the hashes are very slow (real world example: Ashley Madison leak). E.g. it may be desirable to use only specified candidates without any additional candidates.

I'll mix several problems, because probable goal may be reached not fixing all of them.

I did not find any option to disable word extraction fully.
I did not find way to disable generation of pairs, because SingleWordsPairMax is raised automatically to cover some KPC (I am not sure which one).
- Also final number of pairs may exceed max. key per crypt twice or more (#4110).

Example, current state:

$ printf '.include <john.conf>\n[Local:Options]\nSingleSkipLogin = Y\nPristineGecos = Y\n' > t.conf

$ cat t.conf
.include <john.conf>
[Local:Options]
SingleSkipLogin = Y
PristineGecos = Y

$ echo 'abc,123:$0$h:::qwe,567' > plaintext.pw

$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=None --config=t.conf
[...]
set_key(qwe, 0)
set_key(qwe567, 1)
set_key(q567, 2)
set_key(qweqwe,567, 3)
set_key(qqwe,567, 4)
set_key(567, 5)
set_key(qwe,567, 6)
set_key(qwe,567qwe, 7)
Almost done: Processing the remaining buffered candidate passwords, if any.
set_key(qqwe, 0)
set_key(qwe,567567, 1)
[...]

I would like to be able to try only qwe,567.

Optionally, with 'SingleSkipLogin = N', it may be useful to be able to try only abc,123 and qwe,567 candidates (but not pairs with them).

john is quite fresh:

$ ./JohnTheRipper/run/john --list=build-info
Version: 1.9.0-jumbo-1+bleeding-ea33667eb 2019-09-23 12:43:34 +0200
Build: linux-gnu 64-bit x86_64 AVX2 AC OMP

AlekseyCherepanov commented 5 years ago

I've found a hack to pass only the full baseword: wrap basewords into '\x02' and check the ends using rules, then delete '\x02'.

AlekseyCherepanov commented 5 years ago

Well, a printable separator char for border would be ok too (e.g. --single=':=0# =m# Dm D0').

magnumripper commented 5 years ago

Not sure if there are other (faster) ways of achieving this already but otherwise we could add some option. Perhaps a new bool option SingleNoPairing defaulting to false? On the other hand, if our format has a KPC of 24 or 2097152 anyway, why not instead have an option that merely limits it according to the KPC? For plaintext format IIRC KPC is always 1 (so would be equivalent to SingleNoPairing = Y), but for eg. raw-md5 it can well be 24 or even 2097152 for OpenCL. Setting a lower limit is futile!

AlekseyCherepanov commented 5 years ago

$ ./JohnTheRipper/run/john --format=plaintext --list=format-all-details
[...]
Min. keys per crypt                  1
Max. keys per crypt                  130
[...]

But KPC reported in single mode is 8:

$ cat t.conf
.include <john.conf>
[Local:Options]
SingleWordsPairMax = 0

$ echo '$0$H' > t.pw

$ ./JohnTheRipper/run/john t.pw --log-stderr --config=t.conf --single=none
[...]
Loaded 1 password hash (plaintext, $0$ [n/a])
[...]
0:00:00:00 - SingleWordsPairMax increased to 3 for high KPC (8)
[...]

KPC reported in single mode is from key_count variable, it goes through multiple stages of tweaking in the code.

But there is a warning already that work below min. KPC is inefficient:

    if (key_count < single_db->format->params.min_keys_per_crypt) {
        if (john_main_process) {
            fprintf(stderr,
"Note: Performance for this format/device may be lower due to single mode\n"
"      constraints. Format wanted %d keys per crypt but was limited to %d.\n",
                    single_db->format->params.min_keys_per_crypt,
                    key_count);
        }
        log_event(
"- Min KPC decreased from %d to %d due to single mode constraints.",
            single_db->format->params.min_keys_per_crypt,
            key_count);
    }

I guess if a user wants such option and format's KPC is high, then it should be the user's problem to provide appropriate number of candidates, e.g. using rules. But I like useful and efficient defaults.

I am not sure, but it seems that the point where you would like to increase number of candidates is before application of rules, so it is the number of words for rules and not the number of final candidates. So at this point in code, it is not possible to know the final number easily because there may be rejects. Is it so? (With the option for limit, it might work this way: emit baseword, process rules and fill the buffer, if there is space in buffer, emit another baseword but drop excessive candidates.)

SingleNoPairing would not be enough, because there is word extraction that adds additional candidates too.

AlekseyCherepanov commented 5 years ago

django-scrypt has min. KPC 1, while single mode wants buffer of size 8:

$ ./JohnTheRipper/run/john --format=django-scrypt --list=format-all-details
[...]
Min. keys per crypt                  1
Max. keys per crypt                  1
[...]

[...]
0:00:00:00 - SingleWordsPairMax increased to 3 for high KPC (8)

Plaintext format shows that set_key() is called only for real candidates. There is no hashing-wise overhead from under-filled buffer in single mode, i.e. when we have more than min. KPC but less than key_count. Is this buffer needed to reduce overhead from set_salt() needed to switch between hashes? Is this overhead important for slow hashes?

I think precise control is needed for very slow hashes mostly. Not all slow formats have min. KPC 1: bcrypt has min. KPC 3 (and max. KPC 3).

magnumripper commented 5 years ago

I recall single mode have a floor of 8 for something... I need to revisit the code.

AlekseyCherepanov commented 5 years ago

So I tried 1 candidate per salt: p/s is 2 times lower than c/s for django-scrypt. I picked 40 lines and results match. I traced with gdb: set_key() is called correct number of times. Then what is c/s? How is it computed?

I tried to raise number of candidates using rules. Results:

| Cand. | time (s) | speed             |
|-------+----------+-------------------|
|     1 |    4.264 | 10.30p/s 20.10c/s |
|     2 |    6.070 | 13.79p/s 20.44c/s |
|     3 |    8.081 | 15.53p/s 20.53c/s |
|     4 |    9.905 | 16.66p/s 20.70c/s |
|     5 |   11.925 | 17.16p/s 20.49c/s |
|     6 |   13.755 | 17.78p/s 20.66c/s |
|     7 |   15.530 | 18.34p/s 20.88c/s |
|     8 |   17.606 | 18.44p/s 20.68c/s |
|     9 |   19.527 | 18.69p/s 20.71c/s |

single.conf (there is "^A" for '\x01' in real sj2.pw):

.include <john.conf>
[Local:Options]
SingleRetestGuessed = N
SingleWordsPairMax = 0

sj2.pw:

#cijoviro16#^Ascrypt$Q96YFTgDlIOj$15$8$1$64$IMYpKhIIHB14JzLfK8lNuTk2r0tVOWgwzFoaCQitmbAaFHIzg6G9FfwIBa7sPrBOVu0IUbBjBypJOYeLF3cWRQ==
#jojovisu84#^Ascrypt$nd7ylGyrMj33$15$8$1$64$dQGh4FA0g10+r5eMz5wXyUlZnUeKC/iocT4GMoiZzsnh2IvSun7SCijSpKNty3Dj0ta9cW7andKEmVWkHTP9Hg==

trace_django.gdb:

set pagination off

break main
run </dev/null --config=single.conf sj2.pw --field-separator-char='\x01' --single=':=0# =m# Dm D0'
del 1

break scrypt_set_key
break *fmt_django_scrypt.methods.set_salt
continue

while 1
  continue
end

Trace:

$ gdb -x trace_django.gdb --batch ./JohnTheRipper/run/john 2>&1 | grep set_

Breakpoint 3, set_salt (salt=0x55555732e6d4) at django_scrypt_fmt_plug.c:195
Breakpoint 2, scrypt_set_key (key=0x55555914c740 "notastrongpassword", index=0) at django_scrypt_fmt_plug.c:274

Breakpoint 3, set_salt (salt=0x55555732e6d4) at django_scrypt_fmt_plug.c:195
Breakpoint 2, scrypt_set_key (key=0x55555914c740 "realmenuseJtR", index=0) at django_scrypt_fmt_plug.c:274

Breakpoint 3, set_salt (salt=0x555559150000) at django_scrypt_fmt_plug.c:195
Breakpoint 2, scrypt_set_key (key=0x7fffffffb0d0 "cijoviro16", index=0) at django_scrypt_fmt_plug.c:274

Breakpoint 3, set_salt (salt=0x5555591501e0) at django_scrypt_fmt_plug.c:195
Breakpoint 2, scrypt_set_key (key=0x7fffffffb0d0 "jojovisu84", index=0) at django_scrypt_fmt_plug.c:274

Breakpoint 3, set_salt (salt=0x555559150000) at django_scrypt_fmt_plug.c:195
Breakpoint 3, set_salt (salt=0x5555591501e0) at django_scrypt_fmt_plug.c:195

magnumripper commented 5 years ago

Then what is c/s? How is it computed?

In any cracking mode but single (as you probably know), it's like this (edited several times):

c/s is our actual hashing rate and this figure is what should match a benchmark.
p/s is c/s divided by number of salts, so it reflects how many candidates per second we're doing. It's our actual progress in the keyspace.
C/s is like c/s but multipled with the same-salt factor that we started to show recently (or maybe I didn't commit that yet? I can't remember). It's mostly only important for DEScrypt and soon (once #4092 is merged) WPA-PSK.

I believe hashcat is currently only showing c/s, which is a constant confusion to end users (given a thousand salts it might look like we're a thousand times slower 🙄).

In single mode however, all bets are off. Aren't p/s and c/s the same? The keyspace is different for each salt.

AlekseyCherepanov commented 5 years ago

I found the reason: single mode does additional crypt_all() without candidate. So we get adequate c/s and lower p/s. With more candidates, the stake of empty crypt_all() is lower.

I made fake scrypt hashes to avoid rules:

$ printf '%03d:scrypt$%012d$15$8$1$64$aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa==\n' {1..4} > test2.pw
$ wc -l test2.pw
2 test2.pw

Changes in the tracer:

run </dev/null test2.pw --single=none --skip-self-tests
[...]
break *fmt_django_scrypt.methods.crypt_all

Output:

$ gdb -x trace_django.gdb --batch ./JohnTheRipper/run/john 2>&1 | grep -e set_ -e crypt_all | sed -e 's/ at .*//'

Breakpoint 3, set_salt (salt=0x55555914fb08)
Breakpoint 2, scrypt_set_key (key=0x7fffffffb0f0 "001", index=0)
Breakpoint 4, crypt_all (pcount=0x7fffffffac9c, salt=0x55555914faa0)

Breakpoint 3, set_salt (salt=0x55555914fcc0)
Breakpoint 2, scrypt_set_key (key=0x7fffffffb0f0 "003", index=0)
Breakpoint 4, crypt_all (pcount=0x7fffffffac9c, salt=0x55555914fc58)

Breakpoint 3, set_salt (salt=0x55555914fb08)
Breakpoint 4, crypt_all (pcount=0x7fffffffad5c, salt=0x55555914faa0)

Breakpoint 3, set_salt (salt=0x55555914fcc0)
Breakpoint 4, crypt_all (pcount=0x7fffffffad5c, salt=0x55555914fc58)

The following message suggests that the effect was expected:

Almost done: Processing the remaining buffered candidate passwords, if any.

magnumripper commented 5 years ago

That is strange. Can you reproduce it with John proper?

solardiz commented 5 years ago

On Thu, Oct 10, 2019 at 06:19:26PM -0700, magnum wrote:

In any cracking mode but single (as you probably know), it's like this:

p/s is how many candidates per second we're doing.

c/s is p/s divided by number of unique salts

C/s is c/s but showing the boost from same-salt hashes

This is only exactly correct when both nothing gets cracked and we are (and were) not running single mode (nor are in batch mode, where single mode is a step).

When the number of hashes (and maybe salts) changes as some hashes get cracked, these figures become progressively more independent, and you can't compute one from another.

They are average password candidates (p/s), hash computations (c/s), and tested {computed hash, stored hash} combinations (C/s) per second.

In single mode however, all bets are off. Aren't all three just the same?

No. They are still the averages as explained above.

AlekseyCherepanov commented 5 years ago

Yes, the same happens with John proper.

Fake bcrypts:

$ printf '%03d:$2a$05$%021d.E5YPO9kmyuRGyh0XouQYb4YMJKvyOeW\n' {1..4} > bf2.pw
$ cat bf2.pw
001:$2a$05$000000000000000000002.E5YPO9kmyuRGyh0XouQYb4YMJKvyOeW
003:$2a$05$000000000000000000004.E5YPO9kmyuRGyh0XouQYb4YMJKvyOeW

john building with debugging symbols (add -g to all commands, remove -s for ld):

john-1.9.0/src$ make OMPFLAGS=-g LDFLAGS= linux-x86-64-avx2

Investigate min/max kpc:

john-1.9.0/src$ gdb ../run/john
[...]
(gdb) break main
Breakpoint 1 at 0x555555559e90: file john.c, line 666.
(gdb) run
[...]
(gdb) p fmt_BF
[...]
min_keys_per_crypt = 3,
max_keys_per_crypt = 3,
[...]

Tracer:

set pagination off

tbreak main
run </dev/null bf2.pw --single

break *fmt_BF.methods.set_key
break *fmt_BF.methods.set_salt
break *fmt_BF.methods.crypt_all
continue

while 1
  continue
end

I edited john.conf to leave only : in [List.Rules:Single].

Trace:

$ gdb -x trace_bf.gdb --batch ./john-1.9.0/run/john 2>&1 | grep -e set_salt -e crypt_all -e set_key | sed -e 's/ at .*//'
[...self tests...]
Breakpoint 3, set_salt (salt=0x5555557f6688)
Breakpoint 2, set_key (key=0x7fffffffb500 "003", index=0)
Breakpoint 4, crypt_all (pcount=0x7fffffffb0ac, salt=0x5555557f6648)

Breakpoint 3, set_salt (salt=0x5555557f6588)
Breakpoint 2, set_key (key=0x7fffffffb500 "001", index=0)
Breakpoint 4, crypt_all (pcount=0x7fffffffb0ac, salt=0x5555557f6548)

Breakpoint 3, set_salt (salt=0x5555557f6688)
Breakpoint 4, crypt_all (pcount=0x7fffffffb18c, salt=0x5555557f6648)

Breakpoint 3, set_salt (salt=0x5555557f6588)
Breakpoint 4, crypt_all (pcount=0x7fffffffb18c, salt=0x5555557f6548)

So additional crypt_all() happens even if previous call was with number of candidates under min. KPC.

AlekseyCherepanov commented 5 years ago

The following message suggests that the effect was expected:
Almost done: Processing the remaining buffered candidate passwords, if any.

I was wrong: this message is printed before candidates when there is 1 candidate per salt. So it is not related to the empty crypt_all() in the end.

I am not sure that it is empty, it may check the last candidate against every salt, but I did not check that. Anyway the additional calls to crypt_all() seem wrong.

AlekseyCherepanov commented 5 years ago

Knowing precise total number of candidates, it is possible to exit before the additional calls using --max-candidates= option.

magnumripper commented 5 years ago

In order to spot an excessive call to crypt_all() you really need to monitor set_salt() and [at least index 0 of] set_key(). I'd also add one to clear_keys() if there is one. Just throw some standard debug print, like:

    fprintf(stderr, "%s() at %s:%u\n", __FUNCTION__, __FILE__, __LINE__);

    if (index == 0)
        fprintf(stderr, "%s() at %s:%u\n", __FUNCTION__, __FILE__, __LINE__);

magnumripper commented 4 years ago

I'm currently doing stuff within this context, will look at what's mentioned here as well.

Whatever stuff we add should probably be covered by #4408 as well (along with the normal brief OPTIONS description).

magnumripper commented 3 years ago

I did not find any option to disable word extraction fully.

"Fully" would end up with us having zero words... Do you mean from login field, or what? Or maybe you mean the word splitting, like taking the first character of gecos and use as a word of its own (the "q" ripped from "qwe,567"), and things like that. We can probably add means to disable that (or perhaps optionally only allow it up to a single batch of keys, somehow).

We have SingleSkipLogin, perhaps we could also have the complement of that, such as SingleOnlyLogin.

I did not find way to disable generation of pairs, because SingleWordsPairMax is raised automatically

We now have ~I have a pending patch that adds~ --single-pair-max=N where N can be zero - and when the command-line option is used, the auto-bump is disabled. Seems to work fine. Having said that, I've never been interested in limiting single mode this way, quite the opposite! As long as things are tried in a sensible order, what's the problem with it digging out many morphemes and pairs? Just like with incremental mode, you can abort it once the crack rate has decreased to "too boring".

Or put it like this: Perhaps we should instead focus on optimizing the order of words, morphemes and pairs?

additional crypt_all() happens even if previous call was with number of candidates under min. KPC.

It looks to me the "extra" calls to crypt_all seen above is because we test the last batch of keys with other salts. That's why no set_key is seen. This is already configurable, with SingleRetestGuess option.

solardiz commented 2 years ago

Looks like @magnumripper worked on some aspects of this issue, but doesn't intend to do anything else on it. If so, should we close this issue? Maybe with a brief summary on what was done, and what was not and why not. Thanks.

openwall / john

more control over single mode: need options to disable word extraction, to disable pairs and/or to feed exact candidates #4111