Spurious self-test and other defensive measures

magnumripper commented 3 years ago

Just throwing this out as another case of brainstorming.

For long-ish runs, maybe we could throw in a test vector at a rate of 1/hour, 1/day or completely randomly, verifying that things still seem dandy.

solardiz commented 3 years ago

I like the concept, but this would be tricky to implement without performance impact per every crypt_all().

If we introduce the test vector's hash into the loaded hashes database (and then remove it from there, then reintroduce, etc.), it would also test cracker.c code that our current self-test does not. This provides extra justification to go for it.

magnumripper commented 3 years ago

I'm mostly concerned about GPU's and possibly FPGA's. If our initial self-tests are fine I doubt a CPU format would later wreak havoc. But with GPU's I wouldn't be shocked if it turned out we caught one producing 100% false negatives after some time.

On that subject, today I saw NT-opencl miss a couple of candidates. I was using --loopback option after exhausting LM-opencl against a test set. It found one more password on second run, and yet another one on third run. They were super trivial, so while the scenario could happen for other reasons (permuting stuff it got on first run) I'm pretty sure that wouldn't explain it this time. I know how to reproduce so I'll try to dig into it.

magnumripper commented 3 years ago

In discussion elsewhere, @solardiz said

We know how very fragile AMD GPU software is, and how easy it is to crash/lockup the GPU by an OpenCL kernel. Given this, I wonder if cross-kernel data corruptions are also common. I think single-salt runs might be impacted especially badly, since that salt just sits on the GPU normally-unchanged, and it possibly changing wouldn't result in an instant crash but rather in miscomputation for the rest of the run. I wonder if we should use defensive programming, occasionally updating the salt where this wouldn't cause a measurable performance impact.

Perhaps we can use a trivial timer similar to what we ended up using for avoiding excessive calls to fix_state().

magnumripper commented 3 years ago

Perhaps we can use a trivial timer similar to what we ended up using for avoiding excessive calls to fix_state().

Here's one way to do it (tested):

diff --git a/src/cracker.c b/src/cracker.c
index de2ee5b89..0ed50c557 100644
--- a/src/cracker.c
+++ b/src/cracker.c
@@ -112,6 +112,11 @@ int crk_max_keys_per_crypt(void)

 static void crk_dummy_set_salt(void *salt)
 {
+   /* Refresh salt every 30 seconds in case it was thrashed */
+   if (event_refresh_salt > 30) {
+       crk_db->format->methods.set_salt(salt);
+       event_refresh_salt = 0;
+   }
 }

 static void crk_dummy_fix_state(void)
diff --git a/src/signals.c b/src/signals.c
index 856e99563..960d8ff4a 100644
--- a/src/signals.c
+++ b/src/signals.c
@@ -61,7 +61,7 @@ volatile int event_pending = 0, event_reload = 0;
 volatile int event_abort = 0, event_save = 0, event_status = 0, event_delayed_status = 0;
 volatile int event_ticksafety = 0;
 volatile int event_mpiprobe = 0, event_poll_files = 0;
-volatile int event_fix_state = 0;
+volatile int event_fix_state = 0, event_refresh_salt = 0;

 volatile int timer_abort = 0, timer_status = 0;
 static int timer_save_interval;
@@ -394,6 +394,7 @@ static void sig_handle_timer(int signum)
 #endif /* OS_TIMER */

    event_fix_state = 1;
+   event_refresh_salt++;

 #endif /* !BENCH_BUILD */

diff --git a/src/signals.h b/src/signals.h
index 1d9f5f9ff..9a9672fca 100644
--- a/src/signals.h
+++ b/src/signals.h
@@ -37,7 +37,8 @@ extern volatile int event_save;       /* Save the crash recovery file */
 extern volatile int event_status;  /* Status display requested */
 extern volatile int event_delayed_status;  /* Status display requested after current batch */
 extern volatile int event_ticksafety;  /* System time in ticks may overflow */
-extern volatile int event_fix_state;    /* For cracker */
+extern volatile int event_fix_state;   /* For cracker */
+extern volatile int event_refresh_salt;    /* For defensive salt refresh every nth seconds */
 #ifdef HAVE_MPI
 extern volatile int event_mpiprobe;    /* MPI probe for messages requested */
 #endif

We could want to have a configurable time instead of the 30 seconds, or just go for some sane figure (is 30 seconds fine?). This patch doesn't care whether the format is OpenCL or not, nor if the format actually uses a salt but that won't matter so better keep it simple, right?

magnumripper commented 3 years ago

An even simpler patch could be touching nothing but crk_dummy_set_salt(). Simplest (but dumbest) of all could just call real set_salt() every 100th (or whatever) time. A more clever way would be to call status_get_time() and keep a static for when we last did a real set_salt.

magnumripper commented 3 years ago

So here's a KISS patch

diff --git a/src/cracker.c b/src/cracker.c
index de2ee5b89..7292222bf 100644
--- a/src/cracker.c
+++ b/src/cracker.c
@@ -112,6 +112,14 @@ int crk_max_keys_per_crypt(void)

 static void crk_dummy_set_salt(void *salt)
 {
+   static int last_refresh;
+   int now = status_get_time();
+
+   /* Refresh salt every 30 seconds in case it was thrashed */
+   if (now - last_refresh > 30) {
+       crk_db->format->methods.set_salt(salt);
+       last_refresh = now;
+   }
 }

 static void crk_dummy_fix_state(void)

The problem with this is it will call status_get_time() (3.3 us on this laptop) every single time. Something a little more elaborate is needed for not introducing a penalty with fast formats (let alone unsalted ones).

Perhaps we can exploit the existing timer_save_value (just embracing the fact it will become dependant on Save setting in the config).

solardiz commented 3 years ago

Here's one way to do it (tested):

Looks good to me.

So here's a KISS patch

The problem with this is it will call status_get_time() (3.3 us on this laptop) every single time. Something a little more elaborate is needed

You already have that more elaborate thing with event_refresh_salt above.

Perhaps we can exploit the existing timer_save_value (just embracing the fact it will become dependant on Save setting in the config).

This is also an option, or we could do the salt update (up to) once per second (actually, less frequently than that when a single crypt_all() call takes more than a second). I think this would be OK.

magnumripper commented 3 years ago

Looks good to me.

What about this then?

(…) or just go for some sane figure (is 30 seconds fine?).

We could probably hard-code it higher but at 30 seconds I guess we’re not introducing any measurable penalty

solardiz commented 3 years ago

I'm OK with 30 seconds.

BTW, we have a salt_changed variable in 3 CPU formats, but not anywhere else - so apparently the changes you propose here will take care of all OpenCL formats.

openwall / john

Spurious self-test and other defensive measures #4456