hashcat / princeprocessor

Standalone password candidate generator using the PRINCE algorithm

int128 hack using macros #16

Closed magnumripper closed 9 years ago

magnumripper commented 9 years ago

I'm not really expecting you to merge this PR but FWIW here's what we'll use in JtR until proven bad ;-)

After the recent excellent speedup by @Sc00bz, this hack only gives another 15% boost (it was 45% prior to 39006a55b). In JtR we also want it merely for making libgmp optional.

As far as I understand, this can't impose any practical limitations. Am I wrong?

You could of course opt to supply the header (and merge the minor pp.c changes) but not use it by default. Or have it as an alternative make target. Or just disregard this as the worst bunch of macros you've ever seen :laughing:

jsteube commented 9 years ago

I was already thinking about switching back to native 64 bit integers. Not for speed reasons but for an eventual gpu kernel implementation in case I get an idea how to do it.

Also, just to make sure I did not misunderstand anything from your comment: you do not want me to merge this code?

Sc00bz commented 9 years ago

You might want to have it cap at 2^128-1, or report an error and exit, instead of overflowing, because it is kind of hard to know exactly how large the key space will be. Also, with capping, this will still allow you to run the first trillion or whatever you actually want to do.

magnumripper commented 9 years ago

Also, just to make sure I did not misunderstand anything from your comment: you do not want me to merge this code?

Sure you can, but there may be some obscure bug somewhere (though it's well tested and produces the same results as a GMP build using various wordlists and skips).

I added separate make targets now so it should be safe to merge - the old targets will still produce GMP builds. I called the int128 targets "pp128" which is kinda right depending on how you look at it... feel free to merge and/or change at will.

magnumripper commented 9 years ago

BTW pp.c is initially checked in with CRLF line endings, this causes me grief all the time! I could probably solve it locally with some git-fu but I have yet to find out how.

magnumripper commented 9 years ago

You might want to have it cap at 2^128-1, or report an error and exit, instead of overflowing, because it is kind of hard to know exactly how large the key space will be. Also, with capping, this will still allow you to run the first trillion or whatever you actually want to do.

Agreed... except I'm not quite sure where to put that check.

magnumripper commented 9 years ago

I was already thinking about switching back to native 64 bit integers.

The plan for JtR is to eventually have this new header file use a struct of lo/hi uint64_t for systems that do not support int128.

jsteube commented 9 years ago

Just wanted to add: I was doing some tests with GMP and native 64-bit, and I was surprised by the results. There was nearly no speed loss due to GMP, thanks to Steve's optimization.

Sc00bz commented 9 years ago

@jsteube This is expected, since with the default DEF_WORDLEN_DIST[] you initially stay in that small loop, which doesn't touch GMP, for 62,454 iterations on average.

@magnumripper I believe the best place for PP would be mpz_mul and mpz_add (I'll double check this sometime tomorrow):

    #define mpz_add(rop, op1, op2) do { rop = (op1) + (op2); if (rop < op1) { rop = UINT128_MAX; } } while (0)

    #define mpz_mul(rop, op1, op2) do { rop = (op1) * (op2); if (rop / (op1) != (op2)) { rop = UINT128_MAX; } } while (0)

I believe that both if statements will just check the overflow flag unless any of the inputs are not on the stack and you are using threads. If it doesn't just check the overflow flag then it shouldn't really affect the speed of PP. The only speed issues are if JtR uses this in other places.

magnumripper commented 9 years ago

I believe the best place for PP would be mpz_mul and mpz_add

Oh, right. I was thinking some specific check in the functions that build chains.

One problem is that you can't write 128-bit constants (e.g. with a ULLL suffix or whatever it could have been named; there is none). There is no UINT128_MAX, and if you try to create one you can't express it as a literal... unless you define it as #define UINT128_MAX ((uint128_t)-1)... that should work fine, shouldn't it?

magnumripper commented 9 years ago

Or perhaps #define UINT128_MAX ((UINT64_MAX << 64) | UINT64_MAX)

magnumripper commented 9 years ago

Added saturation checks. I used #define UINT128_MAX ((uint128_t)-1) and verified it works fine with no annoying build warnings.

Sc00bz commented 9 years ago

Yeah I knew that, I was too lazy to be like also define that too. Huh I guess that wasn't that hard :). Really it was because I have no clue what it should be defined as and didn't want to emit that... shit. Also I'm too drunk to care that I said "emit" I know that's wrong but I just can't think of the correct word. What is it? I can't Google or MS Word this. I can't believe I don't know this... OMG I was just like admit?... I really should not post this. Fuck it I'm drunk.

magnumripper commented 9 years ago

Cheers!

magnumripper commented 9 years ago

    #define mpz_add(rop, op1, op2) do { rop = (op1) + (op2); if (rop < op1) { rop = UINT128_MAX; } } while (0)

It just occurred to me that the above does not work well when rop and op1 are the same variable!

    mpz_add (total_ks_cnt, total_ks_cnt, tmp);

But we can check against op2 instead.

I fixed that and also squashed all fixes into a single new commit. @jsteube I think you should merge this now!

magnumripper commented 9 years ago

BTW, I have tried to test hitting saturation with 128-bit math, but you need a shitload of memory to run into that limit, so I didn't manage to get there even using lots of swap. But I tested it with a hacked header that typedefs mpz_t to uint64_t instead of uint128_t. It does warn about saturation and seems to behave as expected. It produces the same output as the 128-bit version up to about 0xfffffffffffff800, just 2K from the ceiling.