Open magnumripper opened 6 years ago
@solardiz please help me think. Is the above a correct assumption at all?
Perhaps this would need a GPU kernel that is aware of all salts at once? I believe there's nothing really preventing that.
It's been some years and I don't recall all the details anymore. IIRC, hashcat did this on GPU, so probably yes, there's room for some great speedup. OTOH, it looks like the remaining processing, which we have in cmp_*(), is done selectively - so it could be tricky to parallelize it efficiently. Since it's DES, the same applies to using a bitslice implementation, which would be relevant both on CPU and on GPU.
While we're at it, aren't the bitwise-ORs into bitmap in ntlmv1_mschapv2_fmt_plug.c's crypt_all() unsafe in OpenMP builds? Do they possibly predate our addition of OpenMP support there? Luckily, I think this format is built without OpenMP support by default, but when it is built with OpenMP, it currently may have a (low) chance of producing false negatives - when two threads each try to set a different bit in the same bitmap array element at once - or is this somehow not possible (each thread somehow has a whole number of bitmap array elements all to itself)? A straightforward fix would be adding #pragma omp atomic before those ORs.
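For illustration, here is a minimal sketch of what an atomic OR into a shared bitmap could look like. It is not the actual code from ntlmv1_mschapv2_fmt_plug.c: the bitmap size, the 16-bit index, and the helper names mark_values() and is_marked() are assumptions made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical size: one bit per possible 16-bit value. */
#define BITMAP_BITS 0x10000u

static uint32_t bitmap[BITMAP_BITS / 32];

/* Hypothetical helper: set the bit for every value in vals[].
 * Several threads may hit the same 32-bit element at once, so the
 * read-modify-write is made atomic when building with OpenMP. */
void mark_values(const uint16_t *vals, int count)
{
    int i;

    memset(bitmap, 0, sizeof(bitmap));
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < count; i++) {
        uint32_t v = vals[i];
#ifdef _OPENMP
#pragma omp atomic
#endif
        bitmap[v >> 5] |= 1U << (v & 31);
    }
}

/* Hypothetical helper: test whether the bit for v is set. */
int is_marked(uint16_t v)
{
    return (bitmap[v >> 5] >> (v & 31)) & 1;
}
```

Without the atomic, two threads could each load the same element, OR in their own bit, and store it back, with the later store discarding the earlier thread's bit. That lost update is what would show up as a false negative.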
Oh, right. Non-SIMD builds are OpenMP by default. I'm adding #pragmas right away, just in case.
hashcat's benchmark shows NT speeds. Not sure how it will cope with many salts.
Apparently these formats are still used in the wild. We should have GPU support for them. The "many salts" case has totally wild figures on CPU already - it can be even more crazy fast on GPU.
Like... if we do 10G NT hashes per second and load 500 salts of NTLMv1, we'd get, uh.... 5T c/s on GPU. Did I just have a beer too much?
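The back-of-the-envelope figure can be written down as a tiny model. This is only a sketch under the assumption that the NT hash is computed once per candidate and reused for every salt. The function name and the per_salt_sec parameter (time spent on the remaining per-salt work) are hypothetical, and setting per_salt_sec to 0 reproduces the optimistic estimate above.

```c
/* Hypothetical model of candidate/salt combinations tested per second.
 *   nt_per_sec   - NT hashes computed per second (one per candidate)
 *   salts        - number of loaded salts (challenges)
 *   per_salt_sec - assumed extra time per candidate per salt, in seconds
 * Time per candidate is 1/nt_per_sec + salts * per_salt_sec, and each
 * candidate counts as `salts` combinations. */
double effective_cps(double nt_per_sec, double salts, double per_salt_sec)
{
    double sec_per_candidate = 1.0 / nt_per_sec + salts * per_salt_sec;

    return salts / sec_per_candidate;
}
```

With nt_per_sec = 10e9, salts = 500 and per_salt_sec = 0, this gives about 5e12, i.e. the 5T c/s mentioned above. Any nonzero per-salt cost lowers the figure, so the real number depends on how cheap the per-salt comparison can be made.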