Add "overall cost" (always first cost) for some formats

magnumripper commented 3 years ago

As noted in https://github.com/openwall/john/issues/4031#issuecomment-922546987, some formats may have a compound cost from several of the now reported ones, one that is crucial to use when autotuning for real hashes [the shared autotune always picks the salt with worst first cost (i.e. methods.tunable_cost_value[0](salt)) for its tests, not because that is most likely to end up with best performance but because anything else might overtax the GPU]. We could call it "overall cost" for the few formats that need it, pushing the currently existing costs to higher indeces.

As a good example, 7z can be badly hurt if we need to LZMA decompress a very large sized chunk of data - but only if the padding size used for early rejection is small. Worst case in that very case is no padding, meaning we need to decompress all of it for each candidate. On the other end of the scale is a padding size of, say, 8, where the data blob size virtually doesn't matter.

Getting this cost right isn't always trivial: For example, the LZMA decompress might still semi-early-reject if we're lucky (so we'd need to get some figures from real-life data for how to weigh that). OTOH if we're talking a stored file we always need to CRC-32 all of it. A padding size of 0 for a stored, very large, file might be worse than a padding size of 1 for a LZMA compressed much smaller one, and so on. But we should be able to get some ballpark figure with simple formulas. In some cases we already know most data points such as "we can early reject 96% with huffman checks" or "a padding check for a padding size of 3 means 1 out of 16M will randomly fall through". Then it's just a matter of doing the math.

This obviously only matters if attacking several and very different salts. As long as you only load one 7z hash, the current code will auto-tune just fine for it. Moreover, it might be a dubious idea to even attack very different salts simultaneously - but then this reporting will be some help to realize that.

magnumripper commented 3 years ago

While assessing this task I realized formats that are FMT_BLOB (non-hash formats with several "binaries" [blobs] per salt), currently can't have costs. I'd probably have to add code to fix that.

magnumripper commented 3 years ago

On another note we have some formats that has "padding size" as a cost, but it's actually backwards: A padding size of 0 (or 1, depending on padding scheme) is worst and one of 8 or more is best. So to strictly report it as a cost we might want to return something like 16 - pad_size.

Then again, I believe no format has padding size as first cost, so it's currently more a means for selecting which to attack or what to benchmark.

magnumripper commented 3 years ago

It's not trivial to figure out a "correct" overall cost. Let's continue with 7z as an example. It currently has three costs defined: Iteration count, padding size and compression type. The type is fairly irrelevant for performance as long as it's not 0 (stored, lowest cost). There is another actual cost parameter that we could define: Data size. That one (if paired with small pad size) can really hit performance (and does so on CPU even for the OpenCL format, just like RAR3 does).

The iteration count is the basic cost. Every candidate is hit by it for calculating the key. Then we do a 2-block AES and padding check, if possible (pad size can be 0 and we'd know that beforehand).

If pad size is over, say 6, the other costs virtually doesn't matter. But if padding size is small or even zero, we're hit by full AES decryption (cost: data size) deflating cost (compression type and data size) and then (provided the inflation process didn't early reject from bad data) a full CRC (again data size). I have absolutely no idea how to calculate an "overall cost" from all that.

magnumripper commented 3 years ago

Parts of it are easier:

If we have AES pad_size and data_len, we can combine then to data_len / pow(256, pad_size) for an "effective length" after just decrypting the last block.
A huffman check (such as those we have in pkzip and RAR) will reject over 96% of candidates from just reading first block so that could be something like data_len / 320. If we had both of these, it'd obviously be data_len / pow(256, pad_size) / 320

Once we actually call some inflate function (inflate, decompress, LZMA(2) decoding, RAR) they will often reject semi-early but it's hard to put a figure on it. I saw 50% in an experiment but didn't measure how much data it had to process on average. That was with a fairly short chunk of data - I assume for longer blobs it will reject nearly 100% of bad data sooner or later.

I think I'll just drop this idea and close this.

openwall / john

Add "overall cost" (always first cost) for some formats #4814