simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

restrict scan_aes to 4KiB pages at a time #307

Closed · simsong closed this issue 2 years ago

simsong commented 2 years ago

scan_aes only finds AES keys from RAM.

Since RAM is typically going to be fragments of swap files, sometimes compressed, AES keys are unlikely to cross 4KiB pages (because of virtual memory). Analysis of the AES key schedule has shown that even with very simple keys (e.g. 0x0000000000000000 and 0x0000000000000001) there are at least 50 distinct byte values in the AES key schedule. Therefore, I'm thinking of having scan_aes have a default mode of operation in which it breaks the input into 4KiB chunks and processes each one only if it contains at least 50 distinct byte values.

I have added a method to sbuf_t that returns the histogram of the sbuf. However, I can add an even faster method that returns true if there are at least NN distinct byte values in the buffer (because it can stop counting as soon as it finds that many).
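A minimal sketch of the two pieces being proposed, the early-exit distinct-byte check and the 4KiB chunking loop. The function names and the flat-pointer interface here are hypothetical; the real code would work through sbuf_t.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical early-exit check: returns true as soon as the buffer contains
// at least `threshold` distinct byte values, so low-entropy pages bail out fast.
static bool has_distinct_bytes(const uint8_t *buf, size_t len, unsigned threshold)
{
    std::array<bool, 256> seen{};                 // all false
    unsigned distinct = 0;
    for (size_t i = 0; i < len; i++) {
        if (!seen[buf[i]]) {
            seen[buf[i]] = true;
            if (++distinct >= threshold) return true;   // stop as soon as we know
        }
    }
    return false;
}

// Hypothetical driver: walk the input in 4KiB pages and hand only the pages
// that pass the filter to the expensive AES key-schedule search.
static void scan_in_pages(const uint8_t *buf, size_t len)
{
    constexpr size_t   PAGE_SIZE    = 4096;
    constexpr unsigned MIN_DISTINCT = 50;
    for (size_t off = 0; off < len; off += PAGE_SIZE) {
        const size_t n = std::min(PAGE_SIZE, len - off);
        if (has_distinct_bytes(buf + off, n, MIN_DISTINCT)) {
            // search_for_aes_keys(buf + off, n);  // expensive path, elided here
        }
    }
}
```

The early exit is what makes the check cheaper than a full histogram: a page full of NULLs or other low-entropy data is rejected without ever building all 256 counts.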

@jonstewart - what do you think of this idea?

jonstewart commented 2 years ago

A histogram idea: rather than counting bytes, could you discriminate by counting nibbles? The latter could be many times faster.

simsong commented 2 years ago

How would I count nibbles and why would that be faster? A nibble is 4 bits, correct?

jonstewart commented 2 years ago

Yes, nibbles are four bits, so two to a byte. There are some fun vector instructions that take a vector of nibbles as input. I am not sure whether a good histogram routine could be built on them; I need to do a bit more research.

A normal scalar implementation of a byte histogram can have data dependencies that make it hard to get full pipelining.
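For illustration only (this is a generic technique, not code from bulk_extractor): one common scalar workaround is to keep several partial histograms so that back-to-back increments rarely serialize on the same counter, then merge them at the end.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative sketch: four partial histograms let the CPU overlap increments
// that would otherwise hit the same counter on consecutive iterations.
static std::array<uint32_t, 256> byte_histogram(const uint8_t *buf, size_t len)
{
    std::array<std::array<uint32_t, 256>, 4> part{};   // zero-initialized
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        part[0][buf[i + 0]]++;
        part[1][buf[i + 1]]++;
        part[2][buf[i + 2]]++;
        part[3][buf[i + 3]]++;
    }
    for (; i < len; i++) part[0][buf[i]]++;            // tail bytes

    std::array<uint32_t, 256> hist{};
    for (int v = 0; v < 256; v++)
        hist[v] = part[0][v] + part[1][v] + part[2][v] + part[3][v];
    return hist;
}
```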

A popcount-based algorithm might work, too. The point isn’t to get a perfect answer to any of these operations but to find ranges of data with sufficient entropy, something that can act as a very fast filter. Check out: http://0x80.pl/articles/avx512-pospopcnt-8bit.html (this guy has a ton of neat vector algorithms)
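As a rough illustration of a popcount flavor of the same filter (a plain bitmap-and-popcount distinct count, not the positional-popcount algorithm from the article linked above; `__builtin_popcountll` assumes GCC or Clang):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative sketch: mark each byte value seen in a 256-bit bitmap, then
// popcount the four words. No early exit, but also no branch in the hot loop.
static unsigned distinct_byte_count(const uint8_t *buf, size_t len)
{
    uint64_t bits[4] = {0, 0, 0, 0};                   // 256-bit "seen" bitmap
    for (size_t i = 0; i < len; i++) {
        const uint8_t v = buf[i];
        bits[v >> 6] |= uint64_t{1} << (v & 63);
    }
    return __builtin_popcountll(bits[0]) + __builtin_popcountll(bits[1]) +
           __builtin_popcountll(bits[2]) + __builtin_popcountll(bits[3]);
}
```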

simsong commented 2 years ago

I'm not sure that this works for our use case. We might have the AES key schedule in the midst of a lot of NULLs. The AES key schedule is about 512 bytes, and there are, say, 50 distinct bytes in it. Sure, we could look at every other byte and look for only 25 distinct bytes; that would be twice the speed. But I don't think that a single loop through the memory is that intensive, and the histogram is a fast operation. How would vectorizing it help?

jonstewart commented 2 years ago

Wrt/ nibbles: It costs about one clock cycle to put the high nibbles for a vector of bytes into their own vector. Then you can do the same operation on each vector.
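Roughly what that high-nibble extraction looks like with SSE2 intrinsics; an illustrative sketch, not bulk_extractor code, and in practice it is a shift plus a mask rather than literally one instruction:

```cpp
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdint>

// Illustrative sketch: split 16 bytes into two vectors holding their
// high and low nibbles, each nibble stored in the low 4 bits of a byte lane.
static void split_nibbles(const uint8_t *p, __m128i &hi, __m128i &lo)
{
    const __m128i mask = _mm_set1_epi8(0x0F);
    const __m128i v    = _mm_loadu_si128(reinterpret_cast<const __m128i *>(p));
    lo = _mm_and_si128(v, mask);                     // low 4 bits of each byte
    hi = _mm_and_si128(_mm_srli_epi16(v, 4), mask);  // high 4 bits of each byte
}
```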

With vectorization, maybe we would be able to discriminate a vector's worth of bytes in 10-20 cycles; with a 16-byte vector that's about a cycle per byte, and 32-byte vectors would be better still. A scalar byte histogram is probably going to be more like 3-8 cycles/byte (guessing here).

These kinds of techniques work really well with small automata, too. Unfortunately they don't work so well with the big automata we deal with in scan_accts and scan_email.

simsong commented 2 years ago

scan_aes is the slowest scanner, with scan_accts and scan_email also being quite slow. But scan_aes rarely, rarely finds what it needs. So that's where I am focusing my energies. I would like to make scan_accts and scan_email faster; how is LightGrep coming?

msuhanov commented 2 years ago

AES keys are unlikely to cross 4KiB pages (because of virtual memory)

It's not uncommon for data to cross the page boundary, with two adjacent pages being written to the page file together (one right after another and in the original order).

simsong commented 2 years ago

AES keys are unlikely to cross 4KiB pages (because of virtual memory)

It's not uncommon for data to cross the page boundary, with two adjacent pages being written to the page file together (one right after another and in the original order).

Why would the VM system put two logical pages on two adjacent physical pages?

msuhanov commented 2 years ago

I have no answer right now. I just noticed that data in page files can be contiguous in many cases, so carvers need to account for more than one 4096-byte block to get better results.

simsong commented 2 years ago

OK. Then I won’t split it up. Thanks.

