cryptostalker currently detects new files and reads them from the filesystem in their entirety in order to determine randomness. We should stop doing this in favor of reading in smaller chunks at a time.
This means computing the entropy of each chunk of the file separately and using a fractional entropy threshold. E.g. if a file has C chunks, then at least C/2 of them must be deemed random in order to signal the file as being "encrypted". This would be in contrast to the way cryptostalker works today, where it consumes the whole file and evaluates the randomness in its entirety, giving a single 0-or-1 verdict: nonrandom or random.
Evaluating only parts of files could lead to false positives where legitimate files and/or binaries are only compressed or encrypted in part. Must test.
This chunked approach has two benefits:
Less memory consumption since we don't have to stuff all of the file in memory at once in order to analyze its entropy. This is particularly helpful since files are analyzed in goroutines, so we could potentially eat up lots of memory.
This would help mitigate the evasion tactic of injecting non-random data inside the encrypted files. E.g. an attacker could take a 1MB document fileX and encrypt it into a 2MB fileX'. The extra 1MB could be deterministic non-random data that would cause randomness detection to fail.
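A sketch of the chunked scoring described above, written as an added example for this issue rather than code from the cryptostalker repository. It reads the file through an `io.Reader` one chunk at a time (so only one chunk is ever resident, which is the memory benefit), scores each chunk with Shannon entropy, and applies the "C/2 of C chunks" rule. The chunk size (4096), the per-chunk threshold (7.5 bits per byte), the fraction (0.5) and the `minTailChunk` cutoff are assumptions chosen for the example, not values from the project. The `main` contrasts the whole-file verdict with the chunked one on the padding case from the last paragraph: a block of pseudo-random bytes standing in for ciphertext followed by an equal amount of zero padding, built deterministically as constructed sample data.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"math"
	"math/rand"
)

// shannonEntropy returns the Shannon entropy of b in bits per byte (0 to 8).
func shannonEntropy(b []byte) float64 {
	if len(b) == 0 {
		return 0
	}
	var counts [256]int
	for _, c := range b {
		counts[c]++
	}
	n := float64(len(b))
	h := 0.0
	for _, cnt := range counts {
		if cnt > 0 {
			p := float64(cnt) / n
			h -= p * math.Log2(p)
		}
	}
	return h
}

// ChunkReport is the per-chunk verdict for one file.
type ChunkReport struct {
	Chunks       int // chunks that were scored
	RandomChunks int // chunks whose entropy reached the threshold
}

// RandomFraction is the share of scored chunks that looked random.
func (r ChunkReport) RandomFraction() float64 {
	if r.Chunks == 0 {
		return 0
	}
	return float64(r.RandomChunks) / float64(r.Chunks)
}

// minTailChunk: a trailing chunk shorter than this is not scored, because the
// entropy of n bytes is capped at log2(n) bits, so a tiny tail always looks
// non-random and would bias the fraction. Hypothetical value for this example.
const minTailChunk = 256

// analyzeChunks reads r one chunk at a time, so at most chunkSize bytes of the
// file are resident, and scores each chunk against entropyThreshold (bits/byte).
func analyzeChunks(r io.Reader, chunkSize int, entropyThreshold float64) (ChunkReport, error) {
	buf := make([]byte, chunkSize)
	var rep ChunkReport
	for {
		n, err := io.ReadFull(r, buf)
		// Score a full chunk, or a short tail only if it is the whole file.
		if n > 0 && (n >= minTailChunk || rep.Chunks == 0) {
			rep.Chunks++
			if shannonEntropy(buf[:n]) >= entropyThreshold {
				rep.RandomChunks++
			}
		}
		switch err {
		case nil:
			continue
		case io.EOF, io.ErrUnexpectedEOF: // ErrUnexpectedEOF = short final chunk
			return rep, nil
		default:
			return rep, err
		}
	}
}

// looksEncrypted applies the fractional rule: flag when at least minFraction
// of the scored chunks were random (0.5 is the "C/2 of C chunks" rule).
func looksEncrypted(rep ChunkReport, minFraction float64) bool {
	return rep.Chunks > 0 && rep.RandomFraction() >= minFraction
}

// paddedCiphertext builds the evasion case from the issue: randomChunks chunks of
// pseudo-random bytes (standing in for ciphertext) followed by zeroChunks chunks
// of zeros. Constructed sample data; the fixed seed keeps it deterministic.
func paddedCiphertext(chunkSize, randomChunks, zeroChunks int) []byte {
	rng := rand.New(rand.NewSource(1))
	out := make([]byte, chunkSize*(randomChunks+zeroChunks))
	rng.Read(out[:chunkSize*randomChunks]) // the remainder stays zero
	return out
}

func main() {
	const chunk = 4096
	sample := paddedCiphertext(chunk, 8, 8) // 32 KiB "ciphertext" + 32 KiB zero padding

	whole := shannonEntropy(sample)
	rep, _ := analyzeChunks(bytes.NewReader(sample), chunk, 7.5)
	fmt.Printf("whole-file entropy: %.2f bits/byte, random? %v\n", whole, whole >= 7.5)
	fmt.Printf("chunked: %d/%d random chunks, flagged=%v\n",
		rep.RandomChunks, rep.Chunks, looksEncrypted(rep, 0.5))
}
```

On the padded sample the whole-file entropy lands around 5 bits per byte (half of the bytes are zero), so the single-verdict approach misses it, while the chunked score is exactly 8 of 16 and trips the C/2 rule. The same code also shows the false-positive concern from the third paragraph: a file with only 2 of 16 random-looking chunks stays well under the fraction, which is the case that needs testing against real partly-compressed binaries before picking the threshold.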