mandiant / flare-floss

FLARE Obfuscated String Solver - Automatically extract obfuscated strings from malware.
Apache License 2.0

Improved memory usage or memory usage estimation #743

Open Dobatymo opened 1 year ago

Dobatymo commented 1 year ago

Hi, this is more of a feature request than a bug report.

I was analyzing a 23.0 MB Windows executable (with --large-file, which seems to be undocumented in --help, by the way), and after 12-24 hours it failed with a MemoryError. My machine has 48 GB of RAM (with probably 40 GB available to floss).

So my questions are:

INFO: floss: extracting static strings...
WARNING: floss: a large file was provided with a size of 24158888 bytes, this may take much more time and system resource to process
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
WARNING: viv_utils: cfg: incomplete control flow graph
finding decoding function features: 100%|█████| 355/355 [1:32:21<00:00, 15.61s/ functions, skipped 0 library functions]
INFO: floss.stackstrings: extracting stackstrings from 351 functions
extracting stackstrings:  32%|█████████████▌                             | 111/351 [2:16:34<4:55:16, 73.82s/ functions]
Traceback (most recent call last):
  File "main.py", line 688, in <module>
  File "main.py", line 626, in main
  File "stackstrings.py", line 172, in extract_stackstrings
  File "utils.py", line 414, in get_referenced_strings
  File "funcy\objects.py", line 28, in __get__
  File "viv_utils\__init__.py", line 197, in instructions
MemoryError
[3548] Failed to execute script 'main' due to unhandled exception!

williballenthin commented 1 year ago

FLOSS relies on vivisect to disassemble and analyze the input files. Vivisect is a pure Python project; the upside is that it runs everywhere Python runs, the downside is that it's neither fast nor memory efficient. For example, your 20 MB sample might contain millions or billions of instructions, each of which vivisect represents as an object (as it does each operand of each instruction, every recognized location, etc.).

While I'm sympathetic to your request, I don't think there are likely to be any quick wins here. Probably architectural changes to vivisect are needed, which is outside the scope of FLOSS. We've previously looked into improving the performance and found that it wasn't feasible without major changes.

As for estimating the memory usage, my only thought is to sample a collection of files of various sizes, record their memory usage, and extrapolate from that data. It might provide a rough estimate and could inform users of what to expect.
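
A minimal sketch of that extrapolation idea, assuming peak memory grows roughly linearly with file size (which may well not hold for vivisect). The function names are hypothetical, and the measurement pairs are made-up placeholders, not real FLOSS data; they would need to be replaced with numbers collected from actual runs.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept over (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("measurements must cover more than one file size")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_peak_memory_mb(points, file_size_mb):
    """Extrapolate peak memory (MB) for a file of the given size (MB)."""
    slope, intercept = fit_line(points)
    return slope * file_size_mb + intercept


# PLACEHOLDER data, invented for illustration only:
# (file size in MB, observed peak memory in MB)
PLACEHOLDER_MEASUREMENTS = [(1.0, 300.0), (2.0, 500.0), (4.0, 900.0)]
```

Calling `estimate_peak_memory_mb(PLACEHOLDER_MEASUREMENTS, 23.0)` would then give a rough figure for a 23 MB input. Packed or protected binaries may sit far off any such line, so the result is at best an order-of-magnitude hint.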

mr-tz commented 1 year ago

I think this is a bug in vivisect. Can you report it there (if you can share the sample/hash)?

As for processing time, we could abort after a certain time (or document that users should do so). For most samples, 5-10 minutes would be a good timeout value. Personally, the longest I would wait is probably 1 hour.
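
Until something like that exists in FLOSS itself, a user-side timeout could be sketched as a small wrapper script. This is an assumption about how one might do it, not a FLOSS feature: `run_with_timeout` is a hypothetical helper, and it only relies on the standard library's `subprocess.run`.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or None if it exceeded the timeout.

    subprocess.run kills the child process when the timeout expires and
    then raises TimeoutExpired, which is translated to None here.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Example call (assumes a `floss` executable is on PATH):
#   run_with_timeout(["floss", "sample.exe"], 10 * 60)
```

A return value of `None` means the analysis was stopped after the timeout. Only the direct child process is killed, so any processes it spawned would have to be cleaned up separately.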

Dobatymo commented 1 year ago

@mr-tz

Can you report it there (if you can share the sample/hash)?

I tried to analyze LINE.exe (version: 7.17.0.3035, sha256: 37aa29719c130e7b86d84253bd5fefa1ebeeb1b516bf7ea0c250ffeaa932833b, VirusTotal: https://www.virustotal.com/gui/file/37aa29719c130e7b86d84253bd5fefa1ebeeb1b516bf7ea0c250ffeaa932833b), which I got after installing https://desktop.line-scdn.net/win/new/LineInst.exe. Should I upload the executable somewhere?

I know this exe is Themida protected, which might complicate things.

mr-tz commented 1 year ago

Sorry for the delayed response. I've tested the shared binary, but yeah, it takes forever :) If you don't mind, please ask about this in the vivisect repo. Providing the hash there is fine. IDA takes only a few seconds on the sample, so something appears to be off.