mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
Apache License 2.0

optimize BinExport read_bytes performance #2061

Closed williballenthin closed 2 months ago

williballenthin commented 2 months ago

It appears that we're creating a new PE/ELFFile object per function, when we should be creating it once per analysis. Commenting out extract_insn_bytes_features (the caller) resulted in a significant performance improvement. @mr-tz this is a good place to start investigating for a performance bump, starting by checking whether we can store the PE/ELFFile object once in the AnalysisContext.
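
A minimal sketch of the "parse once per analysis" idea, not capa's actual code. `ParsedFile` is a hypothetical stand-in for the PE/ELFFile object, and this `AnalysisContext` and `read_bytes` are simplified illustrations that only share names with the real ones. The construction counter exists solely to make the caching visible.

```python
from dataclasses import dataclass
from functools import cached_property


class ParsedFile:
    """Hypothetical stand-in for an expensive-to-construct PE/ELF parser."""

    instances_created = 0  # counts constructions so the caching is observable

    def __init__(self, buf: bytes):
        ParsedFile.instances_created += 1
        self.buf = buf

    def read(self, offset: int, size: int) -> bytes:
        return self.buf[offset : offset + size]


@dataclass
class AnalysisContext:
    """Hypothetical per-analysis state; not capa's real AnalysisContext."""

    sample_bytes: bytes

    @cached_property
    def parsed_file(self) -> ParsedFile:
        # Built on first access, then reused by every later read.
        return ParsedFile(self.sample_bytes)


def read_bytes(ctx: AnalysisContext, offset: int, size: int) -> bytes:
    # Every caller goes through the context, so no per-function re-parse.
    return ctx.parsed_file.read(offset, size)


ctx = AnalysisContext(sample_bytes=bytes(range(256)))
for function_offset in (0x10, 0x20, 0x30):
    read_bytes(ctx, function_offset, 4)

print(ParsedFile.instances_created)  # 1
```

Hanging the parsed object off the context ties its lifetime to the analysis, so it is released with the context instead of living in a module-level cache.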

Please read the thread started by @mike-hunhoff in https://github.com/mandiant/capa/pull/1950#discussion_r1571363044

williballenthin commented 2 months ago

I've confirmed the performance is worse with byte extraction enabled:

williballenthin commented 2 months ago

According to IDA, 0x420A81 is the largest function in mimikatz. Using the following to triage performance:

$ python scripts/show-features.py --format binexport2 --backend binexport2 tests/data/mimikatz.exe_.ghidra.BinExport --function 0x420A81

no bytes:

❯ hyperfine "python scripts/show-features.py --format binexport2 --backend binexport2 tests/data/mimikatz.exe_.ghidra.BinExport --function 0x420A81 > /dev/null"
  Time (mean ± σ):      2.184 s ±  0.055 s    [User: 2.037 s, System: 0.147 s]
  Range (min … max):    2.089 s …  2.250 s    10 runs

with bytes:

  Time (mean ± σ):      2.615 s ±  0.267 s    [User: 2.471 s, System: 0.142 s]
  Range (min … max):    2.308 s …  2.987 s    10 runs

OK, so not all that different, though that's still ~0.4s (2.615s vs. 2.184s) just to read bytes... let's fix that.

Ah, but there are only 17 bytes features in this function. Wow. This is slow.
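
To put a rough number on "slow", the arithmetic below divides the difference between the two hyperfine means by the 17 bytes features. It assumes the whole difference comes from those features, and with σ = 0.267 s on the slower run the result is only an order-of-magnitude estimate.

```python
# Mean wall-clock times reported by hyperfine above, in seconds.
mean_without_bytes = 2.184
mean_with_bytes = 2.615
bytes_feature_count = 17  # bytes features found in function 0x420A81

# Assumption: the whole difference is attributable to bytes extraction.
overhead = mean_with_bytes - mean_without_bytes
per_feature = overhead / bytes_feature_count

print(f"overhead: {overhead:.3f} s")                      # overhead: 0.431 s
print(f"per bytes feature: {per_feature * 1000:.1f} ms")  # per bytes feature: 25.4 ms
```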

Confirmed the cached module is only being initialized once, that's good.

williballenthin commented 2 months ago

complete profile from py-spy (download .svg and load in browser for interactive exploration):

[image: py-spy profile]

williballenthin commented 2 months ago

I interpret the results above as: ~70% of runtime is spent evaluating rule matches, with around 9% of runtime spent on extracting instruction features.

williballenthin commented 2 months ago

binexport backend:

________________________________________________________
Executed in   55.95 secs    fish           external
   usr time   55.80 secs  388.00 micros   55.80 secs
   sys time    0.15 secs  277.00 micros    0.15 secs

vivisect backend:

________________________________________________________
Executed in   81.56 secs    fish           external
   usr time   80.85 secs    0.00 micros   80.85 secs
   sys time    0.57 secs  589.00 micros    0.57 secs