Python flirt slows down analysis - Githubissues

mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.

https://mandiant.github.io/capa/

Apache License 2.0

4.91k stars, 565 forks

Python flirt slows down analysis #1272

Open mr-tz opened 1 year ago

mr-tz commented 1 year ago

I've noticed this before with vivisect and library matching. With the default signatures, analysis is very slow; without signatures it's less slow. Can someone reproduce this before I investigate further?

Example binaries: 0b8a4b3d83f94cab837b9ff51e5d7928df49537b3813ea1e4bf2d954952fc1c9 0adb26cb948f3fe4c56ab663026c7c0630340cae461cae0b69a64e2f35a2fe3b

williballenthin commented 1 year ago

i will look.

there will be a tradeoff: FLIRT matching can reduce the number of functions to analyze, but the matching itself takes a bit of time per function. we should probably run a representative set of files and see whether FLIRT improves the total runtime and/or provides useful information.

looking at single samples can be useful to identify hotspots but can also be unfair.
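To make the tradeoff concrete, here is a rough cost model. It is only a sketch: the function name `flirt_net_seconds` and all per-function costs are made-up illustrative values, not measurements from capa, and it assumes every matched function is skipped entirely.

```python
def flirt_net_seconds(n_functions, match_cost_s, analysis_cost_s, n_matched):
    """Net runtime change from enabling FLIRT; negative means faster overall.

    All inputs are hypothetical: match_cost_s is the assumed signature-matching
    time per function, analysis_cost_s the assumed analysis time per function.
    """
    # Every function is checked against the signatures, matched or not.
    overhead = n_functions * match_cost_s
    # Functions recognized as library code no longer need to be analyzed.
    savings = n_matched * analysis_cost_s
    return overhead - savings


# Same binary size, very different outcomes depending on the match count.
few_matches = flirt_net_seconds(5500, 0.002, 0.015, n_matched=50)
many_matches = flirt_net_seconds(5500, 0.002, 0.015, n_matched=3000)
print(f"few matches:  {few_matches:+.2f}s")
print(f"many matches: {many_matches:+.2f}s")
```

Under these assumptions a sample with few library functions pays the matching overhead without recouping it, while a sample with many matches comes out ahead, which is why a single sample can be misleading.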

mr-tz commented 1 year ago

Absolutely, my main concern would be bugs that occur in certain cases and make analysis very slow. In general, FLIRT is a great help to reduce FPs and reason better about a program.

williballenthin commented 1 year ago

analysis is about 10s (12%) slower, across 5500 functions, when FLIRT is enabled.
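As a back-of-the-envelope reading of those figures, the arithmetic below assumes the 10 s is the extra time and equals a 12% increase over the run without FLIRT, and that the overhead is spread evenly over all functions. Both are assumptions, not something stated in the thread.

```python
extra_seconds = 10.0        # reported slowdown with FLIRT enabled
relative_slowdown = 0.12    # reported as 12%
n_functions = 5500          # reported function count

# Implied runtime without FLIRT, if 10 s corresponds to 12% of it.
baseline_seconds = extra_seconds / relative_slowdown

# Implied average FLIRT overhead per function, in milliseconds.
per_function_ms = extra_seconds / n_functions * 1000

print(f"implied baseline: {baseline_seconds:.1f} s")
print(f"implied overhead: {per_function_ms:.2f} ms per function")
```

That works out to a baseline of a bit over 80 s and an overhead of just under 2 ms per function.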

mr-tz commented 1 year ago

Huh, what about the other sample?

williballenthin commented 1 year ago

I didn’t find it on VT. Can you share it with me privately?

williballenthin commented 1 year ago

lots of FLIRT matches, so (on my system) using FLIRT makes the overall runtime much better.

williballenthin commented 1 year ago

on my system about 2.5s is spent parsing and compiling the rules. this is probably heavily CPU dependent, so on less-resourced systems i'd expect it to be a bit slower.

williballenthin commented 1 year ago

we can also use get_flirt_matches.py to triage FLIRT performance outside of capa.

mr-tz commented 1 year ago

🤯 wow, maybe it's just my setup... will investigate further