mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
https://mandiant.github.io/capa/
Apache License 2.0
4.14k stars, 522 forks

optimize rule matching by better indexing rule by features #2125

Closed (williballenthin closed this 3 months ago)

williballenthin commented 3 months ago

(continuation of #2080 rebased against master)

Implement the "tighten rule pre-selection" algorithm described here: https://github.com/mandiant/capa/issues/2063#issuecomment-2100498720

In summary:

Rather than indexing every feature from every rule, we should pick and index the minimal set of features (ideally just one) from each rule that must be present for the rule to match. When there are multiple candidates, pick the feature that is probably the least common and therefore the most "selective".
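
To illustrate the idea, here is a minimal sketch in Python. It is not capa's actual implementation: the `And`/`Or` node types, the `COMMONNESS` scores, and the function names are all hypothetical, and the sketch ignores constructs such as `not` and `optional`, which a real rule engine would have to handle separately. For an `And` node, any single child is enough to index the rule, so we keep the cheapest one; for an `Or` node, every branch could trigger a match, so we must keep the union.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple, Union


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]  # assumed non-empty


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]  # assumed non-empty


# A leaf feature is modeled as a plain string (hypothetical representation).
Node = Union[str, And, Or]

# Hypothetical scores: higher means more common, so less selective.
COMMONNESS: Dict[str, int] = {"os:windows": 1000, "api:CreateFileA": 50}
DEFAULT_COMMONNESS = 10


def cost(features: FrozenSet[str]) -> int:
    """Estimate how often an index entry built from these features would hit."""
    return sum(COMMONNESS.get(f, DEFAULT_COMMONNESS) for f in features)


def pick_index_features(node: Node) -> FrozenSet[str]:
    """Return a small feature set, at least one of which must be present for a match."""
    if isinstance(node, str):
        return frozenset({node})
    picks = [pick_index_features(child) for child in node.children]
    if isinstance(node, And):
        # Every child must match, so indexing the most selective child suffices.
        return min(picks, key=cost)
    if isinstance(node, Or):
        # Any child may match, so all branches must stay reachable via the index.
        return frozenset().union(*picks)
    raise TypeError(f"unsupported node: {node!r}")


def build_index(rules: Dict[str, Node]) -> Dict[str, Set[str]]:
    """Map each chosen feature to the names of the rules it pre-selects."""
    index: Dict[str, Set[str]] = {}
    for name, rule in rules.items():
        for feature in pick_index_features(rule):
            index.setdefault(feature, set()).add(name)
    return index


def candidates(index: Dict[str, Set[str]], present: Iterable[str]) -> Set[str]:
    """Rules worth fully evaluating given the features seen in a scope."""
    return set().union(*(index.get(f, set()) for f in present))


RULES: Dict[str, Node] = {
    "create file": And(("api:CreateFileA", "os:windows")),
    "encrypt or hash": Or(("api:CryptEncrypt", "api:CryptHashData")),
}
INDEX = build_index(RULES)
```

In this toy setup, a function that only exposes the very common `os:windows` feature yields no candidate rules, so nothing is evaluated for it, while a function that also contains `api:CryptHashData` pre-selects only `"encrypt or hash"`. Rules returned by `candidates` still need a full evaluation; the index only prunes rules that cannot match.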

This seems to work pretty well. Total evaluations when running against mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around 3x more functions per second (wow wow).

When doing large scale runs, capa is about 25% faster when using the vivisect backend (analysis heavy) or 3x faster when using the upcoming BinExport2 backend (minimal analysis).

closes #2074

williballenthin commented 3 months ago

thanks for the detailed and constructive reviews along the way @mike-hunhoff @mr-tz @s-ff !