mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
Apache License 2.0

profile-time: more result reporting, and learn to specify other backends #2072

Closed williballenthin closed 2 months ago

williballenthin commented 2 months ago

Output looks like:

image

which renders to this on GitHub:

| feature class | evaluation count |
| --- | ---: |
| evaluate.feature | 19,939,641 |
| evaluate.feature.and | 4,441,407 |
| evaluate.feature.rule | 4,124,464 |
| evaluate.feature.api | 2,385,944 |
| evaluate.feature.bytes | 1,756,958 |
| evaluate.feature.match | 1,546,698 |
| evaluate.feature.or | 1,443,142 |
| evaluate.feature.number | 1,246,595 |
| evaluate.feature.mnemonic | 1,205,911 |
| evaluate.feature.regex | 271,779 |
| evaluate.feature.os | 264,511 |
| evaluate.feature.string | 192,866 |
| evaluate.feature.characteristic | 178,392 |
| evaluate.feature.some | 163,596 |
| evaluate.feature.operand[1].number | 155,261 |
| evaluate.feature.substring | 127,813 |
| evaluate.feature.arch | 127,381 |
| evaluate.feature.operand[0].offset | 104,100 |
| evaluate.feature.operand[1].offset | 78,648 |
| evaluate.feature.offset | 56,995 |
| evaluate.feature.range | 31,907 |
| evaluate.feature.property | 21,125 |
| evaluate.feature.format | 7,604 |
| evaluate.feature.not | 6,108 |
| evaluate.feature.operand[2].number | 425 |
| evaluate.feature.section | 5 |
| evaluate.feature.export | 3 |
| evaluate.feature.import | 2 |
| evaluate.feature.operand[0].number | 1 |

| label | count(evaluations) | min(time) | avg(time) | max(time) |
| --- | ---: | ---: | ---: | ---: |
| 5390e1a0 be2: insn: polish thunk handling a bit (dirty) | 19,939,641 | 81.45s | 83.54s | 86.74s |
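
For readers unfamiliar with how a min/avg/max timing row like the one above can be produced, here is a minimal sketch using the standard library's `timeit.repeat`. It is not the script from this PR: `summarize_timings`, `format_row`, and the stand-in workload are hypothetical names chosen for illustration, and the real script may measure and report things differently.

```python
import timeit


def summarize_timings(workload, repeat=3, number=1):
    """Run `workload` in `repeat` batches of `number` calls; return seconds per call."""
    # timeit.repeat returns one total duration per batch.
    samples = timeit.repeat(workload, repeat=repeat, number=number)
    per_call = [sample / number for sample in samples]
    return {
        "min": min(per_call),
        "avg": sum(per_call) / len(per_call),
        "max": max(per_call),
    }


def format_row(label, evaluations, stats):
    """Render one markdown table row in the same column order as the table above."""
    return (
        f"| {label} | {evaluations:,} "
        f"| {stats['min']:.2f}s | {stats['avg']:.2f}s | {stats['max']:.2f}s |"
    )


if __name__ == "__main__":
    # Stand-in workload (an assumption); a real run would time rule matching instead.
    stats = summarize_timings(lambda: sum(range(100_000)), repeat=3)
    print("| label | count(evaluations) | min(time) | avg(time) | max(time) |")
    print("| --- | ---: | ---: | ---: | ---: |")
    print(format_row("example", 0, stats))
```

Reporting the minimum alongside the average and maximum helps separate the cost of the work itself from noise introduced by other processes on the machine.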


williballenthin commented 2 months ago

@s-ff FYI, this is a small script I've used in the past to help evaluate performance changes to capa. It benchmarks the rule matching phase and shows the number of times each type of feature was evaluated. This enables two things:

  1. if we reduce the overall count, then it means we're doing less work, so capa is running faster, and
  2. we can identify hotspots (features that are evaluated a huge number of times) and optimize those

For example, API features are evaluated about 2.4 million times in the above example, while import features are evaluated only twice, so it's probably worthwhile to spend more time optimizing API features than import features, if possible.
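
The counting idea can be sketched roughly as follows. This is an illustrative sketch only, not capa's actual instrumentation: `counters`, `record_evaluation`, and `hotspots` are hypothetical names. It assumes each evaluation bumps both a total and a per-class counter, which is consistent with the table above, where the `evaluate.feature` row equals the sum of the per-class rows.

```python
import collections

# Hypothetical module-level counter, keyed by names such as "evaluate.feature.api".
counters = collections.Counter()

TOTAL_KEY = "evaluate.feature"


def record_evaluation(feature_class):
    """Count one evaluation, both in the overall total and for its feature class."""
    counters[TOTAL_KEY] += 1
    counters[f"{TOTAL_KEY}.{feature_class}"] += 1


def hotspots(counts, top=3):
    """Return the `top` most-evaluated feature classes, leaving out the total row."""
    per_class = collections.Counter(
        {name: count for name, count in counts.items() if name != TOTAL_KEY}
    )
    return per_class.most_common(top)


if __name__ == "__main__":
    # Simulated evaluations (made-up input, not real capa data).
    for feature_class in ["api", "api", "api", "number", "number", "import"]:
        record_evaluation(feature_class)
    for name, count in hotspots(counters):
        print(f"{name}: {count:,}")
```

Comparing the total before and after a change shows whether less work is being done overall, and the ranked per-class counts point at where optimization effort is most likely to pay off.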

This is all just background info for you, nothing expected at this time :-)

mr-tz commented 2 months ago

great!