profile-time: more result reporting, and learn to specify other backends

williballenthin commented 2 months ago

Output looks like:

which renders to this on GitHub:

| feature class | evaluation count |
|---|---:|
| evaluate.feature | 19,939,641 |
| evaluate.feature.and | 4,441,407 |
| evaluate.feature.rule | 4,124,464 |
| evaluate.feature.api | 2,385,944 |
| evaluate.feature.bytes | 1,756,958 |
| evaluate.feature.match | 1,546,698 |
| evaluate.feature.or | 1,443,142 |
| evaluate.feature.number | 1,246,595 |
| evaluate.feature.mnemonic | 1,205,911 |
| evaluate.feature.regex | 271,779 |
| evaluate.feature.os | 264,511 |
| evaluate.feature.string | 192,866 |
| evaluate.feature.characteristic | 178,392 |
| evaluate.feature.some | 163,596 |
| evaluate.feature.operand[1].number | 155,261 |
| evaluate.feature.substring | 127,813 |
| evaluate.feature.arch | 127,381 |
| evaluate.feature.operand[0].offset | 104,100 |
| evaluate.feature.operand[1].offset | 78,648 |
| evaluate.feature.offset | 56,995 |
| evaluate.feature.range | 31,907 |
| evaluate.feature.property | 21,125 |
| evaluate.feature.format | 7,604 |
| evaluate.feature.not | 6,108 |
| evaluate.feature.operand[2].number | 425 |
| evaluate.feature.section | 5 |
| evaluate.feature.export | 3 |
| evaluate.feature.import | 2 |
| evaluate.feature.operand[0].number | 1 |

| label | count(evaluations) | min(time) | avg(time) | max(time) |
|---|---:|---:|---:|---:|
| 5390e1a0 be2: insn: polish thunk handling a bit (dirty) | 19,939,641 | 81.45s | 83.54s | 86.74s |
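
For readers unfamiliar with this kind of report, here is a rough sketch of how min/avg/max columns like the ones above could be produced. It is not the script's actual code: `summarize_timings` and `format_row` are hypothetical helper names, and it assumes the workload is wrapped in a zero-argument callable and timed with the standard library's `timeit.repeat`.

```python
import timeit


def summarize_timings(fn, repeat=3, number=1):
    """Time `fn` and return (min, avg, max) seconds per call.

    Hypothetical helper: `fn` stands in for the rule matching phase.
    """
    # timeit.repeat returns one total duration per repetition,
    # each covering `number` calls of `fn`.
    samples = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [s / number for s in samples]
    return min(per_call), sum(per_call) / len(per_call), max(per_call)


def format_row(label, count, timings):
    """Render one markdown table row in the style of the report above."""
    lo, avg, hi = timings
    return f"| {label} | {count:,} | {lo:.2f}s | {avg:.2f}s | {hi:.2f}s |"
```

Reporting the minimum alongside the average is a common choice for benchmarks, since the minimum is the sample least disturbed by other activity on the machine.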

Checklist

[x] No CHANGELOG update needed
[x] No new tests needed
[x] No documentation update needed

williballenthin commented 2 months ago

@s-ff FYI, this is a small script I've used in the past to help evaluate performance changes to capa. It benchmarks the rule matching phase and shows the number of times each type of feature was evaluated. This enables two things:

if we reduce the overall count, we're doing less work, so capa runs faster, and
we can identify hotspots (features that are evaluated a huge number of times) and optimize those.

For example, API features are evaluated about 2.4 million times in the above example, while import features are evaluated only twice, so it's probably worthwhile to spend more time optimizing API features than import features, if possible.
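
To make the idea concrete, here is a minimal sketch of per-feature-class counting and hotspot ranking. It is an illustration only, not capa's implementation: `record_evaluation` and `hotspots` are hypothetical names, and the sketch assumes each evaluation bumps both a running total (`evaluate.feature`) and a per-class key, mirroring the naming in the table above.

```python
import collections

# Hypothetical global tally, keyed by names like "evaluate.feature.api".
counters = collections.Counter()


def record_evaluation(feature_class):
    """Count one evaluation, both overall and for its feature class."""
    counters["evaluate.feature"] += 1
    counters[f"evaluate.feature.{feature_class}"] += 1


def hotspots(counts, n=3):
    """Return the `n` most-evaluated feature classes, excluding the total."""
    ranked = [
        (name, count)
        for name, count in counts.most_common()
        if name != "evaluate.feature"
    ]
    return ranked[:n]
```

Calling `hotspots(counters)` after a run would list the classes that dominate the evaluation count, which is where optimization effort is most likely to pay off.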

This is all just background info for you, nothing expected at this time :-)

mr-tz commented 2 months ago

great!

mandiant / capa

profile-time: more result reporting, and learn to specify other backends #2072