Most of the improvement comes from skipping a couple of unnecessary heap allocations.
Most notably, each matched `Rule` was being copied out of the `Ruleset` slice and then returned as a pointer, so escape analysis moved the copy to the heap (causing an allocation). Taking a pointer to the slice element itself and returning that avoids the extra allocation. That change seems to have a remarkably large impact when matching lots of files.
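A minimal sketch of the difference, assuming a simplified shape: `Rule`, `Ruleset`, `matches`, `matchCopy` and `matchRef` here are hypothetical stand-ins, not the library's actual code. Building a file like this with `go build -gcflags=-m` should report the ranged copy in `matchCopy` as moved to heap, while `matchRef` hands back a pointer into the slice's existing backing array.

```go
package main

import "fmt"

// Rule and Ruleset are hypothetical stand-ins for the library's types;
// only the shape (a slice of structs) matters for this illustration.
type Rule struct {
	Pattern string
	Owners  []string
}

type Ruleset []Rule

// matches is a hypothetical placeholder for real pattern matching.
func (r *Rule) matches(path string) bool { return r.Pattern == path }

// matchCopy mirrors the old behaviour: range copies each element into
// `rule`, and returning &rule forces that copy to escape to the heap.
func (rs Ruleset) matchCopy(path string) *Rule {
	for _, rule := range rs {
		if rule.matches(path) {
			return &rule
		}
	}
	return nil
}

// matchRef mirrors the fix: return a pointer to the slice element itself,
// which already lives in the slice's backing array, so no Rule is copied.
func (rs Ruleset) matchRef(path string) *Rule {
	for i := range rs {
		if rs[i].matches(path) {
			return &rs[i]
		}
	}
	return nil
}

func main() {
	rs := Ruleset{
		{Pattern: "a.go", Owners: []string{"@alice"}},
		{Pattern: "b.go", Owners: []string{"@bob"}},
	}
	fmt.Println(rs.matchCopy("b.go") == &rs[1]) // false: points at a heap copy
	fmt.Println(rs.matchRef("b.go") == &rs[1])  // true: points into rs
}
```

One caveat of the pointer-into-slice approach is that callers now share the `Rule` with the `Ruleset`, so mutating the returned value would mutate the ruleset.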
```
components $ hyperfine ./codeowners ./codeowners-new
Benchmark 1: ./codeowners
  Time (mean ± σ):      3.394 s ±  0.011 s    [User: 5.398 s, System: 0.232 s]
  Range (min … max):    3.374 s …  3.415 s    10 runs

Benchmark 2: ./codeowners-new
  Time (mean ± σ):      1.781 s ±  0.009 s    [User: 1.778 s, System: 0.022 s]
  Range (min … max):    1.776 s …  1.807 s    10 runs

Summary
  './codeowners-new' ran
    1.91 ± 0.01 times faster than './codeowners'
```
This branch also removes an unnecessary dependency (godirwalk), as stdlib directory walking is now equally fast.