Faster multi-file matching

Most of the improvement comes from skipping a couple of unnecessary heap allocations.

Most notably, each matched Rule was being copied out of the Ruleset slice and then returned by reference. Because the address of that local copy left the function, escape analysis promoted the copy to the heap, causing an allocation. Taking a reference to the slice element itself and returning that avoids the extra allocation. That change seems to have a remarkably large impact when matching lots of files.
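For illustration, here is a minimal sketch of the two patterns. The `Rule` and `Ruleset` types, the method names `matchCopy` and `matchRef`, and the exact-string comparison are simplified, hypothetical stand-ins, not the library's actual code.

```go
package main

import "fmt"

// Rule and Ruleset are hypothetical, simplified stand-ins for the real types.
type Rule struct {
	Pattern string
	Owners  []string
}

type Ruleset []Rule

// matchCopy copies each element into a local variable and returns the
// address of that copy. Because the address leaves the function, the
// compiler has to place the copy on the heap.
func (r Ruleset) matchCopy(path string) *Rule {
	for i := len(r) - 1; i >= 0; i-- {
		rule := r[i] // copy of the slice element
		if rule.Pattern == path { // exact comparison stands in for real pattern matching
			return &rule // pointer to the copy, not to the slice element
		}
	}
	return nil
}

// matchRef returns a pointer into the slice's existing backing array,
// so no new Rule value needs to be allocated.
func (r Ruleset) matchRef(path string) *Rule {
	for i := len(r) - 1; i >= 0; i-- {
		if r[i].Pattern == path {
			return &r[i]
		}
	}
	return nil
}

func main() {
	rs := Ruleset{
		{Pattern: "a.go", Owners: []string{"@alice"}},
		{Pattern: "b.go", Owners: []string{"@bob"}},
	}
	fmt.Println(rs.matchCopy("b.go") == &rs[1]) // false: points at a separate copy
	fmt.Println(rs.matchRef("b.go") == &rs[1])  // true: points at the slice element
}
```

Building a sketch like this with `go build -gcflags=-m` should show the compiler's escape analysis decisions, which is one way to confirm where a copy is being moved to the heap. One trade-off of returning a pointer to the slice element is that callers now share the Rule with the Ruleset, so mutating the result mutates the Ruleset.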

components $ hyperfine ./codeowners ./codeowners-new
Benchmark 1: ./codeowners
  Time (mean ± σ):      3.394 s ±  0.011 s    [User: 5.398 s, System: 0.232 s]
  Range (min … max):    3.374 s …  3.415 s    10 runs

Benchmark 2: ./codeowners-new
  Time (mean ± σ):      1.781 s ±  0.009 s    [User: 1.778 s, System: 0.022 s]
  Range (min … max):    1.776 s …  1.807 s    10 runs

Summary
  './codeowners-new' ran
    1.91 ± 0.01 times faster than './codeowners'

This branch also removes an unnecessary dependency (godirwalk), as stdlib directory walking is equally fast now.

hmarr / codeowners

Faster multi-file matching #12