npm / cacache

npm's content-addressable cache
Other
280 stars 31 forks source link

fix: Speed up cache reads #255

Closed thecodrr closed 11 months ago

thecodrr commented 11 months ago

This PR speeds up read.stream and read by skipping fs.stat call if size was passed via opts. Currently, the only reason for doing a stat call is to get the size (and throw the size mismatch error if the size is different). This is unnecessary for 3 reasons:

  1. In the case of read.stream, the stream already compares the sizes at the end and throws an error if there's a mismatch.
  2. In the case of read, we can compare the sizes after reading the cache contents
  3. In both cases we are already doing an integrity check which would automatically fail if there's a size difference since the hashes would be different.

In this PR, the stat call is only made if the user does not pass a size property via opts. This makes sense because without knowing the size, the stream has to make an unnecessary fs.read call at the end before closing which has a significant cost (that cost is much, much greater than the cost of doing fs.stat).

On my machine, the benchmarks with this change look like this:

┌─────────┬─────────────────────┬─────────┬────────────────────┬───────────┬─────────┐
│ (index) │      Task Name      │ ops/sec │ Average Time (ns)  │  Margin   │ Samples │
├─────────┼─────────────────────┼─────────┼────────────────────┼───────────┼─────────┤
│    0    │ 'read.stream (new)' │ '4,643' │ 215352.03841424757 │ '±10.20%' │   465   │
│    1    │ 'read.stream (old)' │ '3,933' │ 254237.5665025663  │ '±7.17%'  │   394   │
│    2    │    'read (old)'     │ '2,915' │ 343045.55719845917 │ '±13.42%' │   292   │
│    3    │    'read (new)'     │ '4,392' │ 227636.30011033904 │ '±12.14%' │   449   │
└─────────┴─────────────────────┴─────────┴────────────────────┴───────────┴─────────┘

That's a solid 16% improvement in the case of read.stream and 36% improvement in the case of read.

References