thecodrr / fdir

⚡ The fastest directory crawler & globbing library for NodeJS. Crawls 1m files in < 1s
https://thecodrr.github.io/fdir/
MIT License
1.51k stars, 58 forks

Filtering performance improvements #21

Closed: thecodrr closed this issue 4 years ago

thecodrr commented 4 years ago

Currently, fdir is the fastest directory crawler in the Node.js world, even with filtering/globbing. However, filtering performance lags well behind unfiltered performance; the gap is too big.

Current:

Running "Synchronous (2642 files, 330 folders)" suite...

  fdir simple sync:
    283 ops/s, ±0.63%   | fastest

  fdir filter sync:
    278 ops/s, ±0.35%   | 1.77% slower

  fdir glob sync:
    259 ops/s, ±0.33%   | slowest, 8.48% slower

Running "Asynchronous (2642 files, 330 folders)" suite...

  fdir simple async:
    468 ops/s, ±2.32%   | fastest

  fdir filter async:
    428 ops/s, ±2.55%   | 8.55% slower

  fdir glob async:
    378 ops/s, ±2.45%   | slowest, 19.23% slower

Okay, so filtering is ~2-9% slower and globbing ~8-19% slower than a plain crawl. That is a relatively large gap.
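The "% slower" column appears to be the drop in ops/s relative to the fastest entry of the same suite, i.e. (fastest − ops) / fastest × 100. A small sketch that reproduces the synchronous figures above under that assumption (`percentSlower` is a hypothetical helper, not part of the benchmark tool):

```typescript
// Percentage by which a result (`ops`) is slower than the fastest result
// of the same suite (`fastestOps`), both measured in ops/s.
function percentSlower(fastestOps: number, ops: number): number {
  return ((fastestOps - ops) / fastestOps) * 100;
}

// Synchronous suite from the first run: "fdir simple sync" is the baseline.
const simpleSync = 283;
console.log(percentSlower(simpleSync, 278).toFixed(2)); // "1.77" (filter)
console.log(percentSlower(simpleSync, 259).toFixed(2)); // "8.48" (glob)
```

The same formula gives 8.55% and 19.23% for the asynchronous suite (baseline 468 ops/s).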

So the question is: How do we reduce this performance gap?

thecodrr commented 4 years ago

Here are the benchmark results with filtering/globbing done after the crawl (the "after" entries), alongside the inline versions:

Running "Synchronous (2642 files, 330 folders)" suite...

  fdir simple sync:
    291 ops/s, ±0.42%   | fastest

  fdir filter sync:
    281 ops/s, ±0.32%   | 3.44% slower

  fdir glob sync:
    257 ops/s, ±0.48%   | slowest, 11.68% slower

  fdir filter after sync:
    272 ops/s, ±1.86%   | 6.53% slower

  fdir glob after sync:
    266 ops/s, ±0.47%   | 8.59% slower

Running "Asynchronous (2642 files, 330 folders)" suite...

  fdir simple async:
    464 ops/s, ±2.46%   | fastest

  fdir filter async:
    445 ops/s, ±2.24%   | 4.09% slower

  fdir glob async:
    386 ops/s, ±2.42%   | slowest, 16.81% slower

  fdir filter after async:
    431 ops/s, ±2.35%   | 7.11% slower

  fdir glob after async:
    397 ops/s, ±2.28%   | 14.44% slower

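Not a huge difference. Filtering after the crawl roughly doubles its overhead (3.44% → 6.53% sync, 4.09% → 7.11% async), while globbing after the crawl gains about 2-3 percentage points (11.68% → 8.59% sync, 16.81% → 14.44% async). Hm...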

Edit: These results are borked as I made a mistake with how picomatch was being initialized in different tests.
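For reference, the two strategies being compared ("filter during the crawl" vs. "filter after") can be sketched with a plain recursive `readdirSync` crawler. This is a minimal illustration, not fdir's actual internals, and the function names are hypothetical. Filtering during the crawl calls the predicate once per file and never stores rejected paths; filtering afterwards builds the full list first and then makes a second pass with `Array.prototype.filter`.

```typescript
import { readdirSync, mkdtempSync, mkdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Predicate = (path: string) => boolean;

// Strategy 1: apply the predicate while crawling; rejected files are never pushed.
function crawlFiltering(dir: string, keep: Predicate, out: string[] = []): string[] {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) crawlFiltering(full, keep, out);
    else if (keep(full)) out.push(full);
  }
  return out;
}

// Strategy 2a: crawl everything, collecting every file path.
function crawlAll(dir: string, out: string[] = []): string[] {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) crawlAll(full, out);
    else out.push(full);
  }
  return out;
}

// Strategy 2b: filter the complete list in a second pass.
function crawlThenFilter(dir: string, keep: Predicate): string[] {
  return crawlAll(dir).filter(keep);
}

// Usage: build a tiny throwaway tree and compare the two strategies.
const root = mkdtempSync(join(tmpdir(), "crawl-demo-"));
mkdirSync(join(root, "src"));
writeFileSync(join(root, "a.js"), "");
writeFileSync(join(root, "b.md"), "");
writeFileSync(join(root, "src", "c.js"), "");

const isJs: Predicate = (p) => p.endsWith(".js");
const during = crawlFiltering(root, isJs).sort();
const after = crawlThenFilter(root, isJs).sort();
console.log(during.length, after.length); // 2 2
```

Both strategies return the same paths; only the cost profile differs, which is what the "after" benchmark entries try to measure.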

thecodrr commented 4 years ago

It would be cool to add picomatch caching at a global level, so that multiple fdir instances with exactly the same patterns share a single compiled picomatch matcher instead of each compiling its own. This could be a significant performance boost.
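A minimal sketch of such a global cache, written against a generic compile function so it runs without dependencies. It assumes the compiler has picomatch's general shape (patterns and options in, a `(path) => boolean` matcher out); `getMatcher`, `matcherCache` and the stub compiler are hypothetical names, not fdir or picomatch APIs. In fdir, picomatch itself would presumably take the place of the stub.

```typescript
// A matcher tests one path; a compiler turns patterns into a matcher.
// Assumption: picomatch(patterns, options) has roughly this shape.
type Matcher = (path: string) => boolean;
type Compile<O> = (patterns: string[], options?: O) => Matcher;

// Hypothetical module-level cache shared by every crawler instance.
const matcherCache = new Map<string, Matcher>();

function getMatcher<O>(compile: Compile<O>, patterns: string[], options?: O): Matcher {
  // Key on the exact patterns (order included) plus the options, so only
  // identical configurations share a compiled matcher.
  const key = JSON.stringify([patterns, options ?? null]);
  let matcher = matcherCache.get(key);
  if (matcher === undefined) {
    matcher = compile(patterns, options);
    matcherCache.set(key, matcher);
  }
  return matcher;
}

// Usage with a stub compiler that counts how often it is invoked.
let compiles = 0;
const suffixCompile: Compile<undefined> = (patterns) => {
  compiles++;
  return (path) => patterns.some((p) => path.endsWith(p));
};

const m1 = getMatcher(suffixCompile, [".js", ".ts"]);
const m2 = getMatcher(suffixCompile, [".js", ".ts"]);
console.log(m1 === m2, compiles); // true 1
```

Keying on the serialized patterns and options is deliberately conservative: two option objects with the same keys in a different order simply miss the cache and compile once more, which costs time but never returns a wrong matcher. The cache is also unbounded, which is fine for a handful of pattern sets but would need eviction if patterns were generated dynamically.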