Filtering performance improvements

thecodrr commented 4 years ago

Currently, fdir is the fastest directory crawler in the Node.js world, even with filtering/globbing. However, filtering performance is not on par with non-filtering performance; the gap between the two is too big.

Current results:

Running "Synchronous (2642 files, 330 folders)" suite...

  fdir simple sync:
    283 ops/s, ±0.63%   | fastest

  fdir filter sync:
    278 ops/s, ±0.35%   | 1.77% slower

  fdir glob sync:
    259 ops/s, ±0.33%   | slowest, 8.48% slower

Running "Asynchronous (2642 files, 330 folders)" suite...

  fdir simple async:
    468 ops/s, ±2.32%   | fastest

  fdir filter async:
    428 ops/s, ±2.55%   | 8.55% slower

  fdir glob async:
    378 ops/s, ±2.45%   | slowest, 19.23% slower

Okay, so filtering is about 2-9% slower and globbing about 8-19% slower than the simple crawl. That is a relatively large gap.
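
The "% slower" column appears to be measured against the fastest variant in each suite, i.e. (fastest - ops) / fastest * 100, and every figure above is consistent with that formula. A quick check in TypeScript, where percentSlower is just an illustrative helper name (not part of fdir or the benchmark runner):

```typescript
// Percentage by which `ops` is slower than the fastest variant of a suite,
// assuming the runner computes (fastest - ops) / fastest * 100.
function percentSlower(fastestOps: number, ops: number): number {
  return ((fastestOps - ops) / fastestOps) * 100;
}

// ops/s figures copied from the "Current" results.
const results: Record<string, { simple: number; filter: number; glob: number }> = {
  sync: { simple: 283, filter: 278, glob: 259 },
  async: { simple: 468, filter: 428, glob: 378 },
};

for (const [suite, r] of Object.entries(results)) {
  const filterGap = percentSlower(r.simple, r.filter).toFixed(2);
  const globGap = percentSlower(r.simple, r.glob).toFixed(2);
  // Prints "sync: filter 1.77% slower, glob 8.48% slower", then the async line.
  console.log(`${suite}: filter ${filterGap}% slower, glob ${globGap}% slower`);
}
```

This reproduces 1.77% / 8.48% (sync) and 8.55% / 19.23% (async), so the async suite is where the gap is widest.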

So the question is: How do we reduce this performance gap?

If we move all the filtering to the end of the crawling operation and simply run array.filter over the results, performance could roughly double (I think). However, we might run into issues with grouped output.
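
To make the grouped-output concern concrete, here is a minimal sketch of post-crawl filtering. The Group shape ({ dir, files }) and the choice to drop groups that end up empty are assumptions for illustration, not fdir's actual internals. Flat output needs one filter pass, while grouped output needs a pass per group plus a policy for empty groups; and either way the full unfiltered list is built first, which in-crawl filtering avoids.

```typescript
// Hypothetical grouped-output shape: one entry per directory.
interface Group {
  dir: string;
  files: string[];
}

type Predicate = (path: string) => boolean;

// Flat output: post-crawl filtering is a single Array.prototype.filter pass.
function filterFlat(paths: string[], keep: Predicate): string[] {
  return paths.filter(keep);
}

// Grouped output: filter each group's files against the full path, then
// drop groups left with no files (a design choice; keeping them also works).
function filterGrouped(groups: Group[], keep: Predicate): Group[] {
  return groups
    .map((g) => ({ dir: g.dir, files: g.files.filter((f) => keep(g.dir + f)) }))
    .filter((g) => g.files.length > 0);
}

const isTs: Predicate = (p) => p.endsWith(".ts");
const flat = ["src/a.ts", "src/b.js", "docs/readme.md"];
const groups: Group[] = [
  { dir: "src/", files: ["a.ts", "b.js"] },
  { dir: "docs/", files: ["readme.md"] },
];

console.log(JSON.stringify(filterFlat(flat, isTs))); // ["src/a.ts"]
console.log(JSON.stringify(filterGrouped(groups, isTs))); // [{"dir":"src/","files":["a.ts"]}]
```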

thecodrr commented 4 years ago

Here are the benchmark results with two extra variants ("after") that do all filtering once the crawl has finished:

Running "Synchronous (2642 files, 330 folders)" suite...

  fdir simple sync:
    291 ops/s, ±0.42%   | fastest

  fdir filter sync:
    281 ops/s, ±0.32%   | 3.44% slower

  fdir glob sync:
    257 ops/s, ±0.48%   | slowest, 11.68% slower

  fdir filter after sync:
    272 ops/s, ±1.86%   | 6.53% slower

  fdir glob after sync:
    266 ops/s, ±0.47%   | 8.59% slower

Running "Asynchronous (2642 files, 330 folders)" suite...

  fdir simple async:
    464 ops/s, ±2.46%   | fastest

  fdir filter async:
    445 ops/s, ±2.24%   | 4.09% slower

  fdir glob async:
    386 ops/s, ±2.42%   | slowest, 16.81% slower

  fdir filter after async:
    431 ops/s, ±2.35%   | 7.11% slower

  fdir glob after async:
    397 ops/s, ±2.28%   | 14.44% slower

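Not a huge difference. Filtering after the crawl roughly doubles its overhead (from 3.44% to 6.53% slower in sync, from 4.09% to 7.11% in async), while globbing after the crawl gains only about 2-3% (from 11.68% to 8.59% in sync, from 16.81% to 14.44% in async). Hm...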

Edit: These results are borked because I made a mistake in how picomatch was initialized in the different tests.

thecodrr commented 4 years ago

It would be cool to add picomatch caching at a global level, so that a single picomatch instance is reused across multiple fdir instances, provided they use exactly the same patterns. This would be a huge performance boost.
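
A minimal sketch of such a global cache, assuming matchers are keyed on the exact pattern list plus options. The names getMatcher, matcherCache and Compile are hypothetical. The compile function is injected so the sketch runs without dependencies; in fdir it would presumably wrap picomatch, e.g. (patterns, options) => picomatch(patterns, options), and the toy compiler below only stands in for it.

```typescript
// A compiled matcher: true if the path matches the patterns.
type Matcher = (path: string) => boolean;

// Turns patterns (+ options) into a matcher; stands in for picomatch here.
type Compile<O> = (patterns: string[], options?: O) => Matcher;

// Module-level cache, shared by every crawler instance in the process.
const matcherCache = new Map<string, Matcher>();

function getMatcher<O>(patterns: string[], compile: Compile<O>, options?: O): Matcher {
  // Exact patterns + options form the key, so only identical configs share.
  const key = JSON.stringify([patterns, options ?? null]);
  let matcher = matcherCache.get(key);
  if (matcher === undefined) {
    matcher = compile(patterns, options);
    matcherCache.set(key, matcher);
  }
  return matcher;
}

// Toy compiler that only understands "*.ext" patterns and counts compilations.
let compilations = 0;
const toyCompile: Compile<undefined> = (patterns) => {
  compilations++;
  const exts = patterns.map((p) => p.slice(1)); // "*.ts" -> ".ts"
  return (path) => exts.some((ext) => path.endsWith(ext));
};

const a = getMatcher(["*.ts"], toyCompile);
const b = getMatcher(["*.ts"], toyCompile); // cache hit: no recompilation
console.log(a === b, compilations, a("src/index.ts")); // true 1 true
```

The cache here is unbounded; if many distinct pattern sets are used over the lifetime of a process, a size-limited (e.g. LRU) cache would be the safer choice.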

thecodrr/fdir, issue #21: Filtering performance improvements