gulpjs / glob-stream

Readable streamx interface over anymatch.
MIT License

v5 becomes significantly slower if a lot of NPM packages are installed #129

Closed MuTsunTsai closed 7 months ago

MuTsunTsai commented 7 months ago

What were you expecting to happen?

The performance of Gulp v5 shouldn't be affected by installing additional NPM packages.

What actually happened?

Gulp v5 becomes noticeably slower merely by having more NPM packages installed, especially if one uses PNPM as the package manager.

Please give us a sample of your gulpfile

https://github.com/MuTsunTsai/gulp-5-test

In this repo, the gulpfile does a very simple task: copying the package.json file to a new location. If gulp is the only dependency installed, this takes about 400ms, but as the number of installed dependencies increases, it becomes slower and slower. The repo has a few dozen dependencies installed (as in my actual projects). If I use NPM to install them, Gulp executes the exact same task in about 1.2s, which is 3x slower than without the other dependencies. If I use PNPM instead (which I usually do these days), it's even worse, taking over 6s just to complete this simple task.

Terminal output / screenshots

image

phated commented 7 months ago

This is due to our new directory walking technique. We're looking into ways to improve it in updates to glob-stream.

MuTsunTsai commented 7 months ago

@phated Alright, thanks for the great work and I'm really happy to see v5 released in any case! Gulp has always been an irreplaceable part of my workflow.

Meanwhile, it appears that overriding the version of glob-stream fixes the issue for now (although I'm not sure whether there are any compatibility issues with Gulp v5). For example, with PNPM:

{
    "pnpm": {
        "overrides": {
            "glob-stream": "7.0.0"
        }
    }
}

MuTsunTsai commented 7 months ago

@phated So I was trying to understand how glob-stream works and see if there's anything I can contribute. My current understanding is that:

And the motivation for it is described in #118. Basically, you guys are trying to make the behavior more predictable, so to speak.

I think solving the performance issue requires some kind of "partial matching check" mechanism: checking whether a folder partially matches any glob (that is, whether it could possibly contain anything that will match), and if not, skipping its subtree entirely.

In my example, my glob is just the "package.json" file without any wildcard, and yet the walk still goes through the entire node_modules tree regardless (which explains why PNPM is worse, as it creates a much larger folder tree than NPM). The moment it visits the node_modules folder, it should have concluded "oh, this is already deeper than the given glob, so it cannot contain what we want" and skipped it.
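
To make the idea concrete, here is a rough sketch of such a partial matching check. It is only an illustration under simplifying assumptions: `mayContainMatch` and `segmentToRegExp` are hypothetical names, not part of glob-stream, paths are relative with forward slashes, and only `*` and `**` are treated as wildcards.

```javascript
// Hypothetical helpers, not glob-stream's API. Only `*` and `**` are supported.

// Turn one glob segment into a RegExp, letting `*` match any run of characters.
function segmentToRegExp(segment) {
  const escaped = segment.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Return false when nothing inside `dir` could possibly match `glob`,
// so a walker could skip the whole subtree.
function mayContainMatch(dir, glob) {
  const dirParts = dir.split('/').filter(Boolean);
  const globParts = glob.split('/').filter(Boolean);

  for (let i = 0; i < dirParts.length; i++) {
    // A globstar can absorb any number of directories, so anything below
    // this point might still match.
    if (globParts[i] === '**') return true;

    // The directory is already as deep as the glob's last segment, so
    // entries inside it would be deeper than the glob allows.
    if (i >= globParts.length - 1) return false;

    // The directory name must match the glob segment at the same depth.
    if (!segmentToRegExp(globParts[i]).test(dirParts[i])) return false;
  }
  return true;
}

console.log(mayContainMatch('node_modules', 'package.json'));
console.log(mayContainMatch('src', 'src/**/*.js'));
```

With the glob from my example, the first call reports that node_modules can be skipped, while the second reports that src still needs to be walked.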

I hope this makes sense to you. I'll see if I have more free time to contribute further.

phated commented 7 months ago

@MuTsunTsai Thanks! I'm thinking we'll do an optimization where we only traverse the glob-parent (or just a single file for singular globs).
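
For readers following along, the optimization described here can be sketched roughly as below. This is a simplified stand-in and not necessarily how glob-stream implements it: `staticParent` is a hypothetical helper, it assumes relative globs with forward slashes, and the real glob-parent package handles many more cases (escapes, braces, extglobs, Windows paths).

```javascript
// Hypothetical helper: find the non-wildcard prefix of a glob, i.e. the only
// directory a walker would need to traverse.
function staticParent(glob) {
  const parts = glob.split('/');
  // Index of the first segment containing a glob metacharacter.
  const firstMagic = parts.findIndex((part) => /[*?[\]{}()!]/.test(part));

  if (firstMagic === -1) {
    // No wildcard at all: a "singular" glob that names exactly one path,
    // so no directory walk is needed.
    return { base: glob, singular: true };
  }

  // Everything before the first wildcard segment is the static parent;
  // fall back to the current directory when the glob starts with a wildcard.
  const base = parts.slice(0, firstMagic).join('/') || '.';
  return { base, singular: false };
}

console.log(staticParent('package.json'));
console.log(staticParent('src/**/*.js'));
```

Under these assumptions, a glob like "package.json" is singular and can be checked directly, and "src/**/*.js" only requires walking src, so node_modules is never visited in either case.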

MuTsunTsai commented 7 months ago

Confirmed fixed!