gulpjs / glob-stream

Readable streamx interface over anymatch.
MIT License
178 stars 53 forks source link

Updating from v8.0.0 to v8.0.1 causes RangeError: Maximum call stack size exceeded #125

Closed blumendorf closed 5 months ago

blumendorf commented 6 months ago

Disclaimer: I am not sure this is actually a glob-stream issue, but it seems to be the only moving part.

What we did:

We are using i18next parser and recently updated dependencies. In the process, glob-stream got updated from 8.0.0 to 8.0.1.

This broke i18next-parser. Manually reverting back to 8.0.0 in the lock file fixed the issue.

original dependency chain: i18next-parser@8.12.0 -> vinyl-fs@4.0.0 -> glob-stream@8.0.0 updated dependency chain: i18next-parser@8.13.0 -> vinyl-fs@4.0.0 -> glob-stream@8.0.1

What were you expecting to happen?

Updating glob-stream from 8.0.0 to 8.0.1 should not be breaking.

What actually happened?

We got a RangeError: Maximum call stack size exceeded.

Terminal output / screenshots

> i18next

  i18next Parser
  --------------
  Input:  
  Output: public/locales/$LOCALE/$NAMESPACE.json

/my-project/node_modules/.pnpm/anymatch@3.1.3/node_modules/anymatch/index.js:94
    return (testString, ri = false) => {
           ^

RangeError: Maximum call stack size exceeded
    at /my-project/node_modules/.pnpm/anymatch@3.1.3/node_modules/anymatch/index.js:94:12
    at EventEmitter.onPath (/my-project/node_modules/.pnpm/glob-stream@8.0.1/node_modules/glob-stream/index.js:270:20)
    at EventEmitter.emit (node:events:518:28)
    at processDirents (/my-project/node_modules/.pnpm/glob-stream@8.0.1/node_modules/glob-stream/index.js:88:10)
    at next (/my-project/node_modules/.pnpm/now-and-later@3.0.0/node_modules/now-and-later/lib/mapSeries.js:43:5)
    at handler (/my-project/node_modules/.pnpm/now-and-later@3.0.0/node_modules/now-and-later/lib/mapSeries.js:57:9)
    at f (/my-project/node_modules/.pnpm/once@1.4.0/node_modules/once/once.js:25:25)
    at processDirents (/my-project/node_modules/.pnpm/glob-stream@8.0.1/node_modules/glob-stream/index.js:113:7)
    at next (/my-project/node_modules/.pnpm/now-and-later@3.0.0/node_modules/now-and-later/lib/mapSeries.js:43:5)
    at handler (/my-project/node_modules/.pnpm/now-and-later@3.0.0/node_modules/now-and-later/lib/mapSeries.js:57:9)

Node.js v20.11.0

Please provide the following information:

phated commented 6 months ago

@joshhunt Any thoughts why the symlink traversal would cause this? Could they have a circular symlink in their paths?

phated commented 6 months ago

@blumendorf Can you please provide additional debugging details? It is unclear what globs are being provided or what path is being processed. You can just add console.log in various places in glob-stream within node_modules. onPath seems like a good candidate.

phated commented 6 months ago

@blumendorf Looking at the stack trace and comparing with the code, it seems like a bad filepath (perhaps too long?) is being provided to anymatch here: https://github.com/gulpjs/glob-stream/blob/master/index.js#L269-L270. If you add a console.log between line 269 and 270 to log each filepath, we can see what path is causing you to blow the stack.

We only added a symlink fix in 8.0.1, so it seems you have a symlink that is now being resolved and that causes the glob match to blow the stack.

joshhunt commented 6 months ago

Thinking about it now - I guess recursive symlinks could cause an error with something like this.

I notice pnpm is being used, which makes heavy use of symlinks. At first I was eager to say "just don't have circular symlinks", but it's not really your fault if they're coming from inside pnpm's structure....

A minimal reproduction repo, or some debugging to see what paths it's choking on, would go a long way. In the meantime I'll see if I can have a play around with reproducing.

FYI - I introduced this symlink fix so I can get i18next-parser to follow symlinks, which is passing in our CI so far https://drone.grafana.net/grafana/grafana/169064/2/6

joshhunt commented 6 months ago

I've tried to reproduce this with... varying degrees of success. I've created this repo to help reproduce, which tests with both npm and pnpm, and i18next-scanner and i18next-parser https://github.com/joshhunt/glob-stream-i18next-scanner-repro

If I run ./test.sh, it runs fine. If I run ./create-symlinks.sh to create circular symlink references then rerun the tests it blows up with:

Error: ELOOP: too many symbolic links encountered, stat '/.../glob-stream-i18next-scanner-repro/contains-circular-symlinks/infinite-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b/link-to-a/link-to-b'
Emitted 'error' event on Transform instance at:
    at ReadableState.afterDestroy (/.../glob-stream-i18next-scanner-repro/node_modules/streamx/index.js:493:19)
    at Transform._destroy (/.../glob-stream-i18next-scanner-repro/node_modules/streamx/index.js:629:5)
    at ReadableState.updateNonPrimary (/.../glob-stream-i18next-scanner-repro/node_modules/streamx/index.js:386:16)
    at ReadableState.update (/.../glob-stream-i18next-scanner-repro/node_modules/streamx/index.js:367:71)
    at ReadableState.updateReadNT (/.../glob-stream-i18next-scanner-repro/node_modules/streamx/index.js:536:10)
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)

Note this is a different exception that both @blumendorf and @sunnysonx reporting having. I get the same results (including same "too many symbolic links" exception) on WSL2 w/Ubuntu 20.04.5 LTS (where the code is in the Linux VM, not traversing into the Windows mount), and on macOS Sonoma 14.3.1.

I'm not sure where to go from here - could try to detect circular symbolic links, but as the exception I got was different from the original reporters, I'm not sure this would fix their issue.

A minimal reproduction would go a long way 🙏

sunnysonx commented 6 months ago

@joshhunt hey, sure, attaching a reproduction repo: https://github.com/sunnysonx/glob-stream-8.0.1-symlink

After npm install, run npm run extract which runs the i18next-scanner.

If you get the same result, check find -type l to see all symlinks inside node_modules.

Added also console logs between 269 and 270 lines, but didn't get much information, as it printed 12k+ paths, and none of them seem to be suspicious.

But one strange thing, is that always the last row is from @material-ui, but maybe it actually crashed because of the path that should be the next one. image

WSL with ubuntu 22.*

joshhunt commented 6 months ago

Was able to reproduce this thanks to @sunnysonx's repro. I suspect this will happen to anyone with many large package.json dependencies.

@phated I think this may be due to now-and-later's mapSeries causing additional stack frames. Perhaps this issue has always existed, as a upper-bound on the number of files that can be walked, but the new walkdir implementation can only do significantly less, which can be easy to run into with node_modules directories.

I tried to look through the code to see if there was a glaring recursion issue but nothing immediately stood out.

Circular symlink issue still exists though and is totally on me. I'll try and get a fix for that in.

Karbust commented 6 months ago

I also get this error after upgrading to gulp@5.0.0.

This is my code:

function imageminify() {
    const imgStreamOfficialIcons = gulp.src('material/icons_official/*.png')
        .pipe(imagemin([
            optipng({ optimizationLevel: 7 })
        ]))
        .pipe(gulp.dest('material/icons_official_minified/'))

    const imgStreamUnofficialIcons = gulp.src('material/icons_unofficial/*.png')
        .pipe(imagemin([
            optipng({ optimizationLevel: 7 })
        ]))
        .pipe(gulp.dest('material/icons_unofficial_minified/'))

    fs.readdirSync('material/icons_skills/').forEach(file => {
        fs.renameSync('material/icons_skills/' + file, 'material/icons_skills/' + file.replace('.sub', ''))
    })

    const imgStreamSkillsIcons = gulp.src('material/icons_skills/*.png')
        .pipe(imagemin([
            optipng({ optimizationLevel: 7 })
        ]))
        .pipe(gulp.dest('material/icons_skills_minified/'))

    return merge(imgStreamOfficialIcons, imgStreamUnofficialIcons, imgStreamSkillsIcons)
}

Error:

RangeError: Maximum call stack size exceeded
    at String.replace (<anonymous>)
    at globParent (E:\Projetos\my_project\node_modules\.pnpm\glob-parent@6.0.2\node_modules\glob-parent\index.js:38:14)
    at EventEmitter.onPath (E:\Projetos\my_project\node_modules\.pnpm\glob-stream@8.0.1\node_modules\glob-stream\index.js:280:30)
    at EventEmitter.emit (node:events:514:28)
    at EventEmitter.emit (node:domain:551:15)
    at processDirents (E:\Projetos\my_project\node_modules\.pnpm\glob-stream@8.0.1\node_modules\glob-stream\index.js:88:10)
    at next (E:\Projetos\my_project\node_modules\.pnpm\now-and-later@3.0.0\node_modules\now-and-later\lib\mapSeries.js:43:5)
    at handler (E:\Projetos\my_project\node_modules\.pnpm\now-and-later@3.0.0\node_modules\now-and-later\lib\mapSeries.js:57:9)
    at f (E:\Projetos\my_project\node_modules\.pnpm\once@1.4.0\node_modules\once\once.js:25:25)
    at processDirents (E:\Projetos\my_project\node_modules\.pnpm\glob-stream@8.0.1\node_modules\glob-stream\index.js:113:7)

Rolling back to gulp@4.0.2 works without any issues.

I do have quite a lot of dependencies.

robertknight commented 6 months ago

I experienced this issue on macOS in a mixed Python / JS project which contained a hidden subdirectory (.ruff_cache/content/) with a very large number of files (3955).

onPath visiting /Users/robert/hypothesis/h/.ruff_cache/content/8a34b1f9ca86f0e2
... (and thousands of similar entries) ...
onPath visiting /Users/robert/hypothesis/h/.ruff_cache/content/8a8497ee676ad18a

[09:29:35] RangeError: Maximum call stack size exceeded
    at console.value (node:internal/console/constructor:308:13)
    at console.log (node:internal/console/constructor:379:26)
    at EventEmitter.onPath (/Users/robert/hypothesis/h/node_modules/glob-stream/index.js:270:13)
    at EventEmitter.emit (node:events:515:28)
    at EventEmitter.emit (node:domain:551:15)
    at processDirents (/Users/robert/hypothesis/h/node_modules/glob-stream/index.js:88:10)
    at next (/Users/robert/hypothesis/h/node_modules/now-and-later/lib/mapSeries.js:43:5)
    at handler (/Users/robert/hypothesis/h/node_modules/now-and-later/lib/mapSeries.js:57:9)
    at f (/Users/robert/hypothesis/h/node_modules/once/once.js:25:25)
    at processDirents (/Users/robert/hypothesis/h/node_modules/glob-stream/index.js:113:7)

The glob being passed to gulp.src has a prefix which shouldn't match the .ruff_cache directory here at all.

After removing this directory, the gulp.src function does run successfully, but it is quite slow as the project where the gulp project root is located contains a bunch of other build directories that contain a large number of files in total.

robertknight commented 6 months ago

Minimal reproduction of the issue mentioned in the previous comment:

Create a project with the following layout:

- package.json (created with `npm init -y` then `npm install gulp`)
- gulpfile.mjs
- src/foo.txt (content is not important)

gulpfile.mjs:

import gulp from "gulp";

gulp.task("copy-files", () => gulp.src("src/*.txt").pipe(gulp.dest("build/")));

Run gulp copy-files. The task succeeds. Then create thousands of files in a directory alongside src. For example using this Python script:

from pathlib import Path

for i in range(1, 4000):
    Path(f'tmp/{i}.txt').touch()

Then run gulp copy-files again. This time it fails with:

[09:41:49] Using gulpfile /private/tmp/gulp-test/gulpfile.mjs
[09:41:49] Starting 'copy-files'...
[09:41:49] 'copy-files' errored after 43 ms
[09:41:49] RangeError: Maximum call stack size exceeded
    at process.set [as domain] (node:domain:67:16)
    at Domain.enter (node:domain:322:35)
    at EventEmitter.emit (node:domain:550:10)
    at processDirents (/private/tmp/gulp-test/node_modules/glob-stream/index.js:88:10)
    at next (/private/tmp/gulp-test/node_modules/now-and-later/lib/mapSeries.js:43:5)
    at handler (/private/tmp/gulp-test/node_modules/now-and-later/lib/mapSeries.js:57:9)
    at f (/private/tmp/gulp-test/node_modules/once/once.js:25:25)
    at processDirents (/private/tmp/gulp-test/node_modules/glob-stream/index.js:113:7)
    at next (/private/tmp/gulp-test/node_modules/now-and-later/lib/mapSeries.js:43:5)
    at handler (/private/tmp/gulp-test/node_modules/now-and-later/lib/mapSeries.js:57:9)
rarous commented 5 months ago

I got the same error when my folder in monorepo contained thousands of JSON files. This folder is in another project and my globs are absolute and should not be candidates.