paulmillr / readdirp

Recursive version of fs.readdir with streaming api.
https://paulmillr.com
MIT License
378 stars 51 forks source link

Inconsistent results when running same test repeatedly(?!?) #138

Closed broofa closed 4 years ago

broofa commented 4 years ago

[Environment: MacOSX Mojave, node v12.10.0, readdirp 3.2.0]

Running readdirp() periodically, exact same code, exact same directory... and I get different output. :-(

Here's the recipe:

# Start in a clean directory, install readdirp
mkdir temp && cd temp && npm i readdirp

# Create a directory with a few subdirectories
mkdir -p foo/bar/a foo/bar/b foo/blarg/a foo/blarg/b

# Create a test that scans the directory once/second
cat > test.js << EOT
const readdirp = require('readdirp');

setInterval(async function() {
  const results = await readdirp.promise('foo', {type: 'directories', depth: 1})
  console.log(results.map(r => r.path));
}, 1000);
EOT

# Run test.  Notice results are not consistent (!!!)
node test.js

Here's what I see in the console ...

[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]
[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]
[ 'bar', 'blarg' ]
[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]
[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]
[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]
[ 'bar', 'blarg' ]
[ 'bar', 'blarg' ]
[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]
... etc

Possibly related to #91?

broofa commented 4 years ago

Also related to #97?

paulmillr commented 4 years ago

If you want to track directory changes, you should use chokidar. It's based on readdirp and is very fast and cross-platform.

IO operations are inconsistent overall. So this may happen because file system doesn't write until some time; or caches writes; or anything else. These edge cases are usually handled in file watcher, like chokidar.

I think readdirp would always see the same results unless you use it as file watcher. For example, if you have a directory with 20k files, and readdirp is inconsistent, then it's clearly a bug and we should fix it. Otherwise, I won't consider this as a bug.

broofa commented 4 years ago

Please reopen. There is definitely a bug here - I'm still trying to diagnose the exact cause, but if you add a console.log to readdirpPromise's error handler here, you'll see that it's throwing "stream.push() after EOF" errors - i.e. it's dropping directory entries on the floor - but these are hidden because the promise resolves when the stream ends (when the EOF null is pushed to the stream), and subsequent calls to reject() are ignored.

broofa commented 4 years ago

I think the problem occurs when readdirp only detects directories during one pass of the JS event loop. Because detected [directory] entries are added via setImmediate, this.parents.length is still zero at the end of the do-while loop in _read(), and so the stream is ended prematurely.

broofa commented 4 years ago

Yeah, taking this.push(entry) out of setImmediate() fixes this problem, but it seems to break examples/list.js :-/

paulmillr commented 4 years ago

Try master now.

broofa commented 4 years ago

Nope, I'm still seeing the problem.

paulmillr commented 4 years ago

I don't really understand how taking push out of setimmediate helps, if I did basically the same thing in the commit.

broofa commented 4 years ago

Your commit simply suppresses the push() call if the stream is closed. I.e. it's still dropping the directory entry (that needs to be processed) on the floor... it's just doing it before Readable has a chance to complain about you trying to push to it after it's closed..

broofa commented 4 years ago

If you run my test script, above, are you seeing the results like I describe? I think the correct output should be "[ 'bar', 'blarg', 'blarg/a', 'blarg/b' ]" each time readdirp() runs. If there are any lines that show anything other than that (fewer items), then you're hitting the race condition that's causing entries to get dropped.

So... why are dirs added in a setImmediate() call? Clearly it breaks something given how examples/list.js breaks, but what's going on there?

paulmillr commented 4 years ago

Assuming it comes from the Stream implementation. Not 100% sure how it works! 😬 Node.js stream source code is piece of terrible and complex shit.

broofa commented 4 years ago

Hmm... If you switch the list.js example from using await in a for-of loop, to instead use the standard stream.on('data', chunk => {...}) event handling, it works fine. Something about the asyncIterator api of Readable is breaking that example... it seems like it doesn't finish the for-of loop, possibly because the stream is getting paused? Not sure.

ronag commented 4 years ago

@broofa Is this still a problem?

broofa commented 4 years ago

Yup, I think we're good. Thanks! (And any idea when this'll be npm publish'ed?)