paulmillr / readdirp

Recursive version of fs.readdir with streaming api.
https://paulmillr.com
MIT License
378 stars 51 forks

Fix #138 (dir entries getting dropped due to race condition) #139

Closed broofa closed 4 years ago

broofa commented 4 years ago

Okay, so ignore my previous description; there's a bit more going on here than I initially described.

First, I've removed the setImmediate() call to fix the issue I reported in #138. However, this re-broke the asyncIterator behavior. I'm pretty sure the problem there was in the _read() implementation, which was ignoring the return value of push(). Per the Node docs, when _read() is called you're supposed to push() data until push() returns false, then wait for _read() to be called again. Because the code was blindly push()ing with no regard for the return value, I think that caused the stream to drop out of "flowing" mode, effectively hanging the async iterator. (Admittedly, this is at the limit of my understanding of how streams work, so I may not have that exactly right, but that's my best guess at the issue. Also, I'm not sure how to explain why the various tests people reported, mine included, exited cleanly rather than hanging. :-/ )

I'm pretty sure this misbehaving _read() is what was causing #91 and #97.

To fix this, I rewrote _read() to have a while-true loop that does two things: 1) "explore" the filesystem until it gets some entries to push, and 2) push() entries until push() returns false. It only drops out of that loop when push() returns false (or we run out of things to explore).

Other changes of note:

paulmillr commented 4 years ago

Can you measure RAM and CPU impact of your changes? Pick a directory with 100,000 files and more for this.

broofa commented 4 years ago

Added "examples/perf.js", which basically does the same thing as "examples/list.js", but logs process.memoryUsage() info every 500ms. Here's what I get when running 3x with this PR:

$ node perf.js
     0 rss=18.7M   heapTotal=4.7M    heapUsed=2.7M    external=0.8M
 35000 rss=44.8M   heapTotal=25.6M   heapUsed=8.3M    external=0.8M
 78000 rss=45.0M   heapTotal=25.9M   heapUsed=13.2M   external=0.8M
131000 rss=45.6M   heapTotal=25.9M   heapUsed=14.8M   external=0.8M
170000 rss=45.9M   heapTotal=26.2M   heapUsed=9.6M    external=0.8M
194496 rss=63.0M   heapTotal=42.9M   heapUsed=19.0M   external=0.8M
Processed 194496 files in 2376 msecs

$ node perf.js
     0 rss=18.6M   heapTotal=4.7M    heapUsed=2.7M    external=0.8M
 31000 rss=44.8M   heapTotal=25.6M   heapUsed=13.8M   external=0.8M
 69000 rss=45.0M   heapTotal=25.6M   heapUsed=10.8M   external=0.8M
118000 rss=45.3M   heapTotal=25.6M   heapUsed=15.3M   external=0.8M
162000 rss=45.5M   heapTotal=25.6M   heapUsed=11.3M   external=0.8M
194496 rss=62.6M   heapTotal=42.4M   heapUsed=19.2M   external=0.8M
Processed 194496 files in 2497 msecs

$ node perf.js
     0 rss=18.6M   heapTotal=4.7M    heapUsed=2.7M    external=0.8M
 28000 rss=44.6M   heapTotal=25.4M   heapUsed=10.2M   external=0.8M
 75000 rss=44.9M   heapTotal=25.4M   heapUsed=15.7M   external=0.8M
125000 rss=45.4M   heapTotal=25.9M   heapUsed=13.2M   external=0.8M
170000 rss=45.4M   heapTotal=25.9M   heapUsed=10.2M   external=0.8M
194496 rss=62.3M   heapTotal=42.7M   heapUsed=11.2M   external=0.8M
Processed 194496 files in 2325 msecs

And here's what I get with that same test, running against the current master:

$ node perf.js
     0 rss=18.7M   heapTotal=4.7M    heapUsed=2.7M    external=0.8M
 33000 rss=81.6M   heapTotal=54.5M   heapUsed=27.4M   external=0.8M
 91000 rss=106.0M  heapTotal=77.0M   heapUsed=48.0M   external=0.8M
151000 rss=88.8M   heapTotal=60.0M   heapUsed=38.4M   external=0.8M
194496 rss=99.6M   heapTotal=70.7M   heapUsed=41.3M   external=0.8M
Processed 194496 files in 1806 msecs

$ node perf.js
     0 rss=18.7M   heapTotal=4.7M    heapUsed=2.7M    external=0.8M
 32000 rss=79.3M   heapTotal=52.6M   heapUsed=25.4M   external=0.8M
100000 rss=88.9M   heapTotal=60.5M   heapUsed=24.7M   external=0.8M
168000 rss=95.9M   heapTotal=67.3M   heapUsed=38.3M   external=0.8M
194496 rss=101.7M  heapTotal=73.3M   heapUsed=53.5M   external=0.8M
Processed 194496 files in 1706 msecs

$ node perf.js
     0 rss=18.7M   heapTotal=4.7M    heapUsed=2.7M    external=0.8M
 29000 rss=77.6M   heapTotal=50.5M   heapUsed=23.4M   external=0.8M
 87000 rss=104.2M  heapTotal=75.2M   heapUsed=50.6M   external=0.8M
159000 rss=91.4M   heapTotal=62.1M   heapUsed=31.5M   external=0.8M
194496 rss=102.4M  heapTotal=72.8M   heapUsed=39.5M   external=0.8M
Processed 194496 files in 1823 msecs

So... it looks like this PR is quite a bit more memory-efficient (peak rss ~63M vs. ~100M), but slightly slower (~2.3-2.5s vs. ~1.7-1.8s).

[Edit: BTW, I don't have an explanation for why it's more memory efficient. I wasn't expecting that, honestly. Maybe something to do with not having to carry the state needed for the setImmediate timers across event loop cycles? Dunno.]
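For reference, the reporting core of a harness like the perf.js described above might look something like this (the fmt() helper and the exact column layout are assumptions made to match the output shown, not the actual script):

```javascript
// Hypothetical sketch of a memory-usage reporter along the lines of
// examples/perf.js: log process.memoryUsage() fields, formatted in MB,
// alongside a running entry count.
const fmt = (bytes) => (bytes / 1024 / 1024).toFixed(1) + 'M';

function report(count) {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  console.log(
    String(count).padStart(6),
    `rss=${fmt(rss)}`.padEnd(12),
    `heapTotal=${fmt(heapTotal)}`.padEnd(17),
    `heapUsed=${fmt(heapUsed)}`.padEnd(16),
    `external=${fmt(external)}`
  );
}

// In the real harness, the walk would look roughly like:
//   const timer = setInterval(() => report(count), 500);
//   for await (const entry of readdirp(root)) count++;
//   clearInterval(timer);
//   report(count);
```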

paulmillr commented 4 years ago

@ronag what are your thoughts on that?

ronag commented 4 years ago

> when _read() is called you're supposed to push() data until push() returns false, then wait for _read() to be called again. Because the code was blindly push()ing with no regard for the return value, I think that caused the stream to drop out of "flowing" mode, effectively hanging the async iterator.

I don't think this is correct. You should stop pushing once push() returns false, but that's just a hint. You can also push just once per call, and _read() will be called again when more data is wanted.

What is the actual problem here?

ronag commented 4 years ago

Ah, didn't see the referred issues. That's going to take some time to dig into.

paulmillr commented 4 years ago

@ronag the main issue here was https://github.com/paulmillr/readdirp/issues/138: with the setImmediate() scheduling, readdirp was sometimes returning different results.

broofa commented 4 years ago

@paulmillr @ronag To clarify, the issue isn't the use of setImmediate() per se. The issue is the race condition that arises from push()'ing to the Readable stream asynchronously (while ignoring the return value of push()). Because it's a race condition, the problem manifests only sporadically, so demonstrating it requires running the test repeatedly until the behavior shows up. At least, it does on my laptop.