npm / fstream

Advanced FS Streaming for Node
ISC License
208 stars 43 forks source link

fstream can execute syscalls after the stream emits `end` #26

Closed glasser closed 10 years ago

glasser commented 10 years ago

I have a script that uses the tar.Extract to write a tarball (from a Buffer in memory) to disk. It then immediately takes the extracted directory and moves it elsewhere (when the Extract emits end).

I occasionally end up with a situation where an uncaught ENOENT exception from a chmod or utimes gets thrown (generally when this is under load running lots of these operations in parallel). This is because the chmod is happening after the top-level end! (Note that these errors are emitted on the _fst field of the Extract and apparently can't be listened to by users of Extract.)

I've traced this down to situations where the _finish call on a DirWriter gets called twice. Specifically, it's possible for the emit of ready in DirWriter.prototype._create to lead to the entire entry being written (including the end()) synchronously. So the _process in DirWriter.prototype.end leads to me._finish() being invoked.

Then, while the stat in _finish is pending, the _process immediately after the emit("ready) in _create gets called, which calls _finish again.

Now you have two parallel _finish calls running. Let's say that permissions have to be changed. If you're unlucky, both stats will occur before either chmod occurs, and now both parallel "processes" think they need to call chmod. Now let's imagine that you end up unlucky and one of those processes manages to get all the way done its post-finish steps (chmod, etc) and even close the entire Writer... before all of the endProps operations have finished on the other process! Well, in that case you end up with a Writer (and say, a tar.Extract) that has emitted close when there's still some operations pending.

Unfortunately I have not been able to create any tests that consistently show this bug happening. And when I've tried to make small changes to fix it (say, making _finish return immediately in various places if it has already emitted end, basically by changing the done local to a field on the object), it seems to make examples/pipe.js fail (in ways that I don't understand).

For a workaround, when using tar.Extract, I've taken to adding

extract._fst.on('error', function (e) {
  if (endWasAlreadyEmittedOnExtract)
    return;
  throw e;  // or whatever
});

and that works for now, but is pretty ugly and brittle.

othiym23 commented 10 years ago

For writable streams (big sigh) remember that you want to listen on finish, not end, which is the event to listen on for readables. I know, I know. Does this problem still manifest if you change the event you're listening on?

glasser commented 10 years ago

It does not appear to me that either tar.Extract or its nested fstream emits a finish event.

glasser commented 10 years ago

(To be clear, I spent a while managing to get this to reproduce (in the context of a larger system under load) and printing lots of tracebacks and stuff as it did so. It seemed pretty clear to me that the code in _finish wasn't supposed to be executing twice for the same subdirectory but it was...)

othiym23 commented 10 years ago

@isaacs, do you mind weighing in on this? I'm still unfamiliar enough with how node-tar and fstream are intended to interact that I don't want to muddy the waters any further.

isaacs commented 10 years ago

Yeah, fstream is stuck in the streams1 era, unfortunately. And, in particular, the concept of a duplex stream wasn't quite well thought-out at that point.

It does look like tar.Extract is emitting "end" only when its embedded fstream emits a "close" event, which should only happen when all the files are closed and it's safe to mess with.

I haven't read and digested the entire OP, so take this for what it's worth (ie, not very much). Will review tomorrow.

isaacs commented 10 years ago

@glasser If you apply this patch in fstream does it fix the problem?

diff --git a/lib/writer.js b/lib/writer.js
index 5599fb2..8b1bbf9 100644
--- a/lib/writer.js
+++ b/lib/writer.js
@@ -246,6 +246,9 @@ function endUtimes (me, want, current, path, cb)
 Writer.prototype._finish = function () {
   var me = this

+  if (me._finishing) return
+  me._finishing = true
+
   // console.error(" W Finish", me._path, me.size)

   // set up all the things.

If it doesn't fix the problem, how does it affect it?

glasser commented 10 years ago

Looks like that fixes it.

isaacs commented 10 years ago

Awesome. This should prevent duplicated _finish calls, which seems like the root cause of the issue. I can't get it to throw in a situation like what you describe (a tarball with a subdir that deletes the target folder on the extract "end" event), but I can get it to try to run _finish multiple times, if I slow down some of the calls with a monkeypatch to fs.

Of course, a better solution would be to rewrite fstream using a better approach that was more streams3 compatible and more organized. But that's a rewrite and this is a 2-line fix ;)

glasser commented 10 years ago

Great! I couldn't figure out if the intended semantics were that _finish should only run once or not (esp since there's the whole part that saves the stat in _old and reuses something that it finds there), but if you say so then I definitely believe you :)

isaacs commented 10 years ago

Alright, tar@1.0.1 should pull in fstream@1.0.2 which has this fix.

The extract-move.js test added in node-tar triggers an ENOENT occasionally, but only very rarely. (In other words, I saw it twice last night, and can't reproduce now no matter how I set the timeout settings.)

My hope is that this, along with the changes in npm/lockfile#9, puts the final nail in npm's difficulties around ENOENT errors when installing massive package trees.

glasser commented 10 years ago

Great, thanks!