raszi / node-tmp

Temporary file and directory creator for node.js
MIT License
736 stars 92 forks source link

Crash during cleanup #168

Closed robinhouston closed 6 years ago

robinhouston commented 6 years ago

Operating System

NodeJS Version

Tmp Version

0.0.33

Expected Behavior

What have you expected tmp to do? Not crash on exit.

Experienced Behavior

What did actually happen? It crashed on exit.


The following script reliably exits with Killed: 9 for me:

const fs = require("fs"),
      https = require("https"),
      path = require("path"),

      tmp = require("tmp");

function makeRequest() {
    return new Promise(function(resolve, reject) {
        https.get("https://www.google.com", resolve);
    });
}

function createEmptyFile() {
    return new Promise(function(resolve, reject) {
        tmp.file(function(error, filename, fd) {
            if (error) return reject(error);

            let output = fs.createWriteStream(null, { fd });
            output.on("close", resolve);
            output.close();
        });
    });
}

createEmptyFile().then(makeRequest);

This is a stripped-down test case derived from a crash observed in kiln/flourish-sdk#40.

Running under a debugger (LLDB) reveals that the exception causing the process to be killed is EXC_GUARD during a close() call: see this Stack Overflow post for a little information about this exception. From the debugger stack trace I can see this is happening while running a callback during Node’s preparations for exit. I assume this is your _garbageCollector() callback.

Setting { keep: true } stops it from crashing, as you would expect if the problem is caused during cleanup.

Upgrading to the master version from GitHub also stops it from crashing. Unfortunately it’s not released. Also it isn’t clear to me which of the changes since 0.0.33 has stopped it from crashing, hence whether the underlying issue has been addressed.

robinhouston commented 6 years ago

I assume what is happening is that, after the allocated file descriptor is closed by output.close(), the web request makes a syscall that reuses the same fd as a “guarded” file descriptor. When the cleanup code comes to close the fd, it triggers the EXC_GUARD exception and crashes the node process.

robinhouston commented 6 years ago

This suggests that, at least on Mac OS, it is not safe to close a file descriptor that may have already been closed, because it could have been reused as a guarded fd. What’s the reason for closing the file before unlinking it? Is it a Windows thing?

robinhouston commented 6 years ago

Although it isn’t clear from the stripped-down test case, in the real app the file descriptor is being closed by third-party code, so there isn’t a straightforward way to just leave it open.

So maybe the lesson is that, if I am going to close the fd, I need to set keep: true and handle the cleanup myself? Even though the effect is unusually dramatic in this example (i.e. SIGKILL to the head), in general closing a fd that has been reused elsewhere might well cause chaos of one sort or another.

If that’s the case, it might be helpful to document it.

silkentrance commented 6 years ago

@robinhouston the current master has not yet been released. and i must wonder why, as it addresses a few (edge) issues. @raszi please publish a new version to npm from master.

robinhouston commented 6 years ago

@silkentrance Actually, I think the reason master doesn’t have this problem is that on master the cleanup code is not called at all! So it seems to be a bug in master, rather than an improvement.

Here’s a test script. With the master code (3c28d1cd5059405a1cba8e69c324d8e7dbab2224) I find that the file still exists after exit:

require("tmp").file(function(error, filename, fd) {
    console.log("Temp file", filename);
});

PS. Here is an even more stripped-down test case for the original problem:

require("tmp").file(function(error, filename, fd) {
    require("fs").closeSync(fd);
    require("https").get("https://www.google.com");
});
robinhouston commented 6 years ago

Okay, I guess { detachDescriptor: true } is the option I need. I’ll close this, and open a separate issue for the apparent failure of cleanup in the master branch.

robinhouston commented 6 years ago

Filed as #169.

dsagal commented 5 years ago

I'd like to reopen this.

The issue is serious. Closing fd returned by tmp.file should be documented as incorrect. When tmp calls close(fd) on process exit in this case, fd is likely to refer to a different file, and in some cases it will indeed crash the process with "Killed: 9".

The real problem is that this is horrendously difficult to debug. The program that was trying to exit with code 0, and got a SIGKILL at exit time, might be doing a lot of things -- it's taken me literally many hours of debugging to track it down to this module.

I propose the following:

  1. Documentation should say: do NOT close fd returned by tmp.file; if the fd is not needed, use detachDescriptor: true.
  2. When closing fd on exit, if you ever get EBADF, actually output a message to stderr that "fd returned from tmp.file() was incorrectly closed by the user".

I don't see a way to protect against such a mistake, but giving a message about it, annoying as it is, is far more helpful than silently ignoring until it turns into a crash.