raszi / node-tmp

Temporary file and directory creator for node.js
MIT License
736 stars 92 forks source link

fix #192: tmp must not exit the process on its own #193

Closed silkentrance closed 4 years ago

silkentrance commented 5 years ago

@raszi this fix requires a new release :)

~we might have to add information to the docs that tmp will install signal handlers for SIGINT, which basically will keep the application running unless the user installs sigint handlers on his own.~

I have added a guard so that tmp will detect any user installed sigint listeners, and if there are none, tmp will exit the process.

papandreou commented 5 years ago

Bump! After starting to use tmp (via an exceljs upgrade) my workers no longer shut down cleanly because the SIGINT handlers aren't run.

silkentrance commented 5 years ago

@papandreou this is more involved as it requires tmp to detect user installed listeners, and, based on that knowledge, it will either exit the process by itself or leave it to the user installed listeners.

papandreou commented 5 years ago

@silkentrance, ah okay. As far as I can tell, the code also doesn't handle:

TBH the approach with trying to take over those events seems very fiddly and risky, but I probably don't understand the problem well enough. I guess there is no alternative to that?

silkentrance commented 5 years ago

TBH the approach with trying to take over those events seems very fiddly and risky, but I probably don't understand the problem well enough. I guess there is no alternative to that?

@papandreou The existing code is a work around that was implemented for Jest. Jest uses sandboxes and the tmp package will thus be loaded multiple times and the listeners will be installed multiple times. In order to prevent this, tmp checks whether there already are any known exit/sigint listeners installed and will remove them.

  • SIGINT and exit listeners that get added after tmp's one
  • Existing listeners that get removed after tmp has registered its handlers

Since the handler can always get the existing list of process listeners when it is invoked, this does not actually pose any problem. I have implemented a first version of this already. There is one failing test which I currently cannot get my head around, but I will come up with a working and hopefully stable solution.

silkentrance commented 5 years ago

@papandreou maybe you want to have a look at the code (peer review wise) and tell me your thoughts/findings about this. Personally, I think that this should be working now, and the best news is that tmp will automatically terminate the process on SIGINT if there are no user installed listeners for that signal/event.

raszi commented 5 years ago

This is another good example how complicated it is to work with the listeners. Getting rid of those and just provide a function to remove the created files and directories is way simpler and then the library users could add that function anywhere where they feel like.

papandreou commented 5 years ago

@raszi, I’ll have to agree with that.

silkentrance commented 5 years ago

@raszi just providing a function for doing the actual cleanup will not work in a Jest/sandboxed environment.

And this is just what the safe listener stuff has been implemented for in the first place, a work around for something that Jest does and that we so far have not experienced with any other testing framework or application.

The basic gist of it is that the tmp package will be included multiple times, every time installing its exit/sigint handlers.

So we need a way for the user to circumvent that. Or, else, remove the exit/sigint handler stuff from the package.

That being said, with Jest's sandboxing, the tmp package will be instantiated multiple times. Each instance will have its own collection of garbage collectibles, e.g. files and folders, or we register these garbage collectibles at the process level, e.g. add a private process.__tmp.garbage property that can be recognised by all tmp instances. Which, however, might result in multi threading issues...

So, basically, there is no way to handle this at the user side, unless the user is willing to call upon the garbage collector after all tests have been run in a given Jest test suite, and, since Jest seems to be setting up the sandbox for each test, this is even more complicated.

What do you think, any ideas on how to solve that issue?

jknoxville commented 5 years ago

Thanks for the fix @silkentrance

@raszi If I understand the thread, using an alternative method such as user cleanup functions might be an improvement, but it's complicated and there are more discussions to be had.

However, at the moment this library is incompatible with any apps using custom signal handlers - which is quite a severe limitation. Unless there's something missing in this change, I think it would be best to merge it as is, as a fix for the current system, and then have that higher level discussion separately. Do you guys agree?

alubbe commented 5 years ago

We've switched exceljs over to use this library, which fixed a few outstanding bugs, but introduced this bug - what's the status of this PR? We'd love to see this or a version of it merged

silkentrance commented 4 years ago

@raszi please merge and publish a new version. this is needed ASAP.

silkentrance commented 4 years ago

@raszi as for the listeners, they are complicated, yea, but a little bit of extra code will make it smoothly working, especially the promise stuff.

ezze commented 4 years ago

@silkentrance, nice work! I just checked it, and it seems that asynchronous stuff in my own SIGINT listener works like a charm.

@raszi, waiting for it being merged and published as a patch to npm.

For those who don't want/can't to wait this is how it's possible to use this pull request right now:

$ yarn remove tmp
$ yarn add raszi/node-tmp#gh-192
silkentrance commented 4 years ago

I have not heard of @raszi in a while. I will merge this now to master so that you do not have to use the branch in your dependencies.

However, I am not able to make a release on npm.

mikalai-ramashka commented 4 years ago

@raszi please make a release on npm, at least 3m packages are broken for now

ferm10n commented 4 years ago

@silkentrance

just providing a function for doing the actual cleanup will not work in a Jest/sandboxed environment.

Why not? I think it's more than reasonable what @raszi suggested, having a cleanup function. There are MANY libraries and built-in libs that provide cleanup/disconnect functions, and leave it up to the library user to implement. Is it not possible to write something like this in a unit test spec file?:

process.on('SIGINT', () => tmp._garbageCollector()); // admittedly not very catchy, maybe .finish()?

That way the lib user handles the cleanup.

I'm kinda new to Jest and my involvement with tmp is not related to unit testing. But it's a pretty bad idea to have a sub-dependency like tmp.js handling how and when the entire process shuts down. I wasted a decent amount of time trying to hunt down a shutdown issue and was pretty shocked to find tmp as the culprit.

Maybe an alternative to having a cleanup function would be to only call _safely_install_sigint_listener and _safely_install_exit_listener if running in a jest environment? Maybe by checking env?

silkentrance commented 4 years ago

@ferm10n

The problem with jest is that it uses node sandboxes. And with tmp installing different process signal listeners so that the garbage gets collected and dumped out, this lead to the fact that these listeners had been installed multiple times, even hundreds of times, depending on the tests that were run, eventually leading to issue that too many event listeners had been installed, which in turn caused node to freak out and stop.

And with that sandboxing in place, the tmp module will be instantiated multiple times, so there is no way to get a hold on previously collected garbage, as all module internals will be reset. So there is actually no way around the current approach, unless one uses a clever file system storage based approach.The latter will increase the complexity of the solution by far, so this cannot be something that we want right now in order to stabilize it. And the solution was already stabilized, with the merge of the fix. It will require a deployment to npm, though, something which I am not able to do.

I hope that @raszi is still available and willing to make a new release. Since the last time I tried contacting @raszi was in Oct 2019, I am hoping that nothing happened, preventing @raszi from responding.

ferm10n commented 4 years ago

I'm still trying to understand why tmp would want to call process.exit itself, or why the user of the lib can't call a function to trigger the garbage collection (from within the jest sandbox)

Is there an example I could look at that demonstrates this sandboxing issue you describe here? https://github.com/raszi/node-tmp/pull/193#issuecomment-498381335 I'm really interested in it!

silkentrance commented 4 years ago

@ferm10n the reason for that is that SIGINT handlers are expected to terminate the process by calling process.exit(), see https://nodejs.org/api/process.html for more information. Here is an excerpt that explains why tmp currently (and in the past) will enforce the process exit while also attaching its SIGINT listerner last to the chain of available listener.

'SIGTERM' and 'SIGINT' have default handlers on non-Windows platforms that reset the terminal mode before exiting with code 128 + signal number. If one of these signals has a listener installed, its default behavior will be removed (Node.js will no longer exit).

And, AFAIK, this was the original behaviour of tmp before that I put my dirty fingers on it :) but I might be wrong. Memory fails over the years...

Regardless, the current approach is prone to failure, especially if the application itself installs a SIGINT listener that will exit upon its own, so the tmp listener might never be called.

I will rethink and I might already know a proper solution, based on your input

a) tmp never exits the process b) tmp registers its SIGINT handler as the first handler in the list of handlers

this, however, requires the application to move any files or dirs of interest and which are currently under "supervision" by tmp to a different location, in case that these must be kept.

also, the cleanup callback must be refactored so that it will not throw an exception if a file or directory is missing.

silkentrance commented 4 years ago

further discussed in #216