raszi / node-tmp

Temporary file and directory creator for node.js
MIT License
736 stars 92 forks source link

Asynchronous Cleanup Callback #155

Closed freitagbr closed 5 years ago

freitagbr commented 6 years ago

Operating System

NodeJS Version

Tmp Version

v0.0.33

Expected Behavior

The cleanup callback provided by tmp.dir and tmp.dirSync removes the temp directory asynchronously.

Experienced Behavior

The cleanup callback provided by tmp.dir and tmp.dirSync removes the temp directory synchronously. It would be awesome to have an option to make the cleanup callback asynchronous, so that removing directories with a lot of contents doesn't block.

silkentrance commented 6 years ago

@freitagbr would you require a callback for notification on when tmp is done removing the temporary objects?

Oh, wait, there is already the next callback that can be passed to the cleanupCallback. So you are free to use that if you like, as it is optional.

silkentrance commented 6 years ago

@freitagbr the problem, though, is that even on process.nextTick, the operation will still be blocking.

How do you envision tmp to be removing such large directories? And is it even an issue, considering that tmp will be removing these files on process exit only or whenever the cleanup callback is being called.

What is your actual use case here? Why do you keep that many temporary files in a single location so that it will stall the operation of your application?

freitagbr commented 6 years ago

My use case is that I'm creating temp directories as a sandbox to unzip .zip files (potentially very large, say 100GB+), manipulate the contained files, repackage them, and then send them off somewhere else.

Sure, if the cleanup callback is deferred with process.nextTick, that will not be an improvement, because fs.unlinkSync is still synchronous. My idea is to implement an asynchronous cleanup function that uses fs.unlink and fs.rmdir, so that cleaning up a large amount of data does not block. rimraf has a thorough implementation as an example.

silkentrance commented 6 years ago

This is even more involved as we also install a process.exit listener to handle graceful cleanups whenever a process exists or gets terminated.

Inside these handlers we cannot run any asynchronous code. Yet another problem to be solved, yeah.

freitagbr commented 6 years ago

I see that _prepareTmpDirRemoveCallback adds the cleanup callback to _removeObjects (which are cleaned up in the process.exit handler), and returns the cleanup callback to the user. Perhaps, in an async mode, an async cleanup callback is returned to the user, but the sync cleanup callback is still added to _removeObjects. This handles a process.exit occurring in the middle of the cleanup, allowing the remnants to be removed synchronously. The async cleanup callback would have to remove the sync callback from _removeObjects at the end of its operation.

silkentrance commented 6 years ago

@freitagbr thank you very much for your invaluable input. in #161 I do exactly what you proposed and it works like a charm. However, we still need to reduce the extra amount of code and by that I mean we have to just eliminate it by using rimraf. Not for files, though, as these still need to be closed in case of an open file descriptor.