jgm / pandoc

Universal markup converter
https://pandoc.org
Other
34.72k stars 3.39k forks source link

`with_temporary_directory` does not remove temporary directory when Pandoc receives a TERM. #7355

Open odkr opened 3 years ago

odkr commented 3 years ago

It appears that with_temporary_directory does not remove temporary directories it has created when Pandoc receives a TERM. It also appears to not close the Lua state.

Given a filter do-nothing-but-with-a-tmpdir.lua:

pandoc.system.with_temporary_directory('signal-test', function (tmp_dir)
    print(tmp_dir)
        local now = os.clock()
    while os.clock() - now <= 10 do end
end)

Then running

{
    pandoc -L do-nothing-but-with-a-tmpdir.lua /dev/null & pandoc_pid=$!
    sleep 1
    kill -s TERM $pandoc_pid
} |
while read -r tmp_fname; do
    sleep 2
    pgrep pandoc || echo Pandoc has stopped.
    [ -e "$tmp_fname" ] && printf -- '%s: still exists!\n' "$tmp_fname"
done

prints

[WARNING] Could not deduce format from file extension Defaulting to markdown Pandoc has stopped. /private/var/folders/2p/_c1f87s146sfdg0pkl379jsw0000gp/T/signal-test-39f653ef39e42abe: still exists!

And given a filter taking-out-the-trash.lua:

t = setmetatable({}, {__gc = function () print "Collected!" end})
local now = os.clock()
while os.clock() - now <= 10 do end

Then running

pandoc -L taking-out-the-trash.lua /dev/null & pandoc_pid=$!
sleep 1
kill $pandoc_pid

does not print "Collected!". (The same goes foer sending an INT.)

I have tested the above with Pandoc v2.14 on macOS 11.4.

I am must admit, I am confused by how Pandoc handles signals. At first, I thought Pandoc does not handle signals at all. But sending an INT does not interrupt Pandoc (but does appear to prevent t from being garbage-collected). Also, regardless of what signal I send, Pandoc exits with status 0.

jgm commented 3 years ago

I think @tarleb will be able to comment on this.

tarleb commented 3 years ago

That's a tricky one, and I'm properly confused, too.

The main issue here is that we use bracket for resource handling, including the temp directory and the Lua interpreter state. It ensures that the resource is freed if an exception happens. However, the Haskell runtime system doesn't convert the SIGTERM into an exception, but just terminates the process right away.

Maybe installing a signal handler that converts the signal into an asynchronous exception would help, but that leads us to a different problem: As far as I can see, the Lua process itself cannot be interrupted, as pandoc hands control to an uninterruptible C API call. So the Lua computation would finish (if ever!) before handing control back to Haskell, which would then handle the exception. So we'd win nothing. It might be solvable by modifying the Lua bindings to make long running calls interruptible.

It might also be necessary to install an appropriate signal handler in Lua to prevent state corruption, but I'm not sure.

tarleb commented 3 years ago

On my system pandoc terminates with the non-zero error code 143, i.e., 128 + 15, showing that it was killed with a SIGTERM.

odkr commented 3 years ago

On my system pandoc terminates with the non-zero error code 143, i.e., 128 + 15, showing that it was killed with a SIGTERM.

I tried again and now I get right exit statuses, too. Apparently, I just was too tired yesterday and mixed that up somehow.

odkr commented 3 years ago

I had a, very brief, look at this. I don't understand what you mean when you say that the Lua process cannot be interrupted. I conjecture you don't mean "interrupted" in the sense of an interrupt requests/a signal, but that from the perspective of Haskell, anything that happens 'in' Lua is opaque. I gather the problem with bracket responding to an exception is that Lua isn't fully re-entrant? At any rate, am I mistaken that the unclosed Lua state could be adressed by simply closing the Lua state in Pandoc's exception handler it if it's still open? This wouldn't address the temporary directory, of course, but at least the __gc and __close metamethods would be called, right? (And people who write filters could use those to delete the temporary directory for the time being; it's not pretty, but from my limited understanding, a 'pretty' solution requires refactoring). Come to think of it, this is a good opportunity to thank the two of you for all the effort you put into this!

tarleb commented 3 years ago

"Cannot be interrupted" was meant as "ignores SIGINT", but I'm unsure if it's part of the problem or just a different problem that I stumbled upon while investigating this.

We already close the Lua state in the exception handler. The problem is that we don't have an exception, and we'd have to get SIGTERM to be treated as one.

Thanks for the nice words. Seeing it be used it in interesting ways, such as you did in pandoc-{quotes,zotxt}.lua, is a great motivator.