`processx` is really hacky, because it is just hard to do these things in R. E.g. getting the pid of the subprocess reliably was kind of a nightmare to implement. :) It has to start two extra shell processes to be able to get the pid. :/

This said, `processx` gives you non-blocking connections to stdout and stderr, and also automatic process cleanup. EDIT: also full command lines, should you need that.
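For context, here is a minimal sketch of what the non-blocking side looks like from the user's perspective. It is written against the R6 API that `processx` settled on later (`process$new()`, `is_alive()`, `read_output_lines()`), so treat the exact method names as an assumption for the version discussed here:

```r
# Minimal sketch, assuming the later processx R6 API; stdout = "|" asks
# for a pipe rather than a file.
library(processx)

p <- process$new("ping", c("-c", "5", "localhost"), stdout = "|")

while (p$is_alive()) {
  # Returns immediately with whatever is buffered; never blocks R.
  lines <- p$read_output_lines()
  if (length(lines)) cat(lines, sep = "\n")
  Sys.sleep(0.2)
}
```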
Maybe it would make sense to implement `processx` on top of `sys`.
I plan to add automatic process cleanup for background procs that are still running when R exits. Command line executions are simply wrappers that exec `sh` or `cmd`. Perhaps I will add those as well for convenience.
As for cleanup, in `processx` you can clean up when an R object goes out of scope. A process is an R6 object there, so this is easy and sometimes convenient. E.g. in `shinytest`, we have an R and a headless web server instance running for each test file, and these are cleaned up at the end of the test file or block.
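The underlying mechanism is just a finalizer on the handle object. A rough illustration with a plain environment and `reg.finalizer()` (the helper name and the pid plumbing are made up for the sketch):

```r
# Sketch of GC-based cleanup: kill the child when its handle is collected.
# The pid is assumed to come from whatever started the background process.
new_proc_handle <- function(pid) {
  h <- new.env(parent = emptyenv())
  h$pid <- pid
  reg.finalizer(h, function(e) {
    tools::pskill(e$pid)   # best effort; a no-op if the proc already died
  }, onexit = TRUE)        # also run the finalizer on normal R exit
  h
}
```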
I am not saying that this needs to be in `sys`, probably not. OTOH I still think it would make sense to use `sys` in `processx`, for this cleanup, and also for the non-blocking connections to background processes, which is something that I need elsewhere, and I think it is handy in general.
Can you show some example code of what a non-blocking connection to a background proc looks like?
The obvious way is to direct output from the background proc to file(s) and have R read from that. Doing that fully in memory would be pretty tricky, I think. You would need to run buffering functions in the R event loop that poll the stdout/stderr pipes and store the output in some larger buffer...

The danger here is that when R is blocking the event loop while the background process is emitting output, the pipes can overflow. Linux pipe buffers are only a few kB at most, so if it is non-blocking you must read them out. Perhaps I don't fully understand what you have in mind.
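A base R sketch of that file-based approach (using `system2()` here rather than `sys`, so no particular `sys` API is assumed):

```r
# Sketch: the OS writes the child's output to files, and R polls them
# whenever it gets around to it, so pipe overflow can never happen.
out <- tempfile(); err <- tempfile()
system2("ping", c("-c", "5", "localhost"),
        stdout = out, stderr = err, wait = FALSE)

Sys.sleep(1)                  # R is free to do anything in the meantime
readLines(out, warn = FALSE)  # read whatever has accumulated so far
```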
> The obvious way is to direct output from the background proc to file(s) and have R read from that.
Yes, this is how it is implemented. In `processx` there is no other way, anyway.
Btw. this is how they solve this in Python: https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate Of course this assumes that you only communicate via stdin and stdout/stderr....
Right, so doing `Popen.wait()` in R would be pretty risky, because we cannot thread properly, so you are quite likely to end up blocking your background process.
However, `Popen.communicate(input=None)` says: "...Wait for process to terminate." So that simply turns it into a blocking call?
No, I think `communicate` is non-blocking, that's the key. I mean, it does non-blocking I/O, and also quits if the process exits. That's how I understand it.
I don't understand how it works then. How can something possibly be non-blocking but still ensure that the output buffers get cleared before the background process can fill them up?
From how I read it, `Popen.communicate()` will block and keep reading stdout/stderr until the proc is done. Perhaps I should give it a try :)
It is not a problem if the buffers fill up; the process then stops until they are emptied. `communicate` just reads and writes, whichever is possible, until there is nothing left to read or write.
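To make the mechanism concrete, here is a rough R analogue of what `communicate()` does, written against `processx`'s polling methods (`poll_io()`, `read_output_lines()`, `read_error_lines()` are from the current package, so the exact names are an assumption):

```r
# Sketch: drain stdout/stderr, whichever is readable, until the process
# exits; the reads themselves never block, so neither side can deadlock.
library(processx)

p <- process$new("ping", c("-c", "5", "localhost"),
                 stdout = "|", stderr = "|")
out <- character(); err <- character()

while (p$is_alive()) {
  ready <- p$poll_io(200)  # wait up to 200 ms for either pipe
  if (ready[["output"]] == "ready") out <- c(out, p$read_output_lines())
  if (ready[["error"]]  == "ready") err <- c(err, p$read_error_lines())
}
# Final drain, in case output arrived between the last poll and exit.
out <- c(out, p$read_output_lines())
```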
But I am not saying that we need this in `sys`, necessarily. I am fine with the temporary file solution in `processx`. It is much slower, but I don't really need high performance for my current use cases.
> communicate just reads and writes, whichever is possible. Until there is nothing to read and write.

Then I don't understand how it can avoid the deadlock as described in the Python doc. If there is nothing to read/write anymore at some point, it doesn't mean that the background proc won't emit any more output.
I think the file solution is the only sensible approach for background procs. If you really wanted to pipe output from the background procs, you would need to spawn yet another thread/proc to constantly empty the pipe on the other end and store it in some resizable buffer, and then pipe that back to R. But that is very cumbersome in R.
> If there is nothing to read/write anymore at some point, it doesn't mean that the background proc won't emit any more output.

You can call it multiple times.
> an additional thread/proc to constantly empty the pipe on the other end and store it in some resizable buffer

There is nothing wrong with a filled buffer: the writer will just block on the next write, until the reader reads it out. `communicate` makes sure that there is no deadlock, by doing non-blocking reads and writes, whichever is possible.
I've been looking at how to implement more reliable process cleanup. With processx, cleanup of child processes happens using `reg.finalizer()`. The child processes are killed when the R object handle is GC'd, but the problem is that if R is killed with a SIGTERM or SIGKILL, the finalizers don't run, and the processes hang around.
The solution that I have in mind is to create a supervisor or watchdog process. Here's a very simple way it could work: the first time that processx (or sys) starts a new process, it also launches the supervisor process, and tells it the pid of the child process. Every time processx starts a new child process, it tells the supervisor that pid. The supervisor simply polls to see if the parent R process is still alive; if not, it kills all the child processes.
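A back-of-the-envelope sketch of that simple version, as a standalone R script; everything here is illustrative, and the parent-alive check is Linux-only (it relies on `/proc`):

```r
# supervisor.R -- hypothetical watchdog, Linux-only sketch.
# Usage: Rscript supervisor.R <parent_pid>; child pids arrive on stdin,
# one per line, as the parent starts them.
args <- commandArgs(trailingOnly = TRUE)
parent <- as.integer(args[[1]])
children <- integer()

con <- file("stdin", blocking = FALSE)
open(con)

repeat {
  # Collect any newly announced child pids (non-blocking read).
  new <- suppressWarnings(readLines(con))
  children <- c(children, as.integer(new[nzchar(new)]))

  # /proc/<pid> disappears when the parent dies.
  if (!file.exists(file.path("/proc", parent))) {
    for (pid in children) tools::pskill(pid)
    quit(save = "no")
  }
  Sys.sleep(1)
}
```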
A more sophisticated version could also handle I/O. Libuv seems like it could be good for this: its purpose is to be an async I/O library, and it has cross-platform abstractions for process management and communication. (See the Processes section here.) In this version, R could launch the supervisor process, and every time it wants to start a child process, instead of R starting the child itself, it tells the supervisor process to start the child. The children don't communicate with the R process directly; they communicate with the supervisor, and the supervisor communicates with R. The R package would use libuv to talk to the supervisor, and the supervisor would use libuv to communicate with both the R process and the children. As with the simple version, the supervisor polls to see if the R process is running, and if not, it kills the child processes.
I think the supervisor is a good idea in general, but IMO it is very hard to write a proper cross-platform supervisor.
> A more sophisticated version could also handle I/O. Libuv seems like it could be good for this:
What is the benefit of the R <-> libuv <-> child process I/O setup instead of just having R <-> child process?
Btw. if you have "access" to the child, then one solution is to open a pipe from the parent to the child, and the child can periodically check if the pipe has been closed. If yes, then it kills itself.
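On the child's side that check could look something like this (a sketch; it assumes the parent keeps the write end of the child's stdin open for its whole lifetime, and uses `isIncomplete()` to distinguish "no input yet" from EOF on a non-blocking connection):

```r
# Child-side sketch: stdin hits EOF exactly when the parent dies (or
# closes the pipe), so the child treats EOF as a death signal.
con <- file("stdin", blocking = FALSE)
open(con)

repeat {
  # ... do a slice of real work here ...
  invisible(readLines(con))  # drain anything the parent wrote
  if (!isIncomplete(con)) {  # FALSE once EOF has been seen
    quit(save = "no")        # parent is gone; exit
  }
  Sys.sleep(1)
}
```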
I am not sure if all this is worth the trouble, though... I would not worry too much about the child processes after a supposedly very rare SIGKILL...