jeroen / sys

Powerful replacements for base::system2

Alternatives #2

Closed. jeroen closed this issue 6 years ago.

jeroen commented 7 years ago
krlmlr commented 7 years ago
gaborcsardi commented 7 years ago

processx is really hacky, because it is just hard to do these things in R. E.g. getting the pid of the subprocess reliably was kind of a nightmare to implement. :) It has to start two extra shell processes to be able to get the pid. :/

This said, processx gives you non-blocking connections to stdout and stderr, and also automatic process cleanup. EDIT: also full command lines, should you need that.

Maybe it would make sense to implement processx on top of sys.
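For reference, roughly what this looks like with the processx API (a sketch; method names are from the current processx release and may not match what existed at the time of this thread):

```r
library(processx)

# Start a child with pipes on stdout/stderr ("non-blocking connections").
p <- process$new("ping", "localhost", stdout = "|", stderr = "|")

p$get_pid()            # pid of the child
p$is_alive()
p$read_output_lines()  # returns whatever is currently buffered, does not block
p$kill()               # also happens automatically when p is garbage collected
```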

jeroen commented 7 years ago

I plan to add automatic process cleanup for background procs that are still running when R exits. Command line executions are simply wrappers that exec sh or cmd. Perhaps I will add those as well for convenience.
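For example (a unix-flavoured sketch with sys; Windows would use cmd /C instead of sh -c):

```r
library(sys)

# Direct exec of a program plus arguments, no shell involved:
exec_wait("ping", c("-c", "1", "localhost"))

# A full command line is just a thin wrapper that execs the shell:
exec_wait("sh", c("-c", "ping -c 1 localhost | head -n 2"))
```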

gaborcsardi commented 7 years ago

As for cleanup, in processx you can clean up when an R object goes out of scope. A process is an R6 object there, so this is easy and sometimes convenient. E.g. in shinytest, we have an R process and a headless web server instance running for each test file, and these are cleaned up at the end of the test file or block.

I am not saying that this needs to be in sys, probably not. OTOH I still think it would make sense to use sys in processx, for this cleanup, and also for the non-blocking connections to background processes, which is also something that I need elsewhere, and I think it is handy in general.
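A minimal sketch of that out-of-scope cleanup on top of sys, assuming exec_background() returns the child pid and using tools::pskill() for termination (processx wraps the same idea in an R6 object):

```r
managed_proc <- function(cmd, args = character()) {
  handle <- new.env()
  handle$pid <- sys::exec_background(cmd, args)
  reg.finalizer(handle, function(e) {
    # kill the child when the handle is garbage collected (or R exits)
    try(tools::pskill(e$pid), silent = TRUE)
  }, onexit = TRUE)
  handle
}

h <- managed_proc("sleep", "60")
rm(h); gc()   # finalizer runs, the child is terminated
```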

jeroen commented 7 years ago

Can you show some example code of what a non-blocking connection to a background proc looks like?

The obvious way is to direct output from the background proc to file(s) and have R read from that. Doing that fully in memory would be pretty tricky I think. You would need to run buffering functions in the R event loop that poll the stdout/stderr pipes and store the output in some larger buffer...

The danger here is that while R is blocking the event loop and the background process is emitting output, the pipes can overflow. Linux pipe buffers are only a few kB, so if this is supposed to be non-blocking you must keep reading them out. Perhaps I don't fully understand what you have in mind.
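Something along these lines, for instance (a unix-flavoured sketch, assuming exec_background() accepts file paths for std_out / std_err):

```r
library(sys)

out <- tempfile()
err <- tempfile()

# Redirect the child's stdout/stderr to files, so the pipes can never overflow.
pid <- exec_background("ping", c("-c", "10", "localhost"),
                       std_out = out, std_err = err)

# R reads the files whenever it gets around to it, without blocking the child.
readLines(out, warn = FALSE)
Sys.sleep(2)
readLines(out, warn = FALSE)   # re-reads the file, now including newer output
```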

gaborcsardi commented 7 years ago

> The obvious way is to direct output from the background proc to file(s) and have R read from that.

Yes, this is how it is implemented. In processx there is no other way, anyway.

gaborcsardi commented 7 years ago

Btw. this is how they solve this in Python: https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate Of course this assumes that you only communicate via stdin & stdout/stderr...

jeroen commented 7 years ago

Right, so doing the equivalent of Popen.wait() in R would be pretty risky, because we cannot thread properly, so you are quite likely to end up blocking your background process.

However, the docs for Popen.communicate(input=None) say: "...Wait for process to terminate." So that simply turns it into a blocking call?

gaborcsardi commented 7 years ago

No, I think communicate is non-blocking, that's the key.

gaborcsardi commented 7 years ago

I mean, it does non-blocking I/O, and also quits if the process exits. That's how I understand it.

jeroen commented 7 years ago

I don't understand how it works then. How can something possibly be non-blocking but still ensure that the output buffers get cleared before the background process can fill them up?

From how I read it, Popen.communicate() will block and keep reading stdout/stderr until the proc is done. Perhaps I should give it a try :)

gaborcsardi commented 7 years ago

It is not a problem if the buffers fill up; the process just stops until they are emptied. communicate just reads and writes, whichever is possible, until there is nothing to read and write.


gaborcsardi commented 7 years ago

But I am not saying that we need this in sys, necessarily. I am fine with the temporary file solution in processx. It is much slower, but I don't really need high performance for my current use cases.

jeroen commented 7 years ago

> communicate just reads and writes, whichever is possible, until there is nothing to read and write.

Then I don't understand how it can avoid the deadlock described in the Python docs. If there is nothing to read/write anymore at some point, it doesn't mean that the background proc won't emit any more output.

I think the file solution is the only sensible approach for background procs. If you really wanted to pipe output from the background procs, you would need to spawn an additional thread/proc to constantly empty the pipe on the other end and store it in some resizable buffer, and then pipe that back to R. But that is very cumbersome in R.

gaborcsardi commented 7 years ago

> If there is nothing to read/write anymore at some point, it doesn't mean that the background proc won't emit any more output.

You can call it multiple times.

> an additional thread/proc to constantly empty the pipe on the other end and store it in some resizable buffer

There is nothing wrong with a filled buffer: the writer will just block on the next write, until the reader reads it out. communicate makes sure that there is no deadlock, by doing non-blocking reads and writes, whichever is possible.
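In processx terms the pattern would look roughly like this (a sketch against the current API: poll, then drain whichever pipe has data, until the child is done and both pipes hit EOF):

```r
library(processx)

p <- process$new("ping", c("-c", "5", "localhost"),
                 stdout = "|", stderr = "|")

out <- character(); err <- character()

# No single read ever blocks, so the child cannot deadlock on a full pipe
# while we are waiting for it to finish.
while (p$is_alive() || p$is_incomplete_output() || p$is_incomplete_error()) {
  p$poll_io(200)                     # wait until stdout or stderr is readable
  out <- c(out, p$read_output_lines())
  err <- c(err, p$read_error_lines())
}
```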

wch commented 7 years ago

I've been looking at how to implement more reliable process cleanup. With processx, cleanup of child processes happens using reg.finalizer(). The child processes are killed when the R object handle is GC'd, but the problem is that if R is killed with a SIGTERM or SIGKILL, the finalizers don't run, and the processes hang around.

The solution that I have in mind is to create a supervisor or watchdog process. Here's a very simple way it could work: the first time that processx (or sys) starts a new process, it also launches the supervisor process, and tells it the pid of the child process. Every time processx starts a new child process, it tells the supervisor that pid. The supervisor simply polls to see if the parent R process is still alive; if not, it kills all the child processes.
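A POSIX-only sketch of that simple version, written as an Rscript purely for illustration (pid reuse, Windows, and registering new children while it runs are all ignored; tools::pskill() with signal 0 is used here only as an existence check):

```r
#!/usr/bin/env Rscript
# Usage: Rscript supervisor.R <parent-pid> <child-pid> [<child-pid> ...]
args   <- as.integer(commandArgs(trailingOnly = TRUE))
parent <- args[1]
kids   <- args[-1]

pid_alive <- function(pid) {
  # signal 0 only checks that the pid exists, it sends nothing
  isTRUE(tools::pskill(pid, 0L))
}

# Poll until the parent R process disappears (even via SIGKILL) ...
while (pid_alive(parent)) Sys.sleep(1)

# ... then clean up the children.
tools::pskill(kids, tools::SIGTERM)
```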

A more sophisticated version could also handle I/O. Libuv seems like it could be good for this: its purpose is to be an async I/O library, and it has cross-platform abstractions for process management and communication. (See the Processes section here.) In this version, R could launch the supervisor process, and every time it wants to start a child process, instead of starting the child itself, R tells the supervisor to start it. The children don't communicate with the R process directly; they communicate with the supervisor, and the supervisor communicates with R. The R package would use libuv to communicate with the supervisor, and the supervisor would use libuv to communicate with both the R process and the children. As with the simple version, the supervisor polls to see if the R process is running, and if not, it kills the child processes.

gaborcsardi commented 7 years ago

I think the supervisor is a good idea in general, but IMO it is very hard to write a proper cross-platform supervisor.

> A more sophisticated version could also handle I/O. Libuv seems like it could be good for this:

What is the benefit of the R <-> libuv <-> child process I/O setup instead of just having R <-> child process?

Btw. if you have "access" to the child, then one solution is to open a pipe from the parent to the child, and the child can periodically check if the pipe has been closed. If yes, then it kills itself.

I am not sure if all this is worth the trouble, though... I would not worry too much about the child processes after a supposedly very rare SIGKILL...