Open mihaiconstantin opened 1 day ago
AFAIR, this is the best pattern for interrupting a session:
session$interrupt()
session$poll_io(2000)
session$read()
You could use a different timeout in the poll, and you should also check that poll_io() returned the expected result, because R sessions are not always interruptible.
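The pattern above, with the poll_io() check made explicit, could be sketched roughly as follows. The helper name interrupt_and_read is hypothetical. The sketch assumes that poll_io() returns a character vector whose entries read "ready" when there is something to read, and it returns NULL when nothing became ready within the timeout.

```r
# Hypothetical helper: interrupt a session and read the result only if the
# poll reports that something is ready.
interrupt_and_read <- function(session, timeout = 2000) {
  # Ask the session to stop whatever it is currently running.
  session$interrupt()

  # Wait up to `timeout` milliseconds for the session to respond.
  status <- session$poll_io(timeout)

  # Assumption: an entry equal to "ready" means there is a result to read.
  # If nothing is ready, the session may not have been interruptible.
  if (!any(status == "ready")) {
    return(NULL)
  }

  # Read the result of the interrupted call.
  session$read()
}
```

Something like result <- interrupt_and_read(session, timeout = 10000) followed by is.null(result) would then tell you whether the interrupt was acknowledged in time.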
As for your specific issue, I suspect that the problem is that parallel's subprocesses use the same console as the callr subprocess, so when you call ps_interrupt() on them, all of them are interrupted. At least, this almost works for me, except that (I think) the callr subprocess receives some extra interrupts:
# Start a permanent session.
session <- callr::r_session$new()
# Create a cluster in the session.
invisible(session$run(function() {
    cluster <<- parallel::makeCluster(2, type = "PSOCK")
}))
# Get the worker PIDs.
worker_pids <- session$run(function() {
    parallel::clusterCall(cluster, Sys.getpid)
})
# Get handles to the worker processes.
worker_handles <- lapply(worker_pids, function(pid) {
    return(ps::ps_handle(pid))
})
# Keep the session busy (i.e., but not the workers.)
session$call(function() {
    while (TRUE) { Sys.sleep(0.1) }
})
# Allow some time for the call to kick in.
Sys.sleep(0.25)
# Get the state (i.e., expect `busy`).
cat(paste0("Session state before interrupt: ", session$get_state()), "\n\n")
# Interrupt the session.
session$interrupt()
print(session$poll_io(10000))
# Get the state (i.e., expect `busy`).
cat(paste0("\nSession state after interrupt: ", session$get_state()), "\n\n")
# Read the interrupt result (i.e., error).
cat("\n", rep("-", 25), "\n")
cat(paste0("Session result after session interrupt:\n"))
session$read()
cat(rep("-", 25), "\n\n")
# Get the state (i.e., expect `idle`).
cat(paste0("Session state after reading the interrupt result: ", session$get_state()), "\n\n")
# Manually propagate the interrupt to the cluster workers.
lapply(worker_handles, function(handle) {
    tryCatch(
        expr = {
            # Interrupt the process.
            ps::ps_interrupt(p = handle, ctrl_c = TRUE)
            # Return some informative message.
            return(paste0("Interrupted worker `", ps::ps_pid(handle), "`."))
        },
        error = function(e) {
            # Return some informative message.
            return(paste0("Failed to interrupt worker `", ps::ps_pid(handle), "`."))
        }
    )
})
# Get the state (i.e., expect `idle`).
cat(paste0("Session state after interrupting the workers: ", session$get_state()), "\n\n")
# Sys.sleep(0.1)
# Run something in the background session.
message("running")
session$run(function() {
    print("Session `run` output.")
})
cat("\n")
# Verify that the workers are still alive.
session$run(function() {
    parallel::clusterEvalQ(cluster, {
        print(paste0("Worker `", Sys.getpid(), "` is alive."))
    })
})
# Close the session later.
session$close()
Thanks a lot for your answer!
> You could use a different timeout in the poll, and you should also check that poll_io() returned the expected result, because R sessions are not always interruptible.
This is very helpful to know.
> At least, this almost works for me, except that (I think) the callr subprocess receives some extra interrupts:
I was still not able to get it to work on Windows; however, your other sentence gave me an idea.
> As for your specific issue, I suspect that the problem is that parallel's subprocesses use the same console as the callr subprocess, so when you call ps_interrupt() on them, all of them are interrupted.
This made me wonder if the order in which the interrupts are sent matters. I am not entirely sure why, but if I interrupt the parallel processes (i.e., ps::ps_interrupt(p = handle, ctrl_c = TRUE)) before the session interrupt (i.e., session$interrupt()), the script seems to output as expected.
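The reordering described above (workers first, then the session) could be sketched roughly like this. The helper name interrupt_workers_then_session and its interrupt_worker argument are hypothetical, introduced only for illustration. The sketch also assumes that poll_io() reports "ready" when there is a result to read. It only encodes the ordering; it does not explain why the ordering helps.

```r
# Hypothetical helper: interrupt the parallel workers first, then the session.
interrupt_workers_then_session <- function(
    session,
    worker_handles,
    timeout = 10000,
    # Assumption: workers are interrupted with the same call used in the script.
    interrupt_worker = function(handle) ps::ps_interrupt(p = handle, ctrl_c = TRUE)
) {
    # 1. Interrupt the workers, recording which attempts did not raise an error.
    worker_ok <- vapply(worker_handles, function(handle) {
        tryCatch({
            interrupt_worker(handle)
            TRUE
        }, error = function(e) FALSE)
    }, logical(1))

    # 2. Only afterwards interrupt the session itself.
    session$interrupt()
    status <- session$poll_io(timeout)

    # 3. Read the interrupt result only if the poll reported something ready.
    result <- if (any(status == "ready")) session$read() else NULL

    list(worker_ok = worker_ok, status = status, result = result)
}
```

With the session and worker_handles from the script, interrupt_workers_then_session(session, worker_handles) would replace the separate session$interrupt() and lapply() steps.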
Brief Description
My sincere apologies if this issue doesn’t fit here, but I have run out of ideas of things to try.
In short:
- Running example.R (i.e., see below) on Windows using Rscript --vanilla example.R causes it to return NULL instead of "Session `run` output." and then hang. Sometimes, the last session$run call is executed, but not always.
- With Sys.sleep(0.1) before the second-to-last session$run, the output seems to match what I would expect, but this feels arbitrary.
- The problem seems related to ps::ps_interrupt(p = handle, ctrl_c = TRUE) because, after commenting it out, the script outputs as expected.
I would greatly appreciate your help in trying to understand what is happening...
Contents of example.R:
Observed Output
On macOS (i.e., as expected):
On Windows (i.e., hanging):