Closed CarlBoneri closed 7 years ago
@CarlBoneri great idea. When my tests fail, I sometimes do get warnings in R about zombie processes. However, I hesitate to build this in right away because I am worried about platform dependence. Here is the result of running fork.kill_zombies() on Windows 7 with 32-bit R-devel (r73342, Rtools34.exe).
file1b7473216171.cpp:3:22: fatal error: sys/wait.h: No such file or directory
#include <sys/wait.h>
^
compilation terminated.
make: *** [file1b7473216171.o] Error 1
ERROR(s) during compilation: source code errors or compiler configuration errors!
Program source:
1: #include <R.h>
2:
3: #include <sys/wait.h>
4:
5: extern "C" {
6: void file1b7473216171 ( );
7: }
8:
9: void file1b7473216171 ( ) {
10: int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};
11: }
Error in compileCode(f, code, language, verbose) :
Compilation ERROR, function(s)/method(s) not created! file1b7473216171.cpp:3:22: fatal error: sys/wait.h: No such file or directory
#include <sys/wait.h>
^
compilation terminated.
make: *** [file1b7473216171.o] Error 1
In addition: Warning message:
Not sure, but processx may be able to help too.
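The compilation failure above comes from sys/wait.h, which is a POSIX header and is not available in the Windows toolchain. One way to keep the idea without breaking Windows is to guard the call. The following is only a sketch: can_reap_zombies() and reap_zombies() are hypothetical names, not part of drake or inline, and the C body is copied from the program source printed in the log.

```r
# Hypothetical helpers for illustration; not part of drake or inline.
can_reap_zombies <- function(os_type = .Platform$OS.type) {
  # sys/wait.h and waitpid() are POSIX-only, so never try this on Windows.
  identical(os_type, "unix") && requireNamespace("inline", quietly = TRUE)
}

reap_zombies <- function() {
  if (!can_reap_zombies()) {
    return(invisible(FALSE))
  }
  # Same C body as in the program source shown in the error log.
  reaper <- inline::cfunction(
    body = "int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};",
    includes = "#include <sys/wait.h>",
    convention = ".C"
  )
  reaper()
  invisible(TRUE)
}
```

On Windows, or when inline is not installed, reap_zombies() does nothing and returns FALSE invisibly, so no compilation is attempted.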
I had a look at the forums, beginning with this SO thread and the ones linked at the top. I have decided that I do not want drake to be too opinionated on this topic. After all, the needs of a good cleanup depend highly on the parallel backend. What works for mclapply will most certainly not work for SLURM. However, I have tried to address the issue in two ways.
The parLapply backend always cleans up, even if make() fails.
Given that drake needs to work on all platforms and all parallel backends, I think this is the best we can do.
@CarlBoneri thank you for bringing up zombies.
No problem. I don't think there are zombie processes on Windows?
@CarlBoneri there are pseudo-zombie processes on Windows. It's not completely predictable, but it often happens when parLapply fails or the user assigns a second cluster object to the same object name (as in cl <- makeCluster(2); cl <- makeCluster(2)).
In my experience, they always die with the parent process (hence the pseudo).
Zombies from a parLapply() failure clean up easily with on.exit(stopCluster()), which is why I was glad to see this issue on the tracker. I did not know the bit about cl <- makeCluster(2); cl <- makeCluster(2).
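For reference, the on.exit() pattern could look roughly like this. It is a minimal sketch, assuming a local PSOCK cluster from the parallel package; run_on_cluster() is a hypothetical wrapper, not a drake function.

```r
library(parallel)

# Hypothetical wrapper for illustration; not part of drake.
run_on_cluster <- function(x, fun, workers = 2) {
  cl <- makeCluster(workers)
  # Registered right after creation, so the workers are stopped
  # whether parLapply() succeeds or fails.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, x, fun)
}

doubled <- run_on_cluster(1:3, function(i) i * 2)

# To avoid orphaning workers when reusing a name, stop the old
# cluster before assigning a new one to the same variable.
cl <- makeCluster(2)
stopCluster(cl)
cl <- makeCluster(2)
stopCluster(cl)
```

Because the cluster is created and stopped inside the same function, there is no outer variable that could be overwritten while workers are still running.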
Cleanup might be easier with a function like with_cluster():
with_cluster <- function(cl, code){
  withCallingHandlers(
    code,
    error = function(e){
      parallel::stopCluster(cl)
      stop(e)
    }
  )
}
See r-lib/withr#59. Thanks for the idea, @kendonB.
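A quick usage sketch of with_cluster() as written above, assuming a two-worker local cluster and a worker function that fails on purpose (both are made up for illustration):

```r
# Definition repeated from the comment above so the example is self-contained.
with_cluster <- function(cl, code){
  withCallingHandlers(
    code,
    error = function(e){
      parallel::stopCluster(cl)
      stop(e)
    }
  )
}

cl <- parallel::makeCluster(2)

# The worker function errors, so the handler stops the cluster
# and then re-signals the error to the outer tryCatch().
result <- tryCatch(
  with_cluster(
    cl,
    parallel::parLapply(cl, 1:2, function(i) stop("boom"))
  ),
  error = function(e) "failed"
)
```

Note that the handler only runs when an error is signaled. If the code succeeds, the cluster stays up and still needs an explicit parallel::stopCluster(cl).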
Not sure if you've had issues with zombie processes being left behind on Linux systems, but per our stackexchange thread, I thought this might be useful for the package. Note that it would require the inline package to be installed.