tudo-r / BatchJobs

BatchJobs: Batch computing with R
Other
85 stars 20 forks source link

OS X: makeClusterFunctionsMulticore() gives Error in if (worker$status$n.jobs >= worker$max.jobs) return("J"): missing value where TRUE/FALSE needed #117

Open HenrikBengtsson opened 8 years ago

HenrikBengtsson commented 8 years ago

Problem

With BatchJobs 1.6 in a fresh R 3.2.3 Patched session on OS X 10.8.5, the multicore cluster-function setup gives the following error:

library("BatchJobs")

Loading required package: BBmisc
Sourcing configuration file: '/Users/hb/Library/R/3.2/library/BatchJobs/etc/BatchJobs_global_config.R'
BatchJobs configuration:
  cluster functions: Interactive
  mail.from:
  mail.to:
  mail.start: none
  mail.done: none
  mail.error: none
  default.resources:
  debug: FALSE
  raise.warnings: FALSE
  staged.queries: TRUE
  max.concurrent.jobs: Inf
  fs.timeout: NA

> debugMulticore()
*** System info: ***
                                                                                           sysname
                                                                                          "Darwin"
                                                                                           release
                                                                                          "12.6.0"
                                                                                           version
"Darwin Kernel Version 12.6.0: Wed Mar 18 16:23:48 PDT 2015; root:xnu-2050.48.19~1/RELEASE_X86_64"
                                                                                          nodename
                                                                           "udp046761uds.ucsf.edu"
                                                                                           machine
                                                                                          "x86_64"
                                                                                             login
                                                                                              "hb"
                                                                                              user
                                                                                              "hb"
                                                                                    effective_user
                                                                                              "hb"

*** which R: ***
OS cmd: which R
OS result:
$exit.code
[1] 0

$output
[1] "/usr/bin/R"

which R result: /usr/bin/R

*** Find helper script: ***
OS cmd: /Library/Frameworks/R.framework/Resources/bin/Rscript --no-save --no-restore --no-init-file --no-site-file -e "message(normalizePath(system.file(\"bin/linux-helper\", package = \"BatchJobs\")))"
OS result:
$exit.code
[1] 0

$output
[1] "/Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper"

Find helper script result: /Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper

*** Auto-detecting ncpus: ***
OS cmd: /Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper number-of-cpus
OS result:
$exit.code
[1] 0

$output
[1] "8"

Auto-detecting ncpus result: 8

*** Query worker status: ***
OS cmd: /Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper status ""
OS result:
$exit.code
[1] 0

$output
[1] "dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid"
[2] "1.05 5 0 0"                                                                                         

Query worker status result: dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgidQuery worker status result: 1.05 5 0 0

*** Submitting 1 job: ***
OS cmd: /Library/Frameworks/R.framework/Resources/bin/Rscript --no-save --no-restore --no-init-file --no-site-file -e "message(normalizePath(system.file(\"bin/linux-helper\", package = \"BatchJobs\")))"
OS result:
$exit.code
[1] 0

$output
[1] "/Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper"

Creating dir: /var/folders/r5/97tkfmg16qq_j7s_q1dgqqs40000gq/T//RtmpU9DbV8/file15e517e243f05
Saving registry: /var/folders/r5/97tkfmg16qq_j7s_q1dgqqs40000gq/T//RtmpU9DbV8/file15e517e243f05/registry.RData
Adding 1 jobs to DB.
OS cmd: /Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper list-jobs "/var/folders/r5/97tkfmg16qq_j7s_q1dgqqs40000gq/T//RtmpU9DbV8/file15e517e243f05"
OS result:
$exit.code
[1] 0

$output
[1] "dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid"

Saving conf: /var/folders/r5/97tkfmg16qq_j7s_q1dgqqs40000gq/T//RtmpU9DbV8/file15e517e243f05/conf.RData
Submitting 1 chunks / 1 jobs.
Cluster functions: Multicore.
Auto-mailer settings: start=none, done=none, error=none.
Writing 1 R scripts...
SubmitJobs |+                                                |   0% (00:00:00)OS cmd: /Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper status "/var/folders/r5/97tkfmg16qq_j7s_q1dgqqs40000gq/T//RtmpU9DbV8/file15e517e243f05"
OS result:
$exit.code
[1] 0

$output
[1] "dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid"
[2] "1.05 5 0 0"                                                                                         

Error in if (worker$status$n.jobs >= worker$max.jobs) return("J") :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In as.list(as.numeric(strsplit(res, " +")[[1L]])) :
  NAs introduced by coercion
Sending 0 submit messages...
Might take some time, do not interrupt this!
>

Troubleshooting

With debug(BatchJobs:::getWorkerSchedulerStatus) I find that:

Browse[2]>
debug: if (worker$status$n.jobs >= worker$max.jobs) return("J")
Browse[2]> str(worker$status)
List of 14
 $ load       : num NA
 $ n.rprocs   : num NA
 $ n.rprocs.50: num NA
 $ n.jobs     : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
 $ NA         : num NA
Warning message:
In as.list(as.numeric(strsplit(res, " +")[[1L]])) :
  NAs introduced by coercion
Browse[2]> str(worker$max.jobs)
 int 7

Next, inspecting BatchJobs:::getWorkerStatus.WorkerLinux();

> BatchJobs:::getWorkerStatus.WorkerLinux
function (worker, file.dir)
{
    res = runWorkerCommand(worker, "status", file.dir)
    setNames(as.list(as.numeric(strsplit(res, " +")[[1L]])),
        c("load", "n.rprocs", "n.rprocs.50", "n.jobs"))
}
<bytecode: 0x102188320>
<environment: namespace:BatchJobs>

lead me to the following:

> worker <- list(script=system.file("bin", "linux-helper", package="BatchJobs"), ssh=FALSE)
> res <- BatchJobs:::runWorkerCommand(worker, "status", file.dir)
> res
[1] "dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid"
[2] "1.30 6 0 0" 

Note that first output line, which is the same as reported by the above debug output.

The problem is that extra output line and that BatchJobs:::getWorkerStatus.WorkerLinux() tries to parse it when the information of interest is really in the following line.

Session information

> sessionInfo()
R version 3.2.3 Patched (2016-01-12 r69945)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
Running under: OS X 10.8.5 (Mountain Lion)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] backports_1.0.2 magrittr_1.5    sendmailR_1.2-1 parallel_3.2.3
 [5] DBI_0.4         tools_3.2.3     base64enc_0.1-4 BBmisc_1.9
 [9] stringi_1.0-1   RSQLite_1.0.0   fail_1.3        checkmate_1.7.4
[13] stringr_1.0.0   digest_0.6.9    brew_1.0-6      BatchJobs_1.6
HenrikBengtsson commented 8 years ago

A minimal reproducible example causing that output:

> script <- system.file("bin", "linux-helper", package="BatchJobs")
> system2(script, args="list-jobs", stdout=TRUE, stderr=TRUE)
[1] "dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid"

It is a message sent to standard error;

> system2(script, args="list-jobs", stdout=TRUE, stderr=FALSE)
character(0)
HenrikBengtsson commented 8 years ago

It looks like something is setting system environment variable DYLD_FALLBACK_LIBRARY_PATH when R is loaded, because it's not there prior to launching R:

{hb}: echo $DYLD_FALLBACK_LIBRARY_PATH

{hb}: R --vanilla
> Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
[1] "/Library/Frameworks/R.framework/Resources/lib"

Unsetting this (e.g. https://discussions.apple.com/thread/4143805?tstart=0), makes that stderr message disappear:

> Sys.unsetenv("DYLD_FALLBACK_LIBRARY_PATH")
> script <- system.file("bin", "linux-helper", package="BatchJobs")
> system2(script, args="list-jobs", stdout=TRUE, stderr=TRUE)
character(0)

This also makes debugMulticore() work, i.e.

> Sys.unsetenv("DYLD_FALLBACK_LIBRARY_PATH")
> library("BatchJobs")
> debugMulticore()
[...]
Writing 1 R scripts...
SubmitJobs |+++++++++++++++++++++++++++++++++++++++++++++++++| 100% (00:00:00)
Sending 1 submit messages...
Might take some time, do not interrupt this!
Syncing registry ...
Submitting 1 job result: 1
Query worker status:
Query worker status result: 1.14 4 0 0
[...]
Query worker status result: 1.05 4 0 0
Syncing registry ...
OS cmd: /Users/hb/Library/R/3.2/library/BatchJobs/bin/linux-helper list-jobs "/var/folders/r5/97tkfmg16qq_j7s_q1dgqqs40000gq/T//RtmpqvRV6F/file162ed57d08b85"
OS result:
$exit.code
[1] 0

$output
character(0)

Running jobs:

>

Now, I don't know if it's safe to unset DYLD_FALLBACK_LIBRARY_PATH. Also, it could be that this is an OS X 10.8.x issue (which is an old OS version by now).