mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

foreach may export objects twice #200

Closed: mschubert closed this issue 4 years ago

mschubert commented 4 years ago

Originally reported by @benmarchi (#179)

Issue 4

That leads me to the final issue I'm facing: when foreach is called from within a function with a large exported object, the relationship between object size and worker success changes. As with the earlier issues, this behavior also depends on where the original R process is running.

R session on the same node as SLURM jobs

library(foreach)
library(clustermq)

clustermq::register_dopar_cmq(n_jobs = 8, memory = 50000, timeout=180, template = list(log_file= "~/%a.log"))

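# large object to export to the workers: 1e8 doubles * 8 bytes, about 762.9 Mb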
data <- rep(1, 100000000)

fun <- function(i, data) {
  temp <- sum(data)
  i
}

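# foreach called directly at the top level, exporting `data` explicitly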
x = foreach(i = 1:100, .export=c("data")) %dopar% fun(i, data)

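# the same foreach call, but wrapped inside a function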
run <- function() {
  x = foreach(i = 1:100, .export=c("data")) %dopar% fun(i, data)
}

run()
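
As an aside (my own check, not part of the original report), the ~762.9 Mb "common" size in the clustermq messages below matches the size of the exported vector: 1e8 doubles at 8 bytes each is roughly 800 MB.

print(object.size(data), units = "Mb")
#> 762.9 Mb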

This runs fine for either data <- rep(1, 100000000) or data <- rep(1, 300000000). However, I did notice something unexpected when looking at the worker memory usage. For the isolated, top-level foreach call, clustermq reports the following:

Submitting 8 worker jobs (ID: 6793) ...
Running 100 calculations (2 objs/762.9 Mb common; 1 calls/chunk) ...
Master: [13.6s 96.9% CPU]; Worker: [avg 76.3% CPU, max 3283.5 Mb]

But for the call made through run(), the message is:

Submitting 8 worker jobs (ID: 6994) ...
Running 100 calculations (2 objs/762.9 Mb common; 1 calls/chunk) ...
Master: [8.0s 65.0% CPU]; Worker: [avg 91.1% CPU, max 1762.0 Mb]

So, when foreach is called through a function, the peak memory required by each worker is roughly halved.
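
As a back-of-the-envelope reading (my own arithmetic, not anything stated in the report), the worker peaks correspond to roughly 4.3 and 2.3 times the 762.9 Mb common object, which would be consistent with the top-level call holding extra copies of the exported data on each worker, as the issue title suggests:

3283.5 / 762.9  # ~4.3x the common object for the top-level foreach call
1762.0 / 762.9  # ~2.3x for the call made inside run()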