Change parallelization approach

bpbond commented 4 months ago

From PNNL's Ryan D.:

Hello @channel - I have seen an increased use of job arrays with R, and this is leading to scaling issues. The issue is that a job array will run the requested task on N nodes. That in itself isn't a problem, except that R is typically single-threaded. So when someone runs their Rscript on 1 node, it is running on 1 out of 48 cores. If you scale that Rscript to 30 nodes, then that R script is running on 30 cores out of 1,440. That's not very efficient, and it isn't helpful to other folks who wish to run their code on Compass.
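To make the arithmetic above concrete, here is a small R sketch. It assumes 48 cores per node, which is the Compass figure implied by the numbers quoted. It computes the fraction of allocated cores doing work, and how many nodes a given number of single-threaded tasks needs when they are packed onto every core. The function names are illustrative, not from any package.

```r
# Sketch: core utilization for single-threaded R tasks.
# Assumption: 48 cores per node (the Compass figure quoted above).

# Fraction of allocated cores that are actually busy
utilization <- function(busy_cores, nodes, cores_per_node = 48) {
  busy_cores / (nodes * cores_per_node)
}

# Nodes needed if every core runs one single-threaded task
nodes_needed <- function(n_tasks, cores_per_node = 48) {
  ceiling(n_tasks / cores_per_node)
}

utilization(30, 30)                   # one task per node on 30 nodes: 30/1440
nodes_needed(144)                     # 3 nodes
utilization(144, nodes_needed(144))   # 1 (every core busy)
```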

When people run a job array, their intention is typically “Run my script N times; each script runs by itself, so it doesn’t need to talk to other scripts” or perhaps “I want to run 1 R computation task for each subfolder.”

If the intention above fits your situation then I’d love it if folks took a look at this example below:

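test.sbatch:

```bash
#!/bin/bash
#SBATCH -A mscops
# 3 nodes x 48 cores = 144 single-threaded tasks
#SBATCH -N 3
#SBATCH -n 144
#SBATCH -t 00:10:00

. /etc/profile.d/modules.sh
module purge
module load gcc/11.3.0 openblas/0.3.26 r/4.4.0 pnnl_proxies/1.0

# srun launches one copy of the script per task (144 in total);
# each copy gets its own SLURM_PROCID (0-143)
srun Rscript test.Rscript
```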

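test.Rscript:

```r
# Rank of this task: 0 to 143 when srun starts 144 tasks.
# Use it for per-task pathnames, such as /my/path/folder$rank
rank <- Sys.getenv("SLURM_PROCID")

# Open the log once in write mode; an unopened connection is reopened
# (and truncated) by every writeLines() call
fileConn <- file(paste0("/home/dubi037/R_test/", rank, ".log"), open = "w")
writeLines(c("Hello", "World"), fileConn)

# Fibonacci computation, bounded so the loop ends and the code below runs
# (terms stay below 2^53, so doubles hold them exactly)
fib <- c(0, 1)
i <- 2
while (i < 70) {
  fib <- c(fib, fib[i - 1] + fib[i])
  i <- i + 1
}
writeLines(sprintf("%.0f", fib), fileConn)

# Sleep for x seconds and return the elapsed time
sleepStudy <- function(x) {
  p1 <- proc.time()
  Sys.sleep(x)
  proc.time() - p1
}

# Do the things
writeLines(paste("Sum of even terms:", sum(fib[fib %% 2 == 0])), fileConn)
sleepStudy(360)

close(fileConn)
```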

The above code will run a computation across 3 nodes, using all their CPUs: each CPU runs the Rscript above, doing its own Fibonacci computation. Within the Rscript you will see rank. Rank is a number from 0 to 143, one per CPU (144 in total). Because the Rscript runs on each CPU, each CPU individually runs the Fibonacci computation. Each rank is independent (ranks don't talk to each other).
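Because each rank writes its own `<rank>.log`, a quick check after the job finishes can catch tasks that never started or died early. This is a minimal sketch. `missing_ranks()` is a hypothetical helper, and it only checks that each file exists, not that its contents are complete.

```r
# Sketch: find ranks that did not leave a <rank>.log file.
# Assumes the "<rank>.log" naming used in test.Rscript.
missing_ranks <- function(log_dir, ntasks) {
  stopifnot(ntasks >= 1)
  ranks <- 0:(ntasks - 1)
  expected <- file.path(log_dir, paste0(ranks, ".log"))
  ranks[!file.exists(expected)]
}

# For the 144-task job (path as used in test.Rscript):
# missing_ranks("/home/dubi037/R_test", 144)
```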

This is the same outcome as a job array (`sbatch --array`), except that this example uses ALL the cores on 3 machines, whereas a job array uses 1 core on each of N machines (a job array is not efficient with an R script unless the script adds its own parallelization).
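For the “one task per subfolder” case, you may have more subfolders than tasks. In that case, each rank can take a round-robin share of the work instead of exactly one item. This is a sketch under assumptions. `my_items()` is a hypothetical helper, and `/my/path` is the placeholder path from the script comment. `SLURM_PROCID` and `SLURM_NTASKS` are set by srun, and the defaults let the script also run as a single rank outside Slurm.

```r
# Sketch: split independent work items across srun ranks, round-robin.
my_items <- function(items, rank, ntasks) {
  stopifnot(rank >= 0, rank < ntasks)
  # Item k (1-based) goes to rank (k - 1) %% ntasks
  items[(seq_along(items) - 1L) %% ntasks == rank]
}

rank   <- as.integer(Sys.getenv("SLURM_PROCID", unset = "0"))
ntasks <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))

# Hypothetical work list: one entry per subfolder under a placeholder path
folders <- sort(list.dirs("/my/path", full.names = TRUE, recursive = FALSE))
for (folder in my_items(folders, rank, ntasks)) {
  # ... process one folder here ...
  message("rank ", rank, ": ", folder)
}
```

Sorting the folder list matters here. Every rank builds the list independently, so all ranks must see the same order for the shares not to overlap.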

w-alanna commented 4 months ago

I made the changes and updated them in GitHub!

bpbond commented 4 months ago

Great!

FYI: best practice is to link the issue (this) and the commit or PR that fixes it. That way it's easy to understand the issue that prompted a code change, or the code change that fixed an issue. So for example:

This was fixed in af5de7a.

w-alanna / COMPASS-sensor-data

Change parallelization approach #16