Closed: bpbond closed this issue 4 months ago
I made the changes and updated them in GitHub!
Great!
FYI: best practice is to link the issue (this) and the commit or PR that fixes it. That way it's easy to understand the issue that prompted a code change, or the code change that fixed an issue. So for example:
This was fixed in af5de7a.
From PNNL's Ryan D.:
Hello @channel - I have seen an increased use of `job-array` with R, and this is leading to scaling issues. The issue is that `job-array` will run the requested task on N nodes. That in itself isn't a problem, except that R is typically single-threaded. So when someone runs their Rscript on 1 node, it is running on 1 out of 48 cores. If you scale that Rscript to 30 nodes, then that R script is running on 30 cores out of 1440 cores. That's not very efficient, and it isn't helpful to other folks who wish to run their code on Compass.

When people run `job-array`, their intention is typically "run my script N times; each script runs by itself, so it doesn't need to talk to other scripts" or perhaps "I want to run one R computation task for each subfolder."
If the intention above fits your situation, then I'd love it if folks took a look at the example below:
`test.sbatch`:
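(The original attachment didn't survive, so here is a minimal sketch of what a `test.sbatch` along these lines could look like. The job name, time limit, and the use of `srun` with `SLURM_PROCID` as the rank source are assumptions consistent with the description below, not Ryan's exact script.)

```bash
#!/bin/bash
#SBATCH --job-name=fib-test     # hypothetical job name
#SBATCH --nodes=3               # three full nodes
#SBATCH --ntasks-per-node=48    # one task per core (48 cores/node)
#SBATCH --time=00:30:00         # hypothetical time limit

# srun launches one copy of the R script per task (3 x 48 = 144 copies);
# each copy learns its own rank from the SLURM_PROCID environment variable.
srun Rscript test.Rscript
```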
`test.Rscript`:
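(Likewise a sketch; the fib cutoff and the `SLURM_PROCID` lookup are assumptions, chosen to match the explanation that follows.)

```r
# test.Rscript
# srun starts one copy of this script per CPU; each copy reads its own
# rank (0-143 across the 3 nodes) from the SLURM_PROCID environment variable.
rank <- as.integer(Sys.getenv("SLURM_PROCID", "0"))

# A deliberately expensive recursive Fibonacci as the stand-in workload.
fib <- function(n) {
  if (n < 2) return(n)
  fib(n - 1) + fib(n - 2)
}

# Each rank computes independently; no rank talks to any other rank.
result <- fib(30)
cat(sprintf("rank %d: fib(30) = %d\n", rank, result))
```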
The above code will run an expensive operation across 3 nodes, using all of their CPUs: each CPU will run the Rscript above, doing its own fib computation. Within the Rscript you will see `rank`. Rank is a number from 0 to 143, one per CPU across the 144 CPUs; the Rscript will run on each CPU, meaning each CPU will individually run the fib computation. Each rank is independent (they don't talk to other ranks).

This is the same outcome as `--job-array`, except that this example will use ALL the cores on 3 machines, whereas `job-array` uses 1 core on each of N machines (`job-array` is not efficient with R scripts without more parallelization).
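For contrast, a typical `job-array` submission of the kind Ryan describes might look like the sketch below (again an assumption, not an actual user script). On a node-exclusive system, each array task occupies a whole 48-core node while single-threaded R uses only one of its cores.

```bash
#!/bin/bash
#SBATCH --job-name=fib-array    # hypothetical job name
#SBATCH --array=0-29            # 30 independent array tasks
#SBATCH --nodes=1               # each task gets a node to itself

# R is single-threaded, so each task uses 1 of the node's 48 cores;
# SLURM_ARRAY_TASK_ID plays the role that rank plays in the srun version.
Rscript test.Rscript
```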