Closed by sina-mansour 2 years ago
I've changed the scripts to use /dev/shm, but that didn't change the execution speed; it did, however, slightly increase memory usage. The unchanged speed is most likely due to the efficiency of the scratch NVMe space. That said, I suspect spartan's RAM filesystem may not actually be backed by RAM but by another NVMe storage: the increase in RAM usage was only 300 MB, far less than it should have been if all the data were stored in RAM.
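One way to test the guess about what backs /dev/shm is to look up its filesystem type in the mount table. The following is a minimal sketch, assuming a Linux compute node where /proc/mounts is readable from inside the job; the helper name `filesystem_type` and the sample mount table are made up for illustration and are not taken from spartan.

```python
def filesystem_type(mount_point, mounts_text):
    """Return the filesystem type listed for mount_point, or None if it is absent."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later entry shadows an earlier one
    return fstype


# Hypothetical sample in /proc/mounts format (not real spartan output).
sample = (
    "tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0\n"
    "/dev/nvme0n1p1 /tmp xfs rw,relatime 0 0\n"
)
print(filesystem_type("/dev/shm", sample))
```

Inside a job, the same function could be called as `filesystem_type("/dev/shm", open("/proc/mounts").read())`. A result of `tmpfs` would suggest a RAM-backed mount, although per-job mount namespaces may make the picture less clear-cut.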
On scratch:
$ seff 34468107
Job ID: 34468107
Cluster: spartan
User/Group: sina_mansour/punim1566
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:56:00
CPU Efficiency: 96.90% of 03:01:38 core-walltime
Job Wall-clock time: 03:01:38
Memory Utilized: 1.21 GB
Memory Efficiency: 7.59% of 16.00 GB
On /dev/shm:
$ seff 35174739
Job ID: 35174739
Cluster: spartan
User/Group: sina_mansour/punim1566
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:55:01
CPU Efficiency: 97.11% of 03:00:13 core-walltime
Job Wall-clock time: 03:00:13
Memory Utilized: 1.52 GB
Memory Efficiency: 19.00% of 8.00 GB
This alternative implementation has the benefit of not requiring a specific quota (like scratch) and hence may provide a greater degree of parallelization.
I'll next try the /tmp storage for comparison and then finalize the decision on the temporary storage filesystem.
Using the /tmp storage seems to achieve the same goal with relatively lower memory utilization:
Job ID: 35188531
Cluster: spartan
User/Group: sina_mansour/punim1566
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:54:17
CPU Efficiency: 96.92% of 02:59:49 core-walltime
Job Wall-clock time: 02:59:49
Memory Utilized: 1.00 GB
Memory Efficiency: 12.51% of 8.00 GB
In short, /tmp uses no scratch quota and incurs no extra memory usage from a shared RAM disk, yet the execution time is approximately the same (even slightly better).
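To put the three runs side by side, the wall-clock times and memory figures from the `seff` reports in this thread can be compared directly. This is only a rough sketch: the numbers are copied from those reports, the helper names `to_seconds` and `wall_time_deltas` are hypothetical, and a single run per filesystem says little about run-to-run variance.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string (as printed by seff) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def wall_time_deltas(runs, baseline):
    """Return wall-clock difference in seconds of each run relative to baseline."""
    base = to_seconds(runs[baseline]["wall"])
    return {name: to_seconds(run["wall"]) - base for name, run in runs.items()}


# Values copied from the seff reports in this thread.
runs = {
    "scratch": {"wall": "03:01:38", "mem_gb": 1.21},
    "/dev/shm": {"wall": "03:00:13", "mem_gb": 1.52},
    "/tmp": {"wall": "02:59:49", "mem_gb": 1.00},
}

deltas = wall_time_deltas(runs, "scratch")
for name, run in runs.items():
    # Negative delta means the run finished faster than the scratch run.
    print(f"{name:>9}: {deltas[name]:+5d} s vs scratch, {run['mem_gb']:.2f} GB used")
```

The differences are on the order of one to two minutes over a three-hour job, which is consistent with calling the execution times approximately equal.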
I'll keep this setting. I'll also reduce the requested memory to 4 GB: utilization is only about 1 GB, so the extra 3 GB should be a sufficient safety margin against the job being terminated for exceeding its memory. This may also let more of our parallel jobs get accepted.
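The arithmetic behind the 4 GB request can be sketched as below, assuming the peak usage stays near the 1.00 GB reported for the /tmp run. The function name `memory_efficiency` is hypothetical and simply mirrors the used/requested ratio that `seff` reports.

```python
def memory_efficiency(used_gb, requested_gb):
    """Percentage of the requested memory that was actually used."""
    return 100.0 * used_gb / requested_gb


peak_gb = 1.00      # "Memory Utilized" from the /tmp run
margin_gb = 3.0     # safety margin chosen in this thread
request_gb = peak_gb + margin_gb

print(f"request: {request_gb:.0f} GB")
print(f"efficiency at 8 GB: {memory_efficiency(peak_gb, 8.0):.1f}%")
print(f"efficiency at {request_gb:.0f} GB: {memory_efficiency(peak_gb, request_gb):.1f}%")
```

Halving the request roughly doubles the reported memory efficiency while still leaving headroom of three times the observed peak.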
After Rob's comment about the potential speedup, I contacted the spartan maintainers, and apparently spartan supports a type of RAM filesystem. I will try replacing scratch with the RAM disk and check the speedup.
To use the RAM filesystem, we can copy files to /dev/shm instead of the current scratch directory, which should also resolve the issue with limited scratch space. /dev/shm is a private, per-job RAM filesystem that is cleared upon job completion. Its usage is counted against the job's RAM.
Another alternative is to use /tmp, which, on the physical partition, resides on a local NVMe drive; it is also private per-job and cleaned up at the end of the job.
I'll look further into it and see what's best.
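Switching between scratch, /dev/shm, and /tmp is easier if the staging location is a single parameter. The following is a minimal sketch of that idea, not the scripts used in this project: the function `stage_inputs` and the file names are hypothetical, and the demo uses throwaway temporary directories as stand-ins for the real staging locations.

```python
import os
import shutil
import tempfile


def stage_inputs(paths, base_dir):
    """Copy input files into a fresh private directory under base_dir.

    Returns the working directory and the list of staged file paths.
    """
    work_dir = tempfile.mkdtemp(prefix="stage_", dir=base_dir)
    # shutil.copy2 returns the path of the newly created copy.
    staged = [shutil.copy2(path, work_dir) for path in paths]
    return work_dir, staged


# Demo with stand-in directories; on a cluster base_dir would be e.g. "/tmp".
with tempfile.TemporaryDirectory() as source_dir, tempfile.TemporaryDirectory() as base_dir:
    example = os.path.join(source_dir, "example_input.txt")  # hypothetical input file
    with open(example, "w") as handle:
        handle.write("placeholder data\n")

    work_dir, staged = stage_inputs([example], base_dir)
    print(os.path.basename(staged[0]))

    # Explicit cleanup, in case the filesystem is not cleared automatically.
    shutil.rmtree(work_dir)
```

Keeping the base directory as an argument means the same script can be timed against each filesystem by changing one value.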