sina-mansour / UKB-connectomics

This repository will host scripts used to map structural and functional brain connectivity matrices for the UK biobank dataset.
https://www.biorxiv.org/content/10.1101/2023.03.10.532036v1

Using spartan RAM disk #34

Closed sina-mansour closed 2 years ago

sina-mansour commented 2 years ago

Following Rob's comment about a potential speedup, I contacted the spartan maintainers, and apparently spartan supports a RAM filesystem. I will try replacing scratch with the RAM disk and measure the speedup.

To use the RAM filesystem, we can copy files to /dev/shm instead of the current scratch directory, which should also resolve the issue of limited scratch space.

/dev/shm is a private, per-job RAM filesystem that is cleared when the job completes. Its usage is counted against the job's RAM.

Another alternative is /tmp, which on the physical partition resides on a local NVMe drive. It is also private per job and is cleaned up at the end of the job.
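As a rough sketch of how the staging location could be switched between the two options, assuming the job script copies inputs into a working directory before processing: `STAGE_ROOT`, `make_workdir`, and the `ukb_job` prefix are hypothetical names for illustration, not taken from the repository's scripts, and the copy and pipeline steps are left as commented placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: stage data on a fast per-job filesystem and clean up afterwards.
set -euo pipefail

# Staging root (hypothetical variable): /dev/shm is RAM-backed and counted
# against the job's memory; /tmp is local NVMe on the physical partition.
STAGE_ROOT="${STAGE_ROOT:-/tmp}"

# Hypothetical helper: create a private working directory under the given root.
make_workdir() {
    mktemp -d "${1%/}/ukb_job.XXXXXX"
}

workdir="$(make_workdir "$STAGE_ROOT")"

# Remove the staged data when the script exits, even after a failure.
trap 'rm -rf "$workdir"' EXIT

# Placeholders for the real steps (paths and commands are assumptions):
# cp -r "$INPUT_DIR"/. "$workdir"/
# run_pipeline "$workdir"
# cp -r "$workdir"/outputs "$OUTPUT_DIR"/
echo "staging in $workdir"
```

Switching between the two filesystems would then only require setting `STAGE_ROOT=/dev/shm` or `STAGE_ROOT=/tmp` when submitting the job.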

I'll look further into it and see what's best.

sina-mansour commented 2 years ago

I've changed the scripts to use /dev/shm, but that didn't change the execution speed, and it slightly increased memory usage. The lack of speedup is most likely due to the efficiency of the scratch NVMe space. However, I'm guessing spartan's implementation of the RAM filesystem may not really be RAM-backed but just another NVMe storage: the increase in RAM usage was only about 300 MB, which is a lot less than it should have been if all the data were stored in RAM.

On scratch:

$ seff 34468107
Job ID: 34468107
Cluster: spartan
User/Group: sina_mansour/punim1566
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:56:00
CPU Efficiency: 96.90% of 03:01:38 core-walltime
Job Wall-clock time: 03:01:38
Memory Utilized: 1.21 GB
Memory Efficiency: 7.59% of 16.00 GB

On /dev/shm:

$ seff 35174739
Job ID: 35174739
Cluster: spartan
User/Group: sina_mansour/punim1566
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:55:01
CPU Efficiency: 97.11% of 03:00:13 core-walltime
Job Wall-clock time: 03:00:13
Memory Utilized: 1.52 GB
Memory Efficiency: 19.00% of 8.00 GB
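One way to test the guess that /dev/shm may not be RAM-backed is to ask the kernel which filesystem type sits behind each path from inside a job. This is a minimal sketch assuming GNU coreutils `stat` is available on the compute node; `fs_type` is a hypothetical helper name. A RAM-backed mount would normally report `tmpfs`.

```shell
#!/usr/bin/env bash
# Print the filesystem type backing a path (assumes GNU coreutils stat).
fs_type() {
    stat -f -c %T "$1"
}

# Report the type for each candidate staging location that exists.
for path in /dev/shm /tmp; do
    if [ -e "$path" ]; then
        echo "$path: $(fs_type "$path")"
    fi
done
```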

This alternative implementation has the benefit of not requiring a specific quota (unlike scratch), and hence may allow a greater degree of parallelization.

I'll next try /tmp storage for comparison and then finalize the choice of temporary storage filesystem.

sina-mansour commented 2 years ago

Using the /tmp storage seems to achieve the same goal with relatively lower memory utilization:

$ seff 35188531
Job ID: 35188531
Cluster: spartan
User/Group: sina_mansour/punim1566
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 02:54:17
CPU Efficiency: 96.92% of 02:59:49 core-walltime
Job Wall-clock time: 02:59:49
Memory Utilized: 1.00 GB
Memory Efficiency: 12.51% of 8.00 GB

In short, no scratch quota is used and there is no extra memory usage from the shared-memory RAM disk, yet the execution time is approximately the same (even slightly better).
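To put numbers on "approximately the same", the three wall-clock times reported by seff can be converted to seconds and compared. The sketch below uses a hypothetical helper `to_seconds` and handles only the HH:MM:SS form shown in this thread, not the day-prefixed form seff prints for longer jobs.

```shell
#!/usr/bin/env bash
# Convert an HH:MM:SS wall-clock string (as printed by seff) to seconds.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so fields such as "08" are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Wall-clock times taken from the three seff reports in this thread.
scratch=$(to_seconds 03:01:38)
shm=$(to_seconds 03:00:13)
tmp=$(to_seconds 02:59:49)

echo "scratch:  ${scratch}s"
echo "/dev/shm: ${shm}s ($(( scratch - shm ))s faster than scratch)"
echo "/tmp:     ${tmp}s ($(( scratch - tmp ))s faster than scratch)"
```

The differences come to 85 s and 109 s on a roughly three-hour job, about 1%, and each configuration was run only once here.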

I'll keep this setting. I'll also reduce the requested memory to 4 GB, since utilization is only about 1 GB and the extra 3 GB should give enough of a safety margin to avoid termination due to memory limits. This may also help in terms of the number of parallel jobs that get accepted.
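A sketch of what the reduced request might look like in a job script header, together with the arithmetic behind it. The `#SBATCH` lines use standard Slurm options, but the helper `mem_request_mb` and its arguments are hypothetical and only illustrate the reasoning; the actual submission scripts in the repository may differ.

```shell
#!/usr/bin/env bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G          # reduced from 8G; peak usage with /tmp was about 1 GB

# Hypothetical helper: memory to request (MB) = observed peak + safety margin,
# plus the staged data size when staging on /dev/shm (which counts as job RAM).
mem_request_mb() {
    local peak_mb=$1 margin_mb=$2 staged_mb=${3:-0}
    echo $(( peak_mb + margin_mb + staged_mb ))
}

# /tmp staging: 1 GB peak plus a 3 GB margin, no staged data held in RAM.
mem_request_mb 1024 3072
```

With /tmp staging the optional third argument stays at zero; if /dev/shm were used instead, the size of the staged data would need to be added to the request.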