Thunder run on GPU cluster: parameter, benchmark, scratch

sunny1226 commented 5 years ago

We now have a GPU cluster of four nodes, each with four 2080 Ti GPUs. Each node has E5-2650 CPUs (2x12 cores) and 256 GB of physical memory. We have installed NVIDIA driver 410.93, CUDA 9.2, and NCCL 2.4.2 for THUNDER, and we are trying to run THUNDER on this cluster. We have several questions.

How should we specify processes and threads? We read that, on a cluster, we should run one process per node and set the number of threads to the number of CPU cores. However, when we want to run THUNDER on only 2 nodes (the other two are in use by others), mpirun -np 2 does not work. Would more processes speed up the job, or only more threads? Please also suggest thread settings for running THUNDER on 2 nodes.
We want to run a benchmark to check that our THUNDER installation works. I used the RELION benchmark data (EMPIAR 10028 ribosome, 51G, ~10K particles). How long should THUNDER take to process such a dataset on one of our nodes? I also found the THUNDER benchmark data on GitHub, but I cannot download the dataset. Should I use that dataset for the THUNDER benchmark?
Our cluster has an SSD scratch disk on each machine, not shared. In RELION and cryoSPARC v2, we can point the scratch directory at this local disk. However, I could not find where to set or enable local scratch in THUNDER. Can I use local scratch, and if so, how?
I found that THUNDER copies my benchmark data into physical memory. However, with millions of particles it will run out of physical memory and the job will fail (this happened on our old workstation, which has only 128G, when running the EMPIAR 10028 particles). How can I avoid loading all particles into physical memory? Thanks!

Zarrathustra commented 5 years ago

For two nodes, we recommend putting two processes on the first node and one process on the second one.
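With Open MPI, one way to express this layout is a hostfile that gives the first node two slots and the second node one. The sketch below only builds the hostfile text and the matching launch command. The host names `node1` and `node2`, the file name `hosts.txt`, and the `THUNDER_COMMAND` placeholder are assumptions for illustration, not taken from the THUNDER documentation.

```python
import shlex


def two_node_hostfile(first, second):
    """Return Open MPI hostfile text: two slots on `first`, one on `second`."""
    return f"{first} slots=2\n{second} slots=1\n"


def launch_command(hostfile_path, program):
    """Return the mpirun argument list for 2 + 1 = 3 processes.

    `program` is the (hypothetical) THUNDER command line as a list of strings.
    """
    return ["mpirun", "-np", "3", "--hostfile", hostfile_path, *program]


if __name__ == "__main__":
    # Placeholder host names; replace with the two nodes that are free.
    print(two_node_hostfile("node1", "node2"), end="")
    # THUNDER_COMMAND stands in for the actual executable and its arguments.
    print(shlex.join(launch_command("hosts.txt", ["THUNDER_COMMAND"])))
```

Under a job manager, the scheduler usually supplies the node list itself, so a hand-written hostfile like this is mainly useful for direct `mpirun` launches.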
Running speed depends on many environmental factors, such as memory, GPUs, CPUs, network bandwidth, and disk bandwidth, so I am sorry that I cannot give you an estimate. However, there are some benchmark tests in https://www.nature.com/articles/s41592-018-0223-8. Moreover, you can use https://github.com/thuem/THUNDER-demo-datasets as benchmarks. The download failure is due to the lack of Git LFS; see https://git-lfs.github.com.
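When a repository that uses Git LFS is cloned without the Git LFS client, the large files arrive as small text pointer stubs instead of the real data. A quick check, sketched below, is to look for the header that the Git LFS specification puts on the first line of a pointer file. The function name is illustrative only.

```python
# First line of every Git LFS pointer file, per the Git LFS specification.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_HEADER))
    return head == LFS_HEADER
```

If this returns True for a dataset file, installing Git LFS and then running `git lfs install` followed by `git lfs pull` inside the clone should fetch the actual data.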
Sorry, currently, there is no local scratch support in THUNDER.
Currently, THUNDER reads all particles into physical memory. We are working on a memory buffer system that will load particles into physical memory only when needed, and we hope to release this feature soon. For now, a workaround for this "out of memory" issue is to use SWAP: configuring a larger SWAP partition should solve it.
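To size the SWAP partition, it can help to estimate how much memory the particle stack needs. The rough sketch below assumes each particle is held as a square image of 32-bit floats and ignores any other buffers THUNDER allocates, so the result is only a lower bound. The function names and the example numbers (one million particles, 256-pixel box, 128 GiB of RAM) are illustrative.

```python
GIB = 1024 ** 3  # bytes per GiB


def stack_gib(n_particles, box_size, bytes_per_pixel=4):
    """Estimated size in GiB of n square particle images (float32 assumed)."""
    return n_particles * box_size * box_size * bytes_per_pixel / GIB


def swap_shortfall_gib(n_particles, box_size, ram_gib):
    """GiB by which the estimated stack exceeds physical memory (0 if it fits)."""
    return max(0.0, stack_gib(n_particles, box_size) - ram_gib)


if __name__ == "__main__":
    need = stack_gib(1_000_000, 256)
    extra = swap_shortfall_gib(1_000_000, 256, 128)
    print(f"particle stack: {need:.1f} GiB")   # particle stack: 244.1 GiB
    print(f"beyond RAM:     {extra:.1f} GiB")  # beyond RAM:     116.1 GiB
```

Because real usage will be higher than this estimate, the SWAP partition should be sized with generous headroom above the computed shortfall.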

Best regards,

Mingxu

sunny1226 commented 5 years ago

Thanks. Also, does THUNDER support LSF clusters?

Zarrathustra commented 5 years ago

Sure.

At Tsinghua, we use LSF and SLURM as job managers.

If you have any problems running THUNDER under the LSF job manager, please contact us.
