tseemann / nullarbor

:floppy_disk: :page_with_curl: "Reads to report" for public health and clinical microbiology
GNU General Public License v2.0

Inquiry about minimum computational requirements for running nullarbor #222

Closed. tigerxu closed this issue 5 years ago.

tigerxu commented 5 years ago

Dear Tseemann,

I'm going to order a computational server to handle bacterial genome sequencing data in our lab. Could you give me some guidance on the minimum computational requirements (RAM, number of CPU cores, disk space) for running Nullarbor smoothly?

Thanks very much! Zhuofei

ajmerritt1 commented 5 years ago

Hi Torsten, I'm also interested in this aspect of Nullarbor. Aside from your (I think) previous recommendation of 4 GB RAM per thread, we are wondering whether it is better to invest our budget in fewer, faster cores or more, slower cores.

Zhuofei, our current Nullarbor workstation is a 5-year-old Dell T7600 with dual Xeon E5-2665s (16 cores, 32 threads total), 64 GB RAM, 4 x 2 TB 7200 rpm drives in RAID 6, and a 128 GB SSD scratch drive. Without any use of prefill, it runs a 40-sample Listeria analysis (2x300 bp reads at >100x coverage) in about 2 hours and 45 minutes.

Our new system will be something like dual 18-core Xeons (72 threads total), 768 GB RAM, 6 x 4 TB 7200 rpm drives in RAID 6 for storage, and two SSDs in RAID 1 for the OS and scratch space. It will be carved up into 4 VMs for use by different labs (though we don't anticipate more than 2 labs using it at a time).

We will scale our CPU choice towards more threads or more speed depending on where Nullarbor benefits most. I hope that helps somewhat!

Best regards

Adam

tigerxu commented 5 years ago

Hi Adam,

Very helpful details on computational resources! Our new Nullarbor workstation will be similar to yours, but with 256 GB RAM.

Thanks a lot! Zhuofei

tseemann commented 5 years ago

Nullarbor can run on a wide variety of machines. By default it tries to use 8 cores per "job", and then just runs multiple jobs at once. You probably want 32 GB RAM per job, as SPAdes/Shovill can use lots of RAM with some genomes. That's where the 4 GB per core comes from (it's also a common HPC standard).
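
As a rough back-of-envelope illustration of those numbers (this is not part of Nullarbor itself, just arithmetic based on the 8 cores per job and 4 GB per core quoted above), here is a sketch of how many concurrent jobs a given machine could sustain:

```python
# Back-of-envelope sizing sketch based on the figures quoted above:
# ~8 cores per Nullarbor job and ~4 GB RAM per core (~32 GB per job).
# Not part of Nullarbor; purely illustrative.

CORES_PER_JOB = 8
RAM_PER_CORE_GB = 4                                  # rule of thumb quoted above
RAM_PER_JOB_GB = CORES_PER_JOB * RAM_PER_CORE_GB     # ~32 GB per job

def max_concurrent_jobs(total_cores: int, total_ram_gb: int) -> int:
    """How many 8-core / 32 GB jobs a machine can run side by side."""
    by_cores = total_cores // CORES_PER_JOB
    by_ram = total_ram_gb // RAM_PER_JOB_GB
    return min(by_cores, by_ram)

# Adam's older workstation (32 threads, 64 GB RAM)
print(max_concurrent_jobs(32, 64))    # -> 2 jobs (RAM-limited)

# The proposed new box (72 threads, 768 GB RAM)
print(max_concurrent_jobs(72, 768))   # -> 9 jobs (core-limited)
```

Under those assumptions, a 64 GB machine is RAM-limited rather than core-limited, which is why the 4 GB per core guideline matters when choosing between more cores and more memory.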