marbl / verkko

Telomere-to-telomere assembly of accurate long reads (PacBio HiFi, Oxford Nanopore Duplex, HERRO corrected Oxford Nanopore Simplex) and Oxford Nanopore ultra-long reads.

How to maximize the number of computing nodes #275

Closed: changhan1110 closed this issue 2 months ago

changhan1110 commented 3 months ago

Hi,

I am using a large server cluster (SGE) with more than 50 nodes to run Verkko computations. Each node is equipped with 72 CPUs and ample memory. I’m trying to figure out how to utilize all 50 nodes effectively.

I understand that resource options such as --ali-run let an individual job use more of a node's computing power. However, these options don't control how many nodes are used when the Verkko scheduler submits jobs.
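For reference, this is roughly how I am invoking it (paths and resource values are illustrative, and I am assuming the --ali-run arguments are CPUs, memory in GB, and hours):

```bash
# Illustrative invocation. --ali-run sizes each ONT alignment job
# (assumed argument order: CPUs, memory in GB, time in hours); it does
# not change how many jobs get submitted at once.
verkko -d asm \
    --hifi hifi/*.fastq.gz \
    --nano ont/*.fastq.gz \
    --sge \
    --ali-run 24 64 24
```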

My goal is to optimize resource usage and reduce computation time by fully utilizing all available nodes. Unfortunately, I've noticed that the Snakemake job scheduler often submits only a few jobs, even when there are enough nodes to run them simultaneously. For instance, only two alignONT jobs are submitted despite many nodes being available.

Thanks, Changhan

skoren commented 3 months ago

It's possible you're hitting a limit on the number of jobs because the partitioning isn't creating more than two: you don't have that much ONT data. You can decrease the number of reads per partition (see the --split-bases/--split-reads flags), but honestly I don't think it's worth the effort. There's a cost to spinning up a job and loading the index, so verkko partitions the data to avoid being limited by that. We almost never tweak this and rely on the default job batching; I don't think you'd gain much in terms of speed.
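If you do want to experiment anyway, something like the following would shrink the partitions and so create more alignONT jobs (the values are illustrative, not recommendations; check verkko's help for the current defaults):

```bash
# Illustrative only: smaller partitions mean more, shorter alignment
# jobs, but every extra job pays the startup and index-loading cost
# again.
verkko -d asm \
    --hifi hifi/*.fastq.gz \
    --nano ont/*.fastq.gz \
    --split-bases 1500000000 \
    --split-reads 75000
```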

changhan1110 commented 3 months ago

Thank you for the reply! The whole process did not take much time with my test dataset.

I still do not understand why all of the jobs cannot be submitted across the available nodes. Even though my data is small, there are more than 100 alignONT jobs, yet Snakemake submits only two of them at a time.

skoren commented 3 months ago

It should submit more jobs; we routinely run verkko on our Slurm cluster with hundreds of jobs submitted. The default verkko limit is 1000 jobs in Slurm, so unless that's been changed on the command line, or something is wrong with the Slurm config, it should submit all the jobs it can.
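For example, the cap can be set explicitly by passing options through to Snakemake (a sketch; --jobs is Snakemake's limit on how many cluster jobs it keeps submitted at once):

```bash
# Sketch: pinning the Snakemake job cap. 1000 matches the default
# mentioned above; a lower value here is the usual way the limit ends
# up reduced.
verkko -d asm \
    --hifi hifi/*.fastq.gz \
    --nano ont/*.fastq.gz \
    --slurm \
    --snakeopts "--jobs 1000"
```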

skoren commented 2 months ago

Closing as idle; we have not observed verkko limiting the number of jobs submitted unless Snakemake is explicitly told to reduce the maximum.