marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Memory not being configured at cor and cns stages on local machine with last commit #1165

Closed johnomics closed 5 years ago

johnomics commented 5 years ago

I'm running a metagenome assembly on a virtual machine with 96 cores, 360 GB RAM. We accidentally installed the latest commit of canu, commit https://github.com/marbl/canu/commit/a50e26a75ffccc529bd944b7adb291e2b6e1c24b, rather than v1.8. This is the command:

canu -p job -d job genomeSize=250m corOutCoverage=all corMhapSensitivity=high corMinCoverage=0 -fast -nanopore-raw reads.fastq.gz

The cor and cns steps don't get configured properly:

--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl     24 GB    8 CPUs x  12 jobs   288 GB   96 CPUs  (k-mer counting)
-- Local: hap       12 GB   24 CPUs x   4 jobs    48 GB   96 CPUs  (read-to-haplotype assignment)
-- Local: cormhap   13 GB   16 CPUs x   6 jobs    78 GB   96 CPUs  (overlap detection with mhap)
-- Local: obtmhap   13 GB   16 CPUs x   6 jobs    78 GB   96 CPUs  (overlap detection with mhap)
-- Local: utgmhap   13 GB   16 CPUs x   6 jobs    78 GB   96 CPUs  (overlap detection with mhap)
-- Local: cor      --- GB    4 CPUs x   1 job    --- GB    4 CPUs  (read correction)
-- Local: ovb        4 GB    1 CPU  x  88 jobs   352 GB   88 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x  44 jobs   352 GB   44 CPUs  (overlap store sorting)
-- Local: red       10 GB    6 CPUs x  16 jobs   160 GB   96 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x  88 jobs   352 GB   88 CPUs  (overlap error adjustment)
-- Local: bat       64 GB    8 CPUs x   1 job     64 GB    8 CPUs  (contig construction with bogart)
-- Local: cns      --- GB    8 CPUs x   1 job    --- GB    8 CPUs  (consensus)
-- Local: gfa        8 GB    8 CPUs x   1 job      8 GB    8 CPUs  (GFA alignment and processing)

Restarting this job with canu 1.8 configures the cor job to use all 96 cores:

-- Local: cor        9 GB    4 CPUs x  24 jobs   216 GB   96 CPUs  (read correction)

This appears to be related to commit https://github.com/marbl/canu/commit/259f07aacafc7271b58ad994f1e7e1976295e880 - is there a bug here?
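For context, the concurrency canu reports for each local stage looks like the largest job count that fits in both the machine's memory and its CPUs. A minimal sketch of that rule (my simplification, not canu's actual code, which reserves headroom and special-cases some stages):

```python
def local_concurrency(job_mem_gb, job_threads, total_mem_gb=360, total_cpus=96):
    """Approximate canu's local-mode job count: run as many simultaneous
    jobs as fit within both total memory and total CPUs.
    Simplified illustration only; canu's real configuration logic differs."""
    return min(total_mem_gb // job_mem_gb, total_cpus // job_threads)

# cor under v1.8: 9 GB x 4 CPUs -> 24 jobs, saturating all 96 CPUs,
# matching the "Local: cor" line in the log above.
print(local_concurrency(9, 4))   # 24
print(local_concurrency(24, 8))  # 12 (meryl: 24 GB x 8 CPUs)
```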

skoren commented 5 years ago

The change is that this step doesn't get configured until later, when you know the longest corrected read and thus the required memory. It should configure itself if you let it keep running.

johnomics commented 5 years ago

Thanks - unfortunately that wasn't happening for me; it just retained the 4 CPUs x 1 job configuration and was leaving the rest of the machine idle.

brianwalenz commented 5 years ago

That seems like a bug to me. I'll check it out, but we're starting a holiday weekend, so don't expect anything in the near future. You should be able to force it to do what you want with corMemory=14 corConcurrency=24 corThreads=4. Same idea for consensus.
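Applied to the original command, the workaround would look like this (the cor values are the ones suggested above; the cns values are illustrative placeholders, not prescribed in this thread):

```shell
canu -p job -d job genomeSize=250m \
  corOutCoverage=all corMhapSensitivity=high corMinCoverage=0 \
  corMemory=14 corConcurrency=24 corThreads=4 \
  cnsMemory=14 cnsConcurrency=12 cnsThreads=8 \
  -fast -nanopore-raw reads.fastq.gz
```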

johnomics commented 5 years ago

Thanks - no rush at all, just wanted to alert you to it in case it is a bug. We didn't mean to use this commit anyway; we should have been using 1.8. Happy holidays.

brianwalenz commented 5 years ago

Fixed! Thanks for reporting. I'm pretty sure I wouldn't have noticed it.