marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Space required #2254

Closed by sara77777777 1 year ago

sara77777777 commented 1 year ago

Hello,

This is my command:

canu -p ONT -d ONT genomeSize=6.3g -nanopore /path/file.fastq.gz java=path/java batMemory=118 batThreads=30

I started this command a month ago and the correction stage is still running; it has completed 113 of 771 jobs. I'm trying to assemble a human genome from nanopore reads and would like to know how much more disk space it will need. It is already using 4 TB and I'm running out of available space.

I'm using canu version 2.2 on Linux, on a single node.

thank you
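
For a rough sense of scale, the progress figures in the post above can be extrapolated. This is only a sketch: it assumes "a month" means 30 days and that the remaining correction jobs run at the same average rate as the completed ones, neither of which is stated in the thread.

```shell
#!/bin/sh
# Figures taken from the post; elapsed_days=30 is an assumption for "a month".
done_jobs=113
total_jobs=771
elapsed_days=30

# Linear extrapolation with integer arithmetic (results round down).
remaining_jobs=$(( total_jobs - done_jobs ))
remaining_days=$(( remaining_jobs * elapsed_days / done_jobs ))
total_days=$(( total_jobs * elapsed_days / done_jobs ))

echo "remaining jobs: $remaining_jobs"
echo "estimated days remaining: $remaining_days"
echo "estimated days total: $total_days"
```

At that rate the correction stage alone would need roughly another 174 days. Disk usage is deliberately left out of the extrapolation, since it may not grow linearly with the number of completed jobs.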

skoren commented 1 year ago

The space required varies with the coverage and repetitiveness of the genome; the FAQ suggests at least 3TB for mammalian-sized genomes. The FAQ also lists some parameters that reduce disk usage: https://canu.readthedocs.io/en/latest/faq.html#my-assembly-is-running-out-of-space-is-too-slow You could use those, though that requires a restart.

One thing I notice in your parameters is that you have set the genome size to 6.3g. ONT data isn't accurate enough to give haplotype resolution on human, so it doesn't make sense to set the size to 6g. It should be 3g, which would reduce the number of reads corrected and potentially speed up the run. If this is recent nanopore data with a recent basecaller, you could also try the uncorrected ONT assembly following the docs at the end of: https://canu.readthedocs.io/en/latest/quick-start.html#correct-trim-and-assemble-manually. That completely skips the correction step and should use less disk space as well.
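
As a sketch of the first suggestion, the original command can be restarted with only the genome size changed. Every argument below is copied from the command in the question except `genomeSize=3g`; the output directory `ONT-3g` is a hypothetical name, chosen on the assumption that a restart should not reuse the existing `ONT` directory. The disk-saving FAQ parameters and the uncorrected-assembly options are not reproduced here; take those from the linked pages.

```shell
#!/bin/sh
# Same arguments as the original run, with genomeSize reduced to the haploid size.
# "ONT-3g" is a hypothetical directory name, not something from the thread.
canu_cmd="canu -p ONT -d ONT-3g genomeSize=3g -nanopore /path/file.fastq.gz java=path/java batMemory=118 batThreads=30"

# Print the command for review; paste it into the shell once it looks right.
echo "$canu_cmd"
```

The command is printed rather than executed so that the paths, which are placeholders in the original post, can be checked first.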

sara77777777 commented 1 year ago

Thank you for the quick response. I'll use your suggestions and start the assembly again.