marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Disk Space overload for prokaryote #784

Closed: d-yarmosh closed this issue 6 years ago.

d-yarmosh commented 6 years ago

I tried assembling the M. tuberculosis H37Ra genome from a MinION run with Canu snapshot v1.6 +143 changes (r8555 2e77c97e8b0178aaf5f09f6aa6c98b8b4db6d6ae), using this command:

canu -p H37Ra.canu -d canu genomeSize=4.4m -nanopore-raw ./ useGrid=false gnuplotTested=true stopOnReadQuality=false

This generated at least 7 TB of data before filling my hard disk. Canu recommends 3 TB in general, and more for repetitive genomes. M. tuberculosis may be very repetitive, but compared to something like the human genome I wouldn't expect it to be a problem. Am I wrong in that assumption? This is on CentOS 7.

skoren commented 6 years ago

7 TB is definitely too much for a bacterium. Do you have the output from the Canu run and the contents of asm.report?

d-yarmosh commented 6 years ago

Unfortunately, I deleted all of the output immediately to free up space, since that disk is a shared resource.

skoren commented 6 years ago

There isn't much we can do without more information. Are you able to share the input reads? The FAQ explains how to send them to us.

What was the input coverage? Also, your command has -nanopore-raw ./, but the input should be FASTQ/FASTA files. I'm not sure what Canu will do if you give it a directory instead of file names.
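Both questions can be answered before rerunning. The input coverage is the total number of read bases divided by the genome size given to Canu (genomeSize=4.4m, i.e. 4.4 Mbp), and the directory problem goes away if each read file is listed after -nanopore-raw. The script below is a minimal sketch of both steps, not part of Canu. Its assumptions: the reads are plain or gzipped FASTA/FASTQ files directly inside one directory, identified by the extensions in the hypothetical READ_SUFFIXES list; FASTQ records are unwrapped 4-line records; and the helper names (find_read_files, count_bases, coverage, build_canu_command) are illustrative only. It prints a Canu command line but does not run it.

```python
import gzip
import shlex
import sys
from pathlib import Path

# Assumed read-file extensions (plain or gzip-compressed FASTA/FASTQ).
READ_SUFFIXES = (
    ".fastq", ".fq", ".fasta", ".fa",
    ".fastq.gz", ".fq.gz", ".fasta.gz", ".fa.gz",
)
GENOME_SIZE_BP = 4_400_000  # genomeSize=4.4m from the command in the issue


def find_read_files(directory):
    """Return sorted FASTA/FASTQ files directly inside `directory` (no recursion)."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.lower().endswith(READ_SUFFIXES)
    )


def _open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    if path.name.lower().endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_bases(path):
    """Count sequence bases in one FASTA or FASTQ file."""
    total = 0
    with _open_text(Path(path)) as handle:
        first = handle.readline()
        if first.startswith("@"):
            # FASTQ: assumes unwrapped 4-line records, so every 4th line after
            # the first header is a sequence line. Counting by position avoids
            # mistaking quality lines that start with '@' for headers.
            for i, line in enumerate(handle):
                if i % 4 == 0:
                    total += len(line.strip())
        else:
            # FASTA: every line that is not a '>' header is sequence,
            # so wrapped sequences are handled.
            for line in handle:
                if not line.startswith(">"):
                    total += len(line.strip())
    return total


def coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Expected depth of coverage: total read bases / genome size."""
    return total_bases / genome_size


def build_canu_command(files, prefix="H37Ra.canu", out_dir="canu"):
    """The command from the issue, with read files listed explicitly at the end."""
    return [
        "canu", "-p", prefix, "-d", out_dir, "genomeSize=4.4m",
        "useGrid=false", "gnuplotTested=true", "stopOnReadQuality=false",
        "-nanopore-raw", *(str(f) for f in files),
    ]


# Only runs when a directory argument is given, e.g. `python canu_reads.py ./reads`.
if __name__ == "__main__" and len(sys.argv) > 1:
    read_files = find_read_files(sys.argv[1])
    if not read_files:
        sys.exit(f"no FASTA/FASTQ files found in {sys.argv[1]}")
    bases = sum(count_bases(f) for f in read_files)
    print(f"{len(read_files)} files, {bases} bases, ~{coverage(bases):.1f}x coverage")
    print(shlex.join(build_canu_command(read_files)))
```

Running it as `python canu_reads.py ./reads` (a hypothetical file and directory name) prints the estimated coverage followed by a quoted command line that names every read file explicitly. It uses `shlex.join`, so it needs Python 3.8 or later.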