marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

stoponlowcoverage question #1795

Closed zhanwen-cheng closed 4 years ago

zhanwen-cheng commented 4 years ago

Hi, I am running Canu (v2.2) on Ubuntu to assemble nanopore-sequenced viral metagenome DNA (concentrated from an environmental sample), using the following command:

`nohup canu -p market -d . genomeSize=40k -nanopore-raw ~/CZW_disk/groundwater_nanopore/market/fastq_pass.fq.gz > market.log`

Some references report that viral genomes are around 40-60 kb, and under 100 kb, so I set genomeSize to 40k. However, Canu failed to assemble any contigs. The log file (40k.txt.log) shows:

--
-- ERROR: Read coverage (0.8) lower than allowed.
-- ERROR: stopOnLowCoverage = 10
-- ERROR:
-- ERROR: This could be caused by an incorrect genomeSize or poor
-- ERROR: quality reads that cound not be sufficiently corrected.
-- ERROR:
-- ERROR: You can force Canu to continue by decreasing parameter
-- ERROR: stopOnLowCoverage (and possibly minInputCoverage too).
-- ERROR: Be warned that the quality of corrected reads and/or
-- ERROR: contiguity of contigs will be poor.

So I added stopOnLowCoverage=0.1 and minInputCoverage=0.1 to the command:

`nohup canu -p market -d . stopOnLowCoverage=0.1 minInputCoverage=0.1 genomeSize=40k -nanopore-raw ~/CZW_disk/groundwater_nanopore/market/fastq_pass.fq.gz > market.log`

It still reported an error (from stoponlow.txt.log):

--
-- ERROR: Read coverage (0) lower than allowed.
-- ERROR: stopOnLowCoverage = 0.1
-- ERROR:
-- ERROR: This could be caused by an incorrect genomeSize or poor
-- ERROR: quality reads that cound not be sufficiently corrected.
-- ERROR:
-- ERROR: You can force Canu to continue by decreasing parameter
-- ERROR: stopOnLowCoverage (and possibly minInputCoverage too).
-- ERROR: Be warned that the quality of corrected reads and/or
-- ERROR: contiguity of contigs will be poor.

If I change genomeSize to 100k, Canu runs to the end without any errors (log attached as 100k.txt.log). What should I do if I still want to set genomeSize to 40k? Will the assembly differ much between genomeSize 40k and 100k?

Another question is about the minimum length used for assembly. When the run with genomeSize 100k succeeded, I found that seqStore.sh, shown at the end of 100k.txt.log, contains a -minlength setting:

`/home/chengzw/software/canu/build/bin/sqStoreCreate \
  -o ./market.seqStore.BUILDING \
  -minlength 1000 \
  -genomesize 1000000 \
  -coverage 200 \
  -bias 0 \
  -raw -nanopore fastq_pass /home/chengzw/CZW_disk/groundwater_nanopore/market/fastq_pass.fq.gz \
&& \
mv ./market.seqStore.BUILDING ./market.seqStore \
&& \
exit 0

exit 1`

Does minlength mean the minimum read length that Canu uses, or the minimum length of the contigs it outputs?

Thanks for your contribution to this software!

brianwalenz commented 4 years ago

The genomeSize is used only for estimating coverage. By default, however, Canu corrects and assembles only the longest 40x of the input data. For environmental samples this isn't the correct strategy. Instead, you will want to set the options corOutCoverage=10000 corMhapSensitivity=high corMinCoverage=0 to use all the reads and be more aggressive at finding overlaps. (This is from https://canu.readthedocs.io/en/latest/faq.html#id12)
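For illustration, here is a sketch of how the three options might be combined with the command from the question. The prefix, directory, genomeSize and input path are copied from the original post; the variable names `canu_opts`, `reads` and `cmd` are only for this example, and the combination has not been re-run here, so treat it as a starting point rather than a verified fix. The snippet only prints the command (a dry run).

```shell
# Options suggested in the reply above, added to the original invocation.
# genomeSize is kept at 40k as in the question.
canu_opts="genomeSize=40k corOutCoverage=10000 corMhapSensitivity=high corMinCoverage=0"

# Input path as given in the question (under the user's home directory).
reads="$HOME/CZW_disk/groundwater_nanopore/market/fastq_pass.fq.gz"

# Assemble the full command line and print it for review before launching.
cmd="canu -p market -d . $canu_opts -nanopore-raw $reads"
echo "$cmd"
```

To launch the assembly, run the printed command with `nohup` and the `> market.log` redirect, as in the original post.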

The -minlength you're seeing there is canu's minReadLength option; the minimum read length to use. There is no minimum enforced contig length; however, contigs that appear to be formed from spurious overlaps between two or a few reads are flagged as 'unassembled'.
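To get a rough feel for how genomeSize and the minimum read length interact, one could estimate raw coverage as (bases in reads at or above the minimum length) divided by genome size. The function below is a hypothetical helper, not part of Canu, and it assumes a standard four-line-per-record FASTQ. The coverage Canu reports in its error message may be computed on corrected or trimmed reads, so it need not match this raw figure.

```shell
# Hypothetical helper (not a Canu tool): rough raw-coverage estimate.
#   $1 = FASTQ file, plain or gzip-compressed (.gz)
#   $2 = genome size in bases, as a plain number (40000, not 40k)
#   $3 = minimum read length; defaults to 1000, the -minlength seen in seqStore.sh
estimate_coverage() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -v g="$2" -v m="${3:-1000}" '
        NR % 4 == 2 {                      # sequence line of each 4-line record
            n = length($0)
            if (n >= m) { bases += n; reads++ }   # keep only reads >= minimum length
        }
        END { printf "reads=%d bases=%d coverage=%.1f\n", reads, bases, bases / g }'
}
```

Calling `estimate_coverage fastq_pass.fq.gz 40000` and then `estimate_coverage fastq_pass.fq.gz 100000` on the same file shows how the estimated coverage scales inversely with the genome size supplied, while the reads themselves are unchanged.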

zhanwen-cheng commented 4 years ago

Hi Brian, thanks for your reply!