YuanwenGuo closed this issue 5 years ago.
First off, 15x is not really enough coverage to assemble, and that is part of why it's taking so long. At low coverage the overlap parameters are automatically made more sensitive, which slows down the computation.
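To illustrate how coverage can drive that choice, here is a minimal sketch. It assumes the thresholds from the Canu parameter reference for corMhapSensitivity ('high' at or below 30x, 'normal' below 60x, 'low' otherwise); that mapping is an assumption worth checking against the documentation for your Canu version, and the function name is hypothetical.

```python
def mhap_sensitivity(coverage: float) -> str:
    """Pick a coarse MHAP sensitivity level from estimated read coverage.

    Assumed thresholds (check your Canu version's parameter reference):
    'high' <= 30x < 'normal' < 60x <= 'low'.
    """
    if coverage <= 30:
        return "high"   # low coverage: search harder for overlaps (slower)
    if coverage < 60:
        return "normal"
    return "low"        # plenty of coverage: a cheaper search is enough


for cov in (15, 45, 80):
    print(cov, mhap_sensitivity(cov))
```

At 15x this lands in the most sensitive (and most expensive) setting, which is consistent with the slowdown described above.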
There are some suggestions for repetitive genomes in the FAQ: https://canu.readthedocs.io/en/latest/faq.html#my-assembly-is-running-out-of-space-is-too-slow but I expect they will degrade the assembly at your lower coverage. You could try the parameters suggested there but leave out --threshold 0.80 --num-hashes 512 --num-min-matches 3. We typically recommend at least 20x coverage.
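Coverage here is simply total read bases divided by genome size. The sketch below uses the ~1 Gb genome size estimate from the original post to show how much extra long-read data would be needed to go from 15x to the recommended 20x. It is rough arithmetic only: it ignores read length distribution, read quality, and any reads discarded during correction, and the function names are hypothetical.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total read bases / genome size."""
    return total_bases / genome_size


def extra_bases_needed(current_cov: float, target_cov: float,
                       genome_size: float) -> float:
    """Additional read bases required to reach target_cov (never negative)."""
    return max(0.0, (target_cov - current_cov) * genome_size)


GENOME_SIZE = 1e9                 # ~1 Gb, the estimate given in the original post
current_bases = 15 * GENOME_SIZE  # ~15x Nanopore is roughly 15 Gb of reads

print(coverage(current_bases, GENOME_SIZE))                # 15.0
print(extra_bases_needed(15, 20, GENOME_SIZE) / 1e9, "Gb")  # 5.0 Gb
```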
Thank you for the prompt reply! I will try running Canu with the parameters you suggested.
Right now we only have 15X Nanopore long-read data for the assembly. We also have about 50X Illumina short-read data. We plan to use Pilon for further polishing after the Canu assembly, as suggested by the Canu quick start tutorial. Will this combined strategy produce a reasonable assembly?
Best, Yuanwen
The issue with 15x isn't so much the base quality but that you might not have enough coverage to assemble the full genome. In that case Pilon isn't going to help. If you're not planning to get more than 20x coverage, I'd suggest trying a hybrid assembler instead.
Thank you for the suggestion! We tried hybrid assemblers such as MaSuRCA, but they took too long to finish the assembly. I will probably try the parameters you suggested and see how it works.
Much appreciated! Yuanwen
Closing, since the initial issue is explained by the very low coverage. Feel free to post updates on how the assembly turns out if you get one.
Thank you for developing such a great tool to facilitate genome assembly!
I am trying to use all available nodes on our university Slurm cluster, but the cormhap step has been running for more than seven days and still hasn't finished. I checked the mhap.*.out files, and it seems that only 164 of the 597 batch jobs have finished. Is there any way to accelerate the assembly?
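To put those numbers in perspective, a naive linear projection from the job counts above gives a rough total runtime. This is a sketch under a strong assumption: constant job throughput, which real cluster queues rarely deliver. Because more than seven days have already elapsed, using 7 days makes the result a lower bound. The function name is hypothetical.

```python
def projected_total_days(days_elapsed: float, jobs_done: int, jobs_total: int) -> float:
    """Linear projection of total runtime, assuming constant job throughput."""
    if jobs_done <= 0:
        raise ValueError("need at least one finished job to extrapolate")
    return days_elapsed * jobs_total / jobs_done


done, total, days = 164, 597, 7
print(f"{done / total:.1%} of batch jobs finished")                   # 27.5%
print(f"~{projected_total_days(days, done, total):.1f} days in total")  # ~25.5
```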
Our genome is from a highly repetitive plant species, with an estimated size of ~1 Gb. We have about 15X Nanopore coverage.
I am using canu-1.8, and my command is:
#!/bin/bash
# Canu 1.8 on raw Nanopore reads (~1 Gb genome), submitting jobs to Slurm
/canu-1.8/Linux-amd64/bin/canu gridOptions="--time=80:00:00 --partition=killable.q" gridOptionsJobName=canu -p canu_Nano -d ($) genomeSize=1g correctedErrorRate=0.154 -nanopore-raw ($.fastq)
I would appreciate any suggestions or comments on this issue!
Best, Yuanwen