Closed keithforest closed 6 years ago
You won't be able to assemble much below about 5-fold coverage. CCS data typically has lower coverage than the raw long reads, so you may not have enough reads from the rare species to assemble anything. However, most CCS reads should be long enough to span entire bacterial genes, so you may be able to look for genes in the reads directly.
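To get a feel for whether a rare species can reach roughly 5-fold coverage, here is a back-of-the-envelope sketch. It assumes abundance is measured as a fraction of sequenced bases, and the data volume (10 Gb) and genome size (5 Mb) are hypothetical placeholders, not values from this thread.

```python
def expected_coverage(total_bases, abundance, genome_size):
    """Expected fold coverage of one species in a metagenome.

    Assumes `abundance` is the fraction of sequenced bases that
    come from this species (an assumption, not a measured value).
    """
    return total_bases * abundance / genome_size


def bases_needed(target_coverage, abundance, genome_size):
    """Total sequenced bases needed to reach `target_coverage` for one species."""
    return target_coverage * genome_size / abundance


# Hypothetical inputs: 10 Gb of CCS data, 1% abundance, 5 Mb genome.
cov = expected_coverage(total_bases=10e9, abundance=0.01, genome_size=5e6)
need = bases_needed(target_coverage=5, abundance=0.01, genome_size=5e6)
print(f"expected coverage: {cov:.1f}x")
print(f"bases needed for 5x: {need:.2e}")
```

If the estimate comes out well below 5x for the species you care about, searching the reads directly for genes is probably the more realistic route.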
You can run an assembly by specifying the reads as -pacbio-corrected and perhaps setting correctedErrorRate=0.025 (if you believe the 1% error estimate for the data). Any reads that end up as singletons (asm.unassembled.fasta) would then have to be annotated directly.
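A minimal sketch of how that invocation could be put together, written as a small Python helper that only builds and prints the command line rather than running it. The input file name, output directory, and the genomeSize value are hypothetical placeholders (Canu requires a genomeSize, and there is no single right value for a metagenome); check the options against the documentation for your Canu version.

```python
import shlex


def canu_command(reads, prefix="asm", out_dir="canu_out",
                 genome_size="5m", corrected_error_rate=0.025):
    """Build a Canu command line for CCS reads treated as corrected reads.

    `reads`, `out_dir`, and `genome_size` are illustrative placeholders.
    """
    return [
        "canu",
        "-p", prefix,                      # output file prefix, e.g. asm.unassembled.fasta
        "-d", out_dir,                     # assembly directory
        f"genomeSize={genome_size}",       # placeholder estimate, not from the thread
        f"correctedErrorRate={corrected_error_rate}",
        "-pacbio-corrected", reads,        # treat CCS reads as already corrected
    ]


cmd = canu_command("ccs_reads.fasta")
print(shlex.join(cmd))
```

With the default prefix, the singleton reads mentioned above would be written to asm.unassembled.fasta inside the output directory.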
I have a set of high-accuracy CCS reads (>99% predicted accuracy) from a PacBio metagenome experiment. I would like to assemble these reads, with the goal of getting high-accuracy protein translations from rare (<1% abundance) species. Could you recommend parameters to optimize the Canu assembly step for this goal? Or is Canu perhaps the wrong tool for this project?