assembly overlapped amplicon data, where to find contained reads #2327

marbl / canu

A single molecule sequence assembler for genomes large and small.


By definition, contained reads won't appear on their own, since they can't form contigs. Each one is placed in the same contig as the read that contains it. That contig may be in the unassembled FASTA (which can also hold contigs made of more than one read) or in the contigs FASTA. Canu's outputs give coordinates for every read included in the assembly (both contigs and unassembled), which you can use to find where each read ended up (see https://canu.readthedocs.io/en/latest/tutorial.html#outputs).
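Below is a minimal Python sketch of that lookup. It assumes the read placements come from Canu's `<prefix>.contigs.layout.readToTig` file, laid out as whitespace-separated columns `readID tigID coordType bgn end` with `#`-prefixed header lines. Check that layout against the outputs page for your Canu version. `load_read_placements` is a hypothetical helper and is not part of Canu.

```python
import io
from typing import Dict, TextIO, Tuple


def load_read_placements(handle: TextIO) -> Dict[str, Tuple[str, int, int]]:
    """Map readID -> (tigID, bgn, end) from a readToTig-style table.

    Assumed layout (verify for your Canu version):
        readID  tigID  coordType  bgn  end
    Lines starting with '#' are treated as headers/comments and skipped.
    """
    placements: Dict[str, Tuple[str, int, int]] = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"unexpected line: {line!r}")
        read_id, tig_id = fields[0], fields[1]
        # Coordinates are kept exactly as written in the file.
        placements[read_id] = (tig_id, int(fields[3]), int(fields[4]))
    return placements


if __name__ == "__main__":
    # Tiny made-up example in the assumed format.
    demo = io.StringIO("#readID\ttigID\tcoordType\tbgn\tend\n17\t4\tungapped\t0\t1500\n")
    print(load_read_placements(demo))  # {'17': ('4', 0, 1500)}
```

With a real run you would open the readToTig file instead of the `StringIO` demo, then look up a contained read by its ID to get the tig it was placed in and its coordinates there.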

I don't think you want the longest read; I suspect the longest reads may just be artifacts or off-target sequences. You likely want the best-supported (highest-coverage) contig or something like it. See also #2235 and #2269 for some possible parameter tweaks for amplicon assembly.
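Picking the best-supported contig instead of the longest one can be sketched in Python as below. It assumes a per-tig summary table such as Canu's `<prefix>.contigs.layout.tigInfo`, with a `#`-prefixed header naming columns like `coverage` and `numChildren` (number of reads). Those names are assumptions to check against your own file. `rank_tigs` is a hypothetical helper that reads the column names from the header, so it does not hard-code column positions.

```python
import io
from typing import Dict, List, TextIO


def rank_tigs(handle: TextIO, key: str = "coverage") -> List[Dict[str, str]]:
    """Return tig rows sorted by the numeric column `key`, highest first.

    Assumes a whitespace-separated table whose column names are given on a
    '#'-prefixed header line (e.g. '#tigID tigLen ... coverage ... numChildren').
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split()
            continue
        if not header:
            raise ValueError("data line found before any header line")
        rows.append(dict(zip(header, line.split())))
    return sorted(rows, key=lambda row: float(row[key]), reverse=True)


if __name__ == "__main__":
    # Made-up example: tig 3 is the longest but has the least support.
    demo = (
        "#tigID\ttigLen\tcoverage\tnumChildren\n"
        "1\t5000\t3.2\t40\n"
        "2\t1500\t210.5\t900\n"
        "3\t9000\t1.1\t1\n"
    )
    best = rank_tigs(io.StringIO(demo))[0]
    print(best["tigID"])  # 2
```

Ranking by `numChildren` instead of `coverage` (pass `key="numChildren"`) is a reasonable alternative for amplicons, where the read count per tig is a direct measure of support.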
