rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

How to remove empty regions in the final Trycycler-assembled bin using polishing software? #54

Closed wyanren closed 1 year ago

wyanren commented 1 year ago

Hi, I really appreciate this pipeline and software. It gives much better results than Unicycler or Flye alone.

I have an enriched microbial sample that was sequenced with both Illumina and Nanopore. I used Trycycler to assemble the genome and bin it into a single contig.

But when I mapped the Illumina data to it, I found that some regions in this bin were missing. These regions should not be empty, so I want to remove them. What software can I use to remove these regions, and does doing so make sense?

Looking forward to your reply. Sincerely, yanren.

rrwick commented 1 year ago

Hi Yanren,

If I understand correctly, there are parts of your long-read-assembled genome which are not present in your short-read data? Cases like this (when short-read and long-read sets aren't in agreement) can be tough, which is why it's best to sequence both short and long reads from the same DNA, when possible. But I know that isn't always possible.

You could try a Unicycler assembly of this genome and compare it to your Trycycler assembly. Unicycler starts with a short-read assembly graph, so it tends to prefer the short-read version of the genome when there are differences. You could also try aligning the short reads to your Trycycler assembly and then inspecting in IGV to find the region where depth drops to zero - possibly with clipped alignments right at the boundary. Either way, it will probably take some manual work (and maybe some educated guesses) to sort this out. Good luck :smile:

Ryan
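
To narrow down where to look in IGV, the zero-depth regions can also be listed programmatically. The following is a minimal sketch, not part of Trycycler: it assumes a per-base depth table with three tab-separated columns (contig name, 1-based position, depth) that includes zero-depth positions, which is the layout `samtools depth -a` writes for a sorted BAM of the short reads. The function name `zero_depth_regions` and the `min_length` parameter are hypothetical, introduced here for illustration only.

```python
from typing import Iterable, Iterator, Tuple


def zero_depth_regions(
    depth_lines: Iterable[str], min_length: int = 1
) -> Iterator[Tuple[str, int, int]]:
    """Yield (contig, start, end) for each run of zero depth.

    Coordinates are 1-based and inclusive. Input lines are assumed to be
    'contig<TAB>position<TAB>depth' with every position present.
    """
    run_contig = None  # contig of the zero-depth run currently open, if any
    run_start = 0
    run_end = 0

    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos_text, depth_text = line.split("\t")[:3]
        pos, depth = int(pos_text), int(depth_text)

        # Extend the open run if this position is zero-depth and adjacent.
        if depth == 0 and run_contig == contig and pos == run_end + 1:
            run_end = pos
            continue

        # Otherwise close the open run (if it is long enough to report).
        if run_contig is not None and run_end - run_start + 1 >= min_length:
            yield (run_contig, run_start, run_end)

        # Start a new run at a zero-depth position, or reset.
        if depth == 0:
            run_contig, run_start, run_end = contig, pos, pos
        else:
            run_contig = None

    # Report a run that reaches the end of the input.
    if run_contig is not None and run_end - run_start + 1 >= min_length:
        yield (run_contig, run_start, run_end)


if __name__ == "__main__":
    # Tiny made-up example; real input would be read from a depth file.
    example = [
        "contig_1\t1\t30",
        "contig_1\t2\t0",
        "contig_1\t3\t0",
        "contig_1\t4\t12",
        "contig_1\t5\t0",
    ]
    for contig, start, end in zero_depth_regions(example):
        print(f"{contig}:{start}-{end}")
```

With a real depth table, the lines would come from an open file handle rather than a list, and a larger `min_length` would skip very short gaps. The reported intervals are only candidates: whether a region should be removed still needs the manual inspection described above, for example checking for clipped alignments at the boundaries.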