rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

Should I worry about the persistent indels? #34

Closed: fengyuchengdu closed this 2 years ago

fengyuchengdu commented 2 years ago

After many rounds (at least three rounds each) of Medaka, Polypolish and POLCA polishing, I still have 14 indel errors, according to the POLCA report (shown below):

Substitution Errors: 0
Insertion/Deletion Errors: 14
Assembly Size: 5523242
Consensus Quality: 99.9997
Consensus QV: 55.96
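The POLCA figures are internally consistent with a simple per-base calculation: 14 errors in 5,523,242 bp is an error rate of about 2.5e-6, and the Phred-scaled value -10·log10(rate) gives QV ≈ 55.96. The sketch below reproduces both numbers. It assumes (as the numbers suggest, not from POLCA's source code) that "Consensus Quality" is 100·(1 − errors/size) and "Consensus QV" is the Phred-scaled error rate; the function name `polca_style_scores` is hypothetical.

```python
import math


def polca_style_scores(n_errors: int, assembly_size: int) -> tuple[float, float]:
    """Return (consensus quality %, QV) for a given error count.

    Assumption: quality = 100 * (1 - error_rate) and QV = -10 * log10(error_rate),
    which matches the POLCA report in this issue but is not taken from POLCA itself.
    """
    error_rate = n_errors / assembly_size
    consensus_quality = 100.0 * (1.0 - error_rate)
    # With zero errors the Phred-scaled value is unbounded.
    qv = math.inf if n_errors == 0 else -10.0 * math.log10(error_rate)
    return consensus_quality, qv


# 0 substitutions + 14 indels in a 5,523,242 bp assembly
quality, qv = polca_style_scores(0 + 14, 5523242)
print(f"Consensus Quality: {quality:.4f}")  # Consensus Quality: 99.9997
print(f"Consensus QV: {qv:.2f}")  # Consensus QV: 55.96
```

Put another way, a QV near 56 corresponds to roughly one error per 400 kbp, which is why 14 remaining indels in a 5.5 Mbp genome is already a very accurate assembly.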

Should I try other polishing tools to fix these indels, given that I don't want to introduce new errors into the assembly? Many thanks.

rrwick commented 2 years ago

Fixing literally every error in a bacterial genome assembly isn't easy, mainly because of repetitive regions in the genome. As you might have already seen, I discuss this at length in the Polypolish paper.

Briefly, I think that your Medaka+Polypolish+POLCA genome is probably good, and you can stop polishing there. The fact that POLCA made a few changes doesn't surprise me - it can sometimes fix things that Polypolish cannot. But if you really want to get your genome as perfect as possible, I'd recommend further rounds of polishing with various tools, then manually inspecting each change and the read alignments over those changes. It's a laborious process!

Ryan
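Ryan's suggestion to manually inspect each change starts with listing exactly where two versions of a sequence differ. As a minimal illustration only, the sketch below uses Python's standard-library `difflib.SequenceMatcher` to report substitutions and indels between a "before" and "after" sequence. It is a swapped-in technique, not part of Trycycler or any polisher, and it is only practical for short regions (for example, a window around a position flagged by a polisher); whole 5 Mbp genomes need a proper aligner. The function name `list_changes` is hypothetical.

```python
from difflib import SequenceMatcher


def list_changes(before: str, after: str) -> list[tuple[str, int, str, str]]:
    """Return (kind, position_in_before, before_bases, after_bases) per difference.

    kind is "replace" (substitution), "delete" or "insert" (indels).
    """
    # autojunk=False matters for DNA: with only four letters, the default
    # heuristic would treat every base as "junk" in sequences of 200+ bp.
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes.append((tag, i1, before[i1:i2], after[j1:j2]))
    return changes


# A homopolymer indel: TTT in the "before" sequence became TT after polishing.
print(list_changes("ACGTACGTTTGCA", "ACGTACGTTGCA"))
# A single substitution: C -> G at position 1.
print(list_changes("ACGT", "AGGT"))
```

Note that an indel inside a homopolymer has no unique position, so the reported offset is just one valid placement; that ambiguity is part of why inspecting read alignments over each change is still needed.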