rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0
306 stars 28 forks

'cannot determine trim amount' #59

Closed: jdakota1305 closed this issue 1 year ago

jdakota1305 commented 1 year ago

Hi, first of all I wanted to say thanks for developing Trycycler!

I am currently running Trycycler on assemblies from several assemblers (Flye, Microbial Assembler from PacBio, Canu, and Raven) for a recent PacBio Sequel II sequencing project on Prevotella species. One issue comes up quite consistently: after clustering the contigs, the reconcile step fails because of length differences and/or an error stating "cannot determine trim amount" for a contig.

Is there a general approach you would recommend for these contigs? This usually happens with the larger contigs that are expected to represent the majority of the genome (for example, a 2.6 Mb contig for an expected 2.9 Mb genome).

Thank you! Happy to provide additional details or files.

rrwick commented 1 year ago

If it's just a few contigs which fail during the reconcile step, I just throw them out and continue with the remaining contigs. For example, if I have 12 contigs in a cluster and 3 are not working well, I proceed with the other 9.
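As a rough sketch of how one might shortlist contigs to throw out, the Python below flags contigs whose length is far from the cluster median. It is not part of Trycycler: the function names, the median-based comparison, the `1.1` ratio, and the assumption that each contig sits in its own `.fasta` file in one directory are all illustrative choices.

```python
from pathlib import Path
from statistics import median


def fasta_length(path):
    """Count sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def contig_lengths(contig_dir):
    """Map each *.fasta file name in contig_dir to its sequence length.

    Assumes one contig per file (a hypothetical layout for this sketch).
    """
    return {p.name: fasta_length(p) for p in sorted(Path(contig_dir).glob("*.fasta"))}


def flag_length_outliers(lengths, max_ratio=1.1):
    """Return names whose length differs from the median by more than max_ratio.

    max_ratio=1.1 is an assumed threshold chosen for illustration.
    """
    if not lengths:
        return []
    med = median(lengths.values())
    outliers = []
    for name, length in sorted(lengths.items()):
        if length == 0 or med == 0:
            # An empty contig can never be within tolerance.
            outliers.append(name)
            continue
        ratio = max(length, med) / min(length, med)
        if ratio > max_ratio:
            outliers.append(name)
    return outliers


if __name__ == "__main__":
    # Made-up lengths echoing the 2.6 Mb vs 2.9 Mb case from the question.
    example = {"A_contig.fasta": 2_900_000,
               "B_contig.fasta": 2_910_000,
               "C_contig.fasta": 2_600_000}
    print(flag_length_outliers(example))
```

Flagged contigs are only candidates: whether to drop one is still a judgment call, and a length check says nothing about contigs that fail for other reasons such as the trim error.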

However, if a lot of your contigs are problematic, that probably indicates that your initial assemblies didn't go well. This is usually due to one of the following:

  1. The reads are too short or shallow, leading to fragmented assemblies.
  2. There is heterogeneity in the sample, confusing the assemblers and leading to inconsistent results.

For problem 1, the best solution is to sequence more, aiming for longer reads. Problem 2 can be tougher to solve - you often need to wrap your head around the nature of the heterogeneity, then cull reads from the minority variants. This can be a lot of work, and I don't have an easy approach 😦
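To check whether problem 1 might apply, one could summarise the read set before assembling. The sketch below computes read N50 and mean depth from an uncompressed four-line-per-record FASTQ file. The function names and the file format assumption are illustrative, the genome size has to be supplied by the user, and what counts as "long enough" or "deep enough" is not something this thread specifies.

```python
def fastq_read_lengths(path):
    """Return read lengths from an uncompressed FASTQ file.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    lengths = []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                lengths.append(len(line.strip()))
    return lengths


def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads


def mean_depth(lengths, genome_size):
    """Total sequenced bases divided by the expected genome size."""
    return sum(lengths) / genome_size


if __name__ == "__main__":
    # Toy read lengths; replace with fastq_read_lengths("reads.fastq")
    # (a hypothetical file name) and your own expected genome size.
    toy = [10, 8, 6, 4, 2]
    print("N50:", read_n50(toy))
    print("depth:", mean_depth(toy, genome_size=10))
```

A low N50 or low depth relative to other successful projects on similar genomes would point towards sequencing more, as suggested above, rather than towards heterogeneity.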