rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

MUSCLE failed to complete pieces; Error running MUSCLE on genome cluster during msa step #41

Closed menickname closed 1 year ago

menickname commented 2 years ago

Dear Trycycler Team

I have been assembling multiple bacterial genomes with the Trycycler pipeline, but I encountered the following error with one of the genome clusters I obtained. All other datasets went fine; only this one fails, during the msa step of the pipeline. All sequences in the cluster ran through the reconcile step without problems, but the msa step does not complete because one piece fails during the MUSCLE alignment. I tried removing the most "divergent" sequences, but this did not solve the issue. I also tried changing the piece sizes, without success. Any thoughts on how to solve this?

`Starting Trycycler MSA (2022-07-08 10:45:00)
Trycycler MSA is a tool for conducting global multiple sequence alignment of contig sequences.

Input sequences:
C_utg000001c: 4,078,071 bp
G_utg000003c: 4,074,445 bp
J_Utg2938: 4,069,380 bp

Checking required software:
MUSCLE: v3.8.1551

Partitioning sequences (2022-07-08 10:45:00)
The sequences are now partitioned into smaller chunks to make the multiple sequence alignment more tractable.

pieces: 4010

median piece size: 1,000 bp
max piece size: 44,000 bp

Running Muscle (2022-07-08 10:45:14)
Trycycler now runs Muscle on each of the pieces to turn them into multiple sequence alignments.

pieces: 4010

Error: MUSCLE failed to complete on 1 of the 4010 pieces. Please remove the most divergent sequences from this cluster and then try again.`

Thank you in advance.

Best regards, Nick Vereecke

Ceriz7 commented 2 years ago

Hi Nick,

I got the same error (1 piece failed) for a 3.2 Mb genome. Have you managed to solve it?

Cheers, Yuwei

rrwick commented 2 years ago

I'm not exactly sure what the problem is, but that max piece size (44 kbp) is unusually large. See this wiki page for an explanation of how Trycycler breaks the sequences into pieces. Possible causes for such a large piece:

Just a couple guesses at a solution:

Let me know if either of those work!

Ryan

rrwick commented 2 years ago

Another suggestion comes from #44: using MAFFT instead of `trycycler msa` to make your alignment. It might be slow, but as long as the result is a FASTA-format global MSA named `3_msa.fasta`, it should work in the following steps.
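For anyone trying the MAFFT route, here is a minimal Python sketch of what that could look like, not an official Trycycler workflow. It assumes `mafft` is on your `PATH` and uses only the standard `--auto` and `--thread` options. The helper names (`run_mafft`, `read_fasta`, `is_global_msa`) are made up for illustration, and the input FASTA path is whatever file holds your reconciled sequences for the cluster. The check at the end only confirms the property mentioned above: every aligned sequence has the same length.

```python
import subprocess


def run_mafft(in_fasta, out_fasta, threads=1):
    """Align in_fasta with MAFFT and write the alignment to out_fasta.

    Assumes the `mafft` executable is installed and on PATH.
    MAFFT writes the alignment to stdout and progress messages to stderr.
    """
    cmd = ["mafft", "--auto", "--thread", str(threads), str(in_fasta)]
    with open(out_fasta, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


def read_fasta(path):
    """Return a dict mapping sequence names to (possibly gapped) sequences."""
    parts, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:].split()
                name = header[0] if header else ""
                parts[name] = []
            elif name is not None:
                parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def is_global_msa(seqs):
    """True if there are at least two sequences and all share one aligned length."""
    lengths = {len(s) for s in seqs.values()}
    return len(seqs) >= 2 and len(lengths) == 1
```

You would call `run_mafft(your_input_fasta, "3_msa.fasta")` inside the cluster directory and then confirm that `is_global_msa(read_fasta("3_msa.fasta"))` returns `True` before moving on to the next Trycycler step. Expect the alignment of whole bacterial chromosomes to take a long time and a lot of memory.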