I am using Trycycler v0.5.3 and have been running into trouble at the `msa` step on a few of my assemblies. With the default `msa` settings, and regardless of whether I use MUSCLE v3 or v5, I consistently get an error that MUSCLE couldn't finish on one or more segments. Because the temporary files are deleted automatically, I can't troubleshoot any further to determine where MUSCLE is having problems. I've tried removing every contig in the cluster that required addition or trimming for circularization, as well as those with more than a few hundred bp of indels, and it still couldn't finish. I see from prior issues that others have had the same or a similar problem at this step, but it's not clear whether a solution was found.
So I took another approach and used MAFFT (v7.475) to align the 2_all_seqs.fasta file directly, without any partitioning. Aligning the seven 6.9 Mbp contigs in the file took only 3 hours using 12 threads. The `consensus` step seems to have run just fine on the MAFFT-produced 3_msa.fasta file:
chunks: 7,601 (3,801 same, 3,800 different)
combining small chunks: 6,549 (3,275 same, 3,274 different)
...
Consensus length: 6,924,934 bp
Different chunks needing assessment: 5
Different chunks not needing assessment: 3,269
...
Chunks where sequence is...
the same as in the initial consensus: 2
different to the initial consensus: 3
So, with the caveat that you probably have not extensively (or maybe ever) tested Trycycler with an unpartitioned MAFFT alignment in place of the partitioned MUSCLE alignment for the multiple sequence alignment step, can you think of any reason why this approach wouldn't be acceptable when `trycycler msa` fails for unclear reasons?
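To at least rule out a structurally broken alignment, a check along the following lines could be run on the MAFFT output before trusting the consensus. This is a minimal sketch and not part of Trycycler: `read_fasta` and `check_msa` are hypothetical helper names, and it assumes the MAFFT output keeps the same sequence names as 2_all_seqs.fasta and differs from the input only in letter case and `-` gap characters. It verifies that all aligned rows have the same length and that removing the gaps gives back each input contig unchanged.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into {name: sequence}; name is the first word of the header."""
    seqs, name = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Fall back to an empty name if the header is a bare ">".
            name = (line[1:].split() or [""])[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def check_msa(unaligned_path, msa_path):
    """Return a list of problems; an empty list means the MSA looks consistent."""
    unaligned = read_fasta(unaligned_path)
    msa = read_fasta(msa_path)
    problems = []

    # Every row of a multiple sequence alignment must have the same length.
    lengths = {len(s) for s in msa.values()}
    if len(lengths) > 1:
        problems.append(f"aligned rows differ in length: {sorted(lengths)}")

    # The alignment should contain exactly the input sequences (assumption:
    # the aligner preserved the header names).
    if set(unaligned) != set(msa):
        problems.append("sequence names differ between input and MSA")

    # Stripping gaps must reproduce the input; compare case-insensitively
    # in case the aligner changed the letter case.
    for name in sorted(set(unaligned) & set(msa)):
        if msa[name].replace("-", "").upper() != unaligned[name].upper():
            problems.append(f"{name}: ungapped sequence does not match input")

    return problems
```

Run from inside the cluster directory, `check_msa("2_all_seqs.fasta", "3_msa.fasta")` returning an empty list would only show that the file is a well-formed alignment of the original contigs; it says nothing about whether the alignment itself is a good one.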
Thanks.