rrwick / Trycycler

A tool for generating consensus long-read assemblies for bacterial genomes
GNU General Public License v3.0

msa Error #46

Closed rowi2024 closed 1 year ago

rowi2024 commented 1 year ago

Hello,

I'm failing to complete the msa step in Trycycler; I've copied the message below. I wonder whether the issue is the large number of repeats (~20; ~70 kb total length) in my engineered genome. To leave some wiggle room for repeat issues, I allowed the max_indel size to be 2000 (indel sizes are listed at the bottom). Thanks for any advice!

Error message:

Merging MSA (2022-11-08 18:52:31)
Each of the MSA pieces are now merged together and saved to file.

Traceback (most recent call last):
  File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/lib/python3.10/site-packages/trycycler/main.py", line 51, in main
    msa(args)
  File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 36, in msa
    merge_pieces(temp_dir, args.cluster_dir, seqs)
  File "/homes/rwilton/software/miniconda3/envs/nano-trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 175, in merge_pieces
    aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'A_contig_1'

Indels from reconcile step:

A_contig_1 vs B_contig_1... 99.00% identity, max indel = 43
A_contig_1 vs C_contig_1... 99.93% identity, max indel = 27
A_contig_1 vs D_contig_1... 99.78% identity, max indel = 37
A_contig_1 vs E_Utg648... 99.88% identity, max indel = 604
A_contig_1 vs G_Utg580... 99.72% identity, max indel = 301
A_contig_1 vs H_Utg592... 99.93% identity, max indel = 781
A_contig_1 vs L_utg000001l... 99.18% identity, max indel = 39
B_contig_1 vs C_contig_1... 99.07% identity, max indel = 1452
B_contig_1 vs D_contig_1... 99.21% identity, max indel = 1669
B_contig_1 vs E_Utg648... 98.89% identity, max indel = 1533
B_contig_1 vs G_Utg580... 99.17% identity, max indel = 1521
B_contig_1 vs H_Utg592... 98.97% identity, max indel = 1704
B_contig_1 vs L_utg000001l... 99.77% identity, max indel = 94
C_contig_1 vs D_contig_1... 99.86% identity, max indel = 37
C_contig_1 vs E_Utg648... 99.81% identity, max indel = 1401
C_contig_1 vs G_Utg580... 99.79% identity, max indel = 301
C_contig_1 vs H_Utg592... 99.88% identity, max indel = 781
C_contig_1 vs L_utg000001l... 99.25% identity, max indel = 40
D_contig_1 vs E_Utg648... 99.67% identity, max indel = 1156
D_contig_1 vs G_Utg580... 99.93% identity, max indel = 301
D_contig_1 vs H_Utg592... 99.75% identity, max indel = 1499
D_contig_1 vs L_utg000001l... 99.39% identity, max indel = 38
E_Utg648 vs G_Utg580... 99.64% identity, max indel = 301
E_Utg648 vs H_Utg592... 99.84% identity, max indel = 781
E_Utg648 vs L_utg000001l... 99.09% identity, max indel = 274
G_Utg580 vs H_Utg592... 99.73% identity, max indel = 1499
G_Utg580 vs L_utg000001l... 99.38% identity, max indel = 301
H_Utg592 vs L_utg000001l... 99.18% identity, max indel = 599
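For anyone trying to narrow this down: the KeyError in the traceback means a sequence name was looked up in a dictionary that did not contain it. The sketch below is one way to check whether a FASTA file holds every expected sequence name. It assumes (this is not confirmed anywhere in this thread) that the failing lookup is on sequences read back from an aligned piece file. `fasta_names` and `missing_names` are hypothetical helpers written for illustration and are not part of Trycycler.

```python
def fasta_names(fasta_text):
    """Return the set of sequence names (first word of each header) in FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_names(expected, fasta_text):
    """Return the expected sequence names that are absent from the FASTA text."""
    return sorted(set(expected) - fasta_names(fasta_text))


if __name__ == "__main__":
    # Hypothetical example: an aligned file that contains only one of two sequences.
    example = ">B_contig_1\nACGT-ACGT\n"
    print(missing_names(["A_contig_1", "B_contig_1"], example))
```

An empty or truncated alignment file would report every expected name as missing, which would point at the aligner rather than at the input contigs.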

Aluminio-visto commented 1 year ago

Hi, I'm having a similar issue at this step, but in our case we have a wild-type Pseudomonas genome:

Merging MSA (2022-12-07 11:33:28)
Each of the MSA pieces are now merged together and saved to file.

Traceback (most recent call last):
  File "/home/usuario/miniconda3/envs/trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/home/usuario/miniconda3/envs/trycycler/lib/python3.10/site-packages/trycycler/main.py", line 51, in main
    msa(args)
  File "/home/usuario/miniconda3/envs/trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 36, in msa
    merge_pieces(temp_dir, args.cluster_dir, seqs)
  File "/home/usuario/miniconda3/envs/trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 175, in merge_pieces
    aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'A_contig_2'

Any help will be very much appreciated

uyghbo commented 1 year ago

Dear Trycycler Team

Got the same error here for a 5.4 Mb Klebsiella pneumoniae genome:

Merging MSA (2022-12-30 15:38:36)
Each of the MSA pieces are now merged together and saved to file.

Traceback (most recent call last):
  File "/home/miniconda3/envs/bactopia/envs/unicycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/unicycler/lib/python3.10/site-packages/trycycler/main.py", line 51, in main
    msa(args)
  File "/home/miniconda3/envs/unicycler/lib/python3.10/site-packages/trycycler/msa.py", line 36, in msa
    merge_pieces(temp_dir, args.cluster_dir, seqs)
  File "/home/miniconda3/envs/unicycler/lib/python3.10/site-packages/trycycler/msa.py", line 175, in merge_pieces
    aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'E_ctg000030'

TheIncredibleMulk commented 1 year ago

Dear Trycycler Team,

Got a similar error for a 5 Mb Acidobacteriota genome:

Traceback (most recent call last):
  File "/home/seq/miniconda3/envs/trycycler/bin/trycycler", line 10, in <module>
    sys.exit(main())
  File "/home/seq/miniconda3/envs/trycycler/lib/python3.10/site-packages/trycycler/main.py", line 51, in main
    msa(args)
  File "/home/seq/miniconda3/envs/trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 36, in msa
    merge_pieces(temp_dir, args.cluster_dir, seqs)
  File "/home/seq/miniconda3/envs/trycycler/lib/python3.10/site-packages/trycycler/msa.py", line 175, in merge_pieces
    aligned_seq_parts[n].append(parts[n].upper())
KeyError: 'A_contig_1'

Thanks in advance for any help or guidance correcting the issue.

TheIncredibleMulk commented 1 year ago

Updating my previous issue. Backdating muscle to version 3.8.1551 fixed the issue and it ran fine. Hopefully that helps others with this issue.

rmormando commented 1 year ago

how did you backdate it?

TheIncredibleMulk commented 1 year ago

Depends on how you installed it. If you used conda like I did, then run the following.

Activate the trycycler environment:

    conda activate trycycler

Backdate your version of muscle:

    conda install muscle=3.8.1551

An additional note: depending on the flavor of Linux / environment you're running, I've had to use different versions. 3.8.1551 seemed the most consistent, but I also had success on some systems with 3.8.31.
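To confirm which MUSCLE the environment actually picks up after backdating, a small check along these lines may help. It is a minimal sketch that assumes `muscle -version` prints a banner containing the version number (for example something like `MUSCLE v3.8.1551 by Robert C. Edgar`). The exact banner text varies between releases, so the parsing is deliberately loose. The function names are made up for illustration.

```python
import re
import shutil
import subprocess


def muscle_major_version(version_text):
    """Return the major version found in a MUSCLE version banner, or None."""
    match = re.search(r"(\d+)\.\d+", version_text)
    return int(match.group(1)) if match else None


def installed_muscle_banner(executable="muscle"):
    """Run `muscle -version` and return its output, or None if it is not on PATH."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run([executable, "-version"], capture_output=True, text=True)
    # Assumption: the banner may go to stdout or stderr, so read both.
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    banner = installed_muscle_banner()
    if banner is None:
        print("muscle not found on PATH")
    else:
        major = muscle_major_version(banner)
        print(f"banner: {banner!r}, major version: {major}")
        if major is not None and major >= 4:
            print("This is not MUSCLE v3; consider backdating as described above.")
```

Run it inside the activated trycycler environment so that it sees the same `muscle` that Trycycler would call.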

rmormando commented 1 year ago

That worked!! Thank you!!

rrwick commented 1 year ago

Thanks, all! I've updated Trycycler's docs to make it clear that MUSCLE v3 is strongly recommended. And after the next release of Trycycler (soon), I'll ensure that the conda recipe has muscle<4.

Ryan