seajane closed this issue 5 months ago.
I upgraded to 2.0.4 in case this helped, and received the same error. Here it is in more detail:
Traceback (most recent call last):
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 578, in get_gene_sequences_from_fastas
gene.add_sequence(get_dna_sequence(fasta_dict[org][contig.name], gene))
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/annotate/synta.py", line 306, in get_dna_sequence
return reverse_complement(contig_seq[gene.start - 1:gene.stop])
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/annotate/synta.py", line 46, in reverse_complement
rcseq += complement[i]
KeyError: 'L'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/bin/ppanggolin", line 8, in <module>
sys.exit(main())
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/main.py", line 177, in main
ppanggolin.annotate.launch(args)
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 670, in launch
get_gene_sequences_from_fastas(pangenome, args.fasta)
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 586, in get_gene_sequences_from_fastas
raise KeyError(msg)
KeyError: 'Fasta file for genome G_NR021 did not have the contig NZ_NQOP01000002.1_1 that was read from the annotation file. The provided contigs in the fasta were : NZ_NQOP01000003.1_1, NZ_NQOP01000003.1_2, NZ_NQOP01000003.1_3, NZ_NQOP01000003.1_4, NZ_NQOP01000003.1_5, NZ_NQOP01000003.1_6, NZ_NQOP01000003.1_7, NZ_NQOP01000003.1_8, NZ_NQOP01000003.1_9, NZ_NQOP01000003.1_10, NZ_NQOP01000003.1_11, NZ_NQOP01000003.1_12, NZ_NQOP01000003.1_13, NZ_NQOP01000003.1_14, NZ_NQOP01000003.1_15, NZ_NQOP01000003.1_16, NZ_NQOP01000003.1_17, NZ_NQOP01000003.1_18, NZ_NQOP01000003.1_19, NZ_NQOP01000003.1_20, NZ_NQOP01000003.1_21, NZ_NQOP01000003.1_22, NZ_NQOP01000003.1_23, NZ_NQOP01000003.1_24, NZ_NQOP01000003.1_25, NZ_NQOP01000003.1_26, NZ_NQOP01000003.1_27, NZ_NQOP01000003.1_28, NZ_NQOP01000003.1_29, NZ_NQOP01000003.1_30, NZ_NQOP01000003.1_31, NZ_NQOP01000003.1_32, NZ_NQOP01000003.1_33, NZ_NQOP01000003.1_34, NZ_NQOP01000003.1_35, NZ_NQOP01000003.1_36, NZ_NQOP01000003.1_37, NZ_NQOP01000003.1_38, NZ_NQOP01000003.1_39, NZ_NQOP01000003.1_40, NZ_NQOP01000003.1_41, NZ_NQOP01000003.1_42, NZ_NQOP01000003.1_43, NZ_NQOP01000003.1_44, NZ_NQOP01000003.1_45, NZ_NQOP01000003.1_46, NZ_NQOP01000003.1_47, NZ_NQOP01000003.1_48, NZ_NQOP01000003.1_49, NZ_NQOP01000003.1_50, NZ_NQOP01000003.1_51, NZ_NQOP01000003.1_52, NZ_NQOP01000003.1_53, NZ_NQOP01000003.1_54, NZ_NQOP01000003.1_55, NZ_NQOP01000003.1_56, NZ_NQOP01000003.1_57, NZ_NQOP01000003.1_58, NZ_NQOP01000003.1_59, NZ_NQOP01000003.1_60, NZ_NQOP01000003.1_61, NZ_NQOP01000003.1_62, NZ_NQOP01000003.1_63, NZ_NQOP01000003.1_64, NZ_NQOP01000003.1_65, NZ_NQOP01000003.1_66, NZ_NQOP01000003.1_67, NZ_NQOP01000003.1_68, NZ_NQOP01000003.1_69, NZ_NQOP01000003.1_70, NZ_NQOP01000003.1_71, NZ_NQOP01000003.1_72, NZ_NQOP01000003.1_73, NZ_NQOP01000003.1_74, NZ_NQOP01000003.1_75, NZ_NQOP01000003.1_76, NZ_NQOP01000003.1_77, NZ_NQOP01000003.1_78, NZ_NQOP01000003.1_79, NZ_NQOP01000003.1_80, NZ_NQOP01000003.1_81, NZ_NQOP01000003.1_82, NZ_NQOP01000003.1_83, NZ_NQOP01000003.1_84, NZ_NQOP01000002.1_1
Now that I see the whole error, I believe one problem lies in the FASTA type. PPanGGOLiN expects DNA files (I assume this from the get_dna_sequence and reverse_complement functions that failed). Is there any way to use AA sequences?
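A minimal sketch of why an amino-acid FASTA fails at this step, assuming a reverse_complement that looks up each base in a DNA-only dictionary. The complement table below is hypothetical; PPanGGOLiN's own table in synta.py may include more IUPAC codes, but amino-acid letters such as L (leucine) are not DNA bases, so the lookup raises KeyError: 'L', as in the first traceback. Because that error is raised inside get_gene_sequences_from_fastas, the chained traceback ("During handling of the above exception...") suggests it is what gets re-reported as the "did not have the contig" KeyError.

```python
# Hypothetical DNA complement table (not PPanGGOLiN's exact table).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    # Walk the sequence from the end; any non-DNA letter raises KeyError.
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def try_reverse_complement(seq: str) -> str:
    """Return the reverse complement, or a description of the KeyError raised."""
    try:
        return reverse_complement(seq)
    except KeyError as err:
        # err.args[0] is the first character without a complement.
        return f"KeyError: {err.args[0]!r}"


print(try_reverse_complement("ATGCCG"))  # CGGCAT
print(try_reverse_complement("MKAL"))    # KeyError: 'L' (L is read first, from the end)
```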
I converted all my AA sequences to nucleotides, trying both degenerate bases and a random trinucleotide code for each amino acid. The final error at the bottom is the same as above, but the initial error has switched to:
Traceback (most recent call last):
File "/Users/hbouzek/opt/anaconda3/envs/ppgg-new2/lib/python3.10/site-packages/ppanggolin/annotate/annotate.py", line 578, in get_gene_sequences_from_fastas
gene.add_sequence(get_dna_sequence(fasta_dict[org][contig.name], gene))
KeyError: 'NZ_NQOP01000002.1_1'
Hi,
So I'm not quite sure what is happening with the code, but PPanGGOLiN works with genomic DNA FASTA files and nothing else. It does not expect AA or CDS FASTA files.
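Since only genomic DNA FASTA files are accepted, a quick check before running annotate can catch protein files early. This is a minimal sketch and is not part of PPanGGOLiN: it assumes that a sequence made almost entirely of A, C, G, T and N is DNA, and the 0.9 threshold is an arbitrary choice. Sequences built with many degenerate IUPAC bases would also fail this check.

```python
# Strict nucleotide alphabet used by this heuristic (degenerate IUPAC codes excluded
# on purpose, because most of them are also amino-acid letters).
DNA_CHARS = set("ACGTN")


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word of the header) to its concatenated sequence."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def looks_like_dna(seq: str, threshold: float = 0.9) -> bool:
    """Heuristic: treat a sequence as DNA if at least `threshold` of it is A/C/G/T/N."""
    if not seq:
        return False
    return sum(c in DNA_CHARS for c in seq.upper()) / len(seq) >= threshold


example = ">contig_1\nATGAGTAAAGGAGAA\n>prot_1 protein\nMSKGEELFTGVVPIL\n"
for record_id, seq in parse_fasta(example).items():
    print(record_id, looks_like_dna(seq))  # contig_1 True, then prot_1 False
```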
For what you want to do, if you really want to use PPanGGOLiN, I'd recommend providing your own clustering with this option: https://ppanggolin.readthedocs.io/en/latest/user/PangenomeAnalyses/pangenomeAnalyses.html#providing-your-gene-families
along with the gff3 files that you are already using. If you are looking for a clustering tool, PPanGGOLiN uses MMseqs2 internally, so you could try that.
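When supplying your own gene families, the gene IDs in the clustering file have to match the genes read from the gff3 files. The sketch below checks that every CDS ID in a GFF3 file appears in the member (second) column of a two-column, tab-separated clustering file, such as the *_cluster.tsv written by MMseqs2 easy-cluster (representative, member). It assumes that PPanGGOLiN identifies genes by the CDS ID attribute; please confirm the expected format on the documentation page linked above. The IDs in the example are hypothetical.

```python
def cds_ids_from_gff3(gff_text: str) -> set[str]:
    """Collect the ID= attribute of every CDS feature in a GFF3 file."""
    ids: set[str] = set()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip anything that is not a 9-column CDS feature (e.g. an embedded ##FASTA section).
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        for attribute in fields[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID":
                ids.add(value.strip())
    return ids


def genes_missing_from_clusters(gff_ids: set[str], cluster_tsv: str) -> set[str]:
    """Return CDS IDs that never appear in the member (2nd) column of a cluster TSV."""
    clustered = set()
    for line in cluster_tsv.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            clustered.add(fields[1].strip())
    return gff_ids - clustered


# Hypothetical example: gene_2 has no family in the clustering file.
gff = (
    "##gff-version 3\n"
    "NZ_NQOP01000002.1\tRefSeq\tCDS\t1\t300\t.\t+\t0\tID=gene_1;product=x\n"
    "NZ_NQOP01000002.1\tRefSeq\tCDS\t400\t900\t.\t-\t0\tID=gene_2\n"
)
print(genes_missing_from_clusters(cds_ids_from_gff3(gff), "gene_1\tgene_1\n"))  # {'gene_2'}
```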
I agree that in your case, where gff3 files are provided, things could in theory work with AA or CDS FASTA sequences, but we have not gone down this path.
Adelme
I am trying to create a PPanGGOLiN pangenome using annotations from another source. I have gff3 files and matching FASTA files. I am running PPanGGOLiN version 2.0.2. I used this command:
ppanggolin annotate --anno gffdf.list --fasta eggfast.list
I received this error:
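For reference, the files passed to --anno and --fasta in the command above are list files: tab-separated, with the genome name in the first column and the file path in the second. As the error message ("Fasta file for genome G_NR021...") indicates, genomes are matched between the two lists by that name. The sketch below builds such lines and reports genomes present in only one list; using the file stem as the genome name is an assumption made for illustration, and the paths in the example are hypothetical.

```python
from pathlib import Path


def list_lines(paths: list[str]) -> dict[str, str]:
    """Map genome name (file stem) to a 'name<TAB>path' line for a list file."""
    return {Path(p).stem: f"{Path(p).stem}\t{p}" for p in paths}


def unpaired_genomes(anno_paths: list[str], fasta_paths: list[str]) -> set[str]:
    """Genome names that have an annotation file but no FASTA file, or vice versa."""
    return set(list_lines(anno_paths)) ^ set(list_lines(fasta_paths))


anno = ["gff/G_NR021.gff3", "gff/G_NR022.gff3"]
fasta = ["fna/G_NR021.fasta"]
print("\n".join(list_lines(anno).values()))  # content for a file like gffdf.list
print(unpaired_genomes(anno, fasta))         # {'G_NR022'}
```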