klebgenomics / Kleborate

error occurring for some FASTA files #62

Open karubiotools opened 2 years ago

karubiotools commented 2 years ago

Dear Developers,

I would like to know why the following error occurs when I launch Kleborate with the '--all' option for some FASTA files, please:

"
strain species ST virulence_score resistance_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST RmpADC RmST rmpA2 wzi K_locus K_locus_confidence O_locus O_locus_confidence AGly_acquired Col_acquired Fcyn_acquired Flq_acquired Gly_acquired MLS_acquired Phe_acquired Rif_acquired Sul_acquired Tet_acquired Tgc_acquired Tmt_acquired Bla_acquired Bla_inhR_acquired Bla_ESBL_acquired Bla_ESBL_inhR_acquired Bla_Carb_acquired Bla_chr SHV_mutations Omp_mutations Col_mutations Flq_mutations truncated_resistance_hits spurious_resistance_hits
Traceback (most recent call last):
  File "/soft/miniconda3/bin/kleborate", line 33, in <module>
    sys.exit(load_entry_point('Kleborate==2.2.0', 'console_scripts', 'kleborate')())
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/main.py", line 64, in main
    results.update(get_resistance_results(data_folder, contigs, args, res_headers,
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/main.py", line 570, in get_resistance_results
    res_hits = resblast_one_assembly(contigs, gene_info, qrdr, trunc, omp, seqs,
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/resBLAST.py", line 32, in resblast_one_assembly
    hits_dict = blast_against_all(seqs, min_cov, min_ident, contigs, gene_info,
  File "/soft/miniconda3/lib/python3.9/site-packages/kleborate/resBLAST.py", line 125, in blast_against_all
    hit_allele, hit_class, hit_bla_class = gene_info[hit.gene_id]
KeyError: '403__TetX_Tet__tet(X6)__2434'
"

Thank you in advance for your help.

Best regards,
David

nquynh8991 commented 2 years ago

That's because the "gene_info" built from your *.csv file has no entry matching that "gene_id". Take another look at the "gene_id" in your database file; I think you need to fix that ID a little before running it again. Hope this helps.

learithe commented 1 year ago

This exact error just occurred for me on one sequence out of a set of >800. Thanks to @nquynh8991's comment I tracked it down to a typo for two sequences in the CARD database that comes with the latest version of kleborate here.

CARD_v3.0.8.fasta contains the headers:

402__TetX_Tet__tet(X5)__2433
403__TetX_Tet__tet(X6)__2434

CARD_AMR_clustered.csv contains the entries:

402,tet(X5),Tgc,TetX,tet(X5),2433,ARO_3005057,-,-,no,no,NA,NA
403,tet(X6),Tgc,TetX,tet(X6),2434,ARO_3005056,-,-,no,no,NA,NA

The difference is the specified antibiotic class (Tet in the FASTA headers vs Tgc in the CSV entries). I believe these should be Tgc in the FASTA file headers, consistent with the CSV file (these variants of TetX are associated with tigecycline resistance).
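A check along these lines could flag the same kind of mismatch across the whole database. It is only a sketch under assumptions: FASTA headers are taken to have the form clusterID__gene_class__allele__seqID, the CSV columns are read by position as they appear in the rows quoted above (cluster ID first, class third, sequence ID sixth), and `find_class_mismatches` is a hypothetical helper, not part of Kleborate.

```python
import csv
import io


def find_class_mismatches(fasta_text, csv_text):
    """Return (header, fasta_class, csv_class) for every FASTA header whose
    class suffix disagrees with the class column of the matching CSV row."""
    # Map (cluster ID, sequence ID) -> class, using ASSUMED column positions.
    csv_class = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 6:
            continue
        csv_class[(row[0], row[5])] = row[2]

    mismatches = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        header = fields[0]
        parts = header.split("__")
        if len(parts) != 4:
            continue  # header is not in the assumed four-part format
        cluster_id, gene_class, _allele, seq_id = parts
        # The class is assumed to be the text after the last single underscore.
        fasta_class = gene_class.rsplit("_", 1)[-1]
        expected = csv_class.get((cluster_id, seq_id))
        if expected is not None and expected != fasta_class:
            mismatches.append((header, fasta_class, expected))
    return mismatches


# Toy input built from the two entries quoted in this thread.
fasta_example = """>402__TetX_Tet__tet(X5)__2433
ACGT
>403__TetX_Tet__tet(X6)__2434
ACGT
"""
csv_example = """402,tet(X5),Tgc,TetX,tet(X5),2433,ARO_3005057,-,-,no,no,NA,NA
403,tet(X6),Tgc,TetX,tet(X6),2434,ARO_3005056,-,-,no,no,NA,NA
"""

for header, fasta_class, expected in find_class_mismatches(fasta_example, csv_example):
    print(f"{header}: FASTA says {fasta_class}, CSV says {expected}")
```

To run it on the real files, pass in the contents of CARD_v3.0.8.fasta and CARD_AMR_clustered.csv instead of the toy strings.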

I solved this by editing the CARD FASTA file and recreating the BLAST database from it. It would be good to fix this typo in a future Kleborate or database release!
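The workaround could be scripted roughly as follows. This is a sketch, not the exact steps used: the old headers are assumed to have the form clusterID__gene_class__allele__seqID, the corrected headers simply swap Tet for Tgc, `fix_headers` is a hypothetical helper, and the `makeblastdb` command in the closing comment is the standard BLAST+ way to build a nucleotide database, which may differ from how a given Kleborate install manages its database files.

```python
# Old header ID -> corrected header ID (class suffix Tet replaced by Tgc).
# These strings are assumptions based on the entries quoted in this thread.
HEADER_FIXES = {
    "402__TetX_Tet__tet(X5)__2433": "402__TetX_Tgc__tet(X5)__2433",
    "403__TetX_Tet__tet(X6)__2434": "403__TetX_Tgc__tet(X6)__2434",
}


def fix_headers(fasta_text, fixes=HEADER_FIXES):
    """Return fasta_text with the listed header IDs replaced.

    Sequence lines and any description after the first space are left as-is.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seq_id, sep, rest = line[1:].partition(" ")
            line = ">" + fixes.get(seq_id, seq_id) + sep + rest
        out.append(line)
    return "\n".join(out) + "\n"


# Toy example: one affected header and one unrelated, made-up header.
example = ">403__TetX_Tet__tet(X6)__2434\nACGT\n>1__Other_AGly__x__1\nTTTT\n"
print(fix_headers(example), end="")

# To apply this to the real file: read CARD_v3.0.8.fasta, write the result
# back, remove the old BLAST index files next to it, and rebuild, e.g.:
#   makeblastdb -in CARD_v3.0.8.fasta -dbtype nucl
```

Keeping a backup copy of the original FASTA file before overwriting it makes the change easy to revert once an official fix is released.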