phac-nml / sistr_cmd

SISTR (Salmonella In Silico Typing Resource) command-line tool
Apache License 2.0

Discrepant results between similar Salamae genomes #57

Open flashton2003 opened 1 year ago

flashton2003 commented 1 year ago

Hello,

We've isolated some subsp. salamae strains for one of our projects. I have a few questions about the SISTR output for these isolates:


sample | cgmlst_found_loci | cgmlst_matching_alleles | cgmlst_subspecies | o_antigen | serogroup | serovar | serovar_antigen | serovar_cgmlst | O antigen prediction | H1 antigen prediction(fliC) | H2 antigen prediction(fljB) | Predicted identification | Predicted antigenic profile | Predicted serotype | average_depth | snp_count | indel_count | N_count | reads_cov | Reference | Organism from Esmie | Salmonella genus
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
CQJ13L | 330 | 129 | salamae | - | B | II 1,4,12,[27]:a:z39\|II 4:a:z39 | II 1,4,12,[27]:a:z39\|II 4:a:z39 | II 1,4,[5],12,[27]:b:[e,n,x] | 4 | a | z39 | Salmonella enterica subspecies salamae (subspecies II) | 4:a:z39 | II [1],4,12,[27]:a:z39 | 70.3072 | 40928 | 0 | 0 | 89.86 | GCF_019339485.1 |   | Salmonella species
CQJ127 | 330 | 134 | salamae | - | B | II B:-:e,n,x | II B:-:e,n,x | II 1,4,[5],12,[27]:b:[e,n,x] | 4 | z | e,n,x | Salmonella enterica subspecies salamae (subspecies II) | 4:z:e,n,x | II [1],4,12,27:z:e,n,x | 62.4964 | 40202 | 0 | 0 | 89.21 | GCF_019339485.1 | Salmonella Typhimurium | Salmonella typhimurium

  1. Neither of these samples has a prediction in the o_antigen column, but when I BLAST them against your database, they have quite a good match (97% similarity, >99% coverage) to "304|584|1,4,12,27|B" from the wzx database. Is this match not good enough to call the O antigen? Or is there uncertainty about record 304?
  2. The two samples give quite similar results across most fields (and in my BLAST results against the wzx and wzy databases), yet SISTR reports different serovars for them. How come?
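To make the second question concrete, the SISTR columns of the two rows can be compared field by field. This is only a minimal sketch in Python: the values are copied from the table above, and the helper name `differing_fields` is made up for illustration.

```python
# SISTR columns for the two isolates, copied from the table in this issue.
cqj13l = {
    "cgmlst_found_loci": "330",
    "cgmlst_matching_alleles": "129",
    "cgmlst_subspecies": "salamae",
    "o_antigen": "-",
    "serogroup": "B",
    "serovar": "II 1,4,12,[27]:a:z39|II 4:a:z39",
    "serovar_antigen": "II 1,4,12,[27]:a:z39|II 4:a:z39",
    "serovar_cgmlst": "II 1,4,[5],12,[27]:b:[e,n,x]",
}
cqj127 = {
    "cgmlst_found_loci": "330",
    "cgmlst_matching_alleles": "134",
    "cgmlst_subspecies": "salamae",
    "o_antigen": "-",
    "serogroup": "B",
    "serovar": "II B:-:e,n,x",
    "serovar_antigen": "II B:-:e,n,x",
    "serovar_cgmlst": "II 1,4,[5],12,[27]:b:[e,n,x]",
}


def differing_fields(a, b):
    """Return {field: (value_a, value_b)} for fields whose values differ."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


diff = differing_fields(cqj13l, cqj127)
for field, (left, right) in diff.items():
    print(f"{field}: {left!r} vs {right!r}")
```

Only the matching-allele count and the two antigen-derived serovar fields differ; the cgMLST serovar, serogroup and subspecies are identical for both isolates.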

Here are the fasta files, in case you want to dig in.

https://www.dropbox.com/scl/fi/ppkiflqyvtwrn28nah9s6/CQJ127_S25_L001.fna?rlkey=t19wktth1uguutnmqkuej3jq7&dl=0

https://www.dropbox.com/scl/fi/k4yg4iwo1k0mjdinyahx0/CQJ13L_S23_L001.fna?rlkey=woawgupve273z3b7y7tx2xipi&dl=0

Thanks,

Phil

kbessonov1984 commented 2 months ago

Hello, these are complex isolates to type because their serovars are designated by an antigenic profile rather than a single name. SISTR uses antigens, cgMLST and MASH (if selected) to make the final serovar call, with the antigen results taking precedence over all other evidence. The value in the o_antigen field is deduced from the serovar by a reverse lookup in the WHO table of known serovars, sistr/data/Salmonella-serotype_serogroup_antigen_table-WHO_2007.csv. The most informative output format is JSON, selected with the -f json option, which provides all intermediate and reliability values. For both samples I would use the cgMLST serovar as the final serovar.
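The reverse lookup described above can be sketched roughly as follows. This illustrates the idea only and is not SISTR's actual code: the column names `Serovar` and `O_antigen`, the helper names, and the two-row inline table are assumptions, and the real CSV may be laid out differently. Under these assumptions, a serovar string with no exact row in the table (for example a `|`-joined list of candidate formulas) yields no O antigen, which might explain the `-` in the o_antigen column.

```python
import csv
import io

# Toy stand-in for the WHO table. The column names are assumptions and
# the two rows are only illustrative entries.
TABLE_CSV = """Serovar,O_antigen
Typhimurium,"1,4,[5],12"
Enteritidis,"1,9,12"
"""


def load_o_antigen_by_serovar(handle):
    """Build a serovar -> O antigen mapping from an open CSV handle."""
    return {row["Serovar"]: row["O_antigen"] for row in csv.DictReader(handle)}


def o_antigen_for(serovar, table, missing="-"):
    """Reverse lookup: exact serovar match, otherwise the placeholder."""
    return table.get(serovar, missing)


table = load_o_antigen_by_serovar(io.StringIO(TABLE_CSV))
print(o_antigen_for("Typhimurium", table))
print(o_antigen_for("II 1,4,12,[27]:a:z39|II 4:a:z39", table))
```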

Serovar prediction logic

Both samples belong to subspecies salamae, but their serovars differ. We report the information from every line of evidence so that end users can finalize the serovar prediction. We are currently working on the version 1.1.3 release, which should be out soon and will provide more transparent serovar prediction logic messages in the log.
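For anyone finalizing the call by hand, a small script can put the antigen-based and cgMLST-based serovars side by side and flag samples where they disagree. The sketch below makes several assumptions: that the JSON report is a list of per-genome objects, that its keys match the column names in the table in this issue, and that the sample identifier is stored under a key called `sample` (a hypothetical name). The inline `REPORT` string stands in for a real report file.

```python
import json

# Stand-in for a JSON report; values are taken from the table in this
# issue, while the list-of-objects layout and the "sample" key are assumed.
REPORT = """[
  {"sample": "CQJ13L",
   "serovar": "II 1,4,12,[27]:a:z39|II 4:a:z39",
   "serovar_antigen": "II 1,4,12,[27]:a:z39|II 4:a:z39",
   "serovar_cgmlst": "II 1,4,[5],12,[27]:b:[e,n,x]",
   "cgmlst_matching_alleles": 129, "cgmlst_found_loci": 330},
  {"sample": "CQJ127",
   "serovar": "II B:-:e,n,x",
   "serovar_antigen": "II B:-:e,n,x",
   "serovar_cgmlst": "II 1,4,[5],12,[27]:b:[e,n,x]",
   "cgmlst_matching_alleles": 134, "cgmlst_found_loci": 330}
]"""

EVIDENCE_KEYS = ("serovar", "serovar_antigen", "serovar_cgmlst")


def summarize(records):
    """Map sample -> the serovar call from each line of evidence."""
    return {r["sample"]: {k: r.get(k) for k in EVIDENCE_KEYS} for r in records}


def evidence_disagrees(calls):
    """True when the antigen-based and cgMLST-based serovars differ."""
    return calls["serovar_antigen"] != calls["serovar_cgmlst"]


summary = summarize(json.loads(REPORT))
for sample, calls in summary.items():
    flag = "REVIEW" if evidence_disagrees(calls) else "ok"
    print(sample, flag, calls)
```

Both isolates are flagged here, because in each case the antigen-derived serovar differs from the cgMLST-derived one.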

SISTR v1.1.2 results

SeqSero2 results for comparison

Input files: CQJ127_S25_L001.fna
O antigen prediction: 4
H1 antigen prediction(fliC): 1,2,7
H2 antigen prediction(fljB): e,n,x
Predicted identification: Salmonella enterica subspecies salamae (subspecies II)
Predicted antigenic profile: 4:1,2,7:e,n,x
Predicted serotype: II 4:1,2,7:e,n,x
Note: This predicted serotype is not in the Kauffman-White scheme.

Input files: CQJ13L_S23_L001.fna
O antigen prediction: 4
H1 antigen prediction(fliC): a
H2 antigen prediction(fljB): z39
Predicted identification: Salmonella enterica subspecies salamae (subspecies II)
Predicted antigenic profile: 4:a:z39
Predicted serotype: II [1],4,12,[27]:a:z39
Note:

WKLM scheme