phac-nml / sistr_cmd

SISTR (Salmonella In Silico Typing Resource) command-line tool
Apache License 2.0

problem in wzx.fasta #30

Closed: alantsangmb closed this issue 5 years ago

alantsangmb commented 5 years ago

Hi, @peterk87

In the wzx.fasta file, I found that reference sequences 407 and 408 are named "407|584|1,3,19|E1" and "408|584|1,3,19|E1". I'm not sure whether this is a transcription error. Should they actually be labelled E4?
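The reporter's point is that the O-antigen formula 1,3,19 belongs to serogroup E4, not E1 (E1 is O:3,10). The sketch below shows one way to flag inconsistent headers of this kind. It assumes that the third pipe-delimited field is the O-antigen formula and the fourth is the serogroup. This layout is inferred from the two headers quoted above, not from SISTR documentation. The meaning of the second field (584) is not known here and is kept as an opaque string. `parse_header`, `is_consistent`, and the simplified two-entry formula table are hypothetical names for illustration.

```python
from typing import FrozenSet, NamedTuple, Optional

# Hypothetical, simplified table covering only the two serogroups discussed in
# this issue. Optional and phage-encoded O factors are omitted.
SEROGROUP_O_ANTIGENS = {
    "E1": frozenset({"3", "10"}),
    "E4": frozenset({"1", "3", "19"}),
}


class WzxHeader(NamedTuple):
    ref_id: str
    field2: str  # meaning not documented in this thread; kept as-is
    o_antigens: FrozenSet[str]
    serogroup: str


def parse_header(header: str) -> WzxHeader:
    """Split an assumed 'id|?|O-antigens|serogroup' FASTA header."""
    ref_id, field2, antigens, serogroup = header.lstrip(">").split("|")
    return WzxHeader(ref_id, field2, frozenset(antigens.split(",")), serogroup)


def expected_serogroup(o_antigens: FrozenSet[str]) -> Optional[str]:
    """Return the serogroup whose formula matches exactly, or None if unknown."""
    for group, formula in SEROGROUP_O_ANTIGENS.items():
        if formula == o_antigens:
            return group
    return None


def is_consistent(header: str) -> bool:
    """Flag only known mismatches; formulas missing from the table pass."""
    rec = parse_header(header)
    expected = expected_serogroup(rec.o_antigens)
    return expected is None or expected == rec.serogroup


for h in ("407|584|1,3,19|E1", "408|584|1,3,19|E1"):
    print(h, "consistent" if is_consistent(h) else "formula suggests a different serogroup")
```

Running this over every header in a FASTA file would list candidate mislabels for manual review. It cannot say which label is correct, because the formula in the header could itself be the error.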

peterk87 commented 5 years ago

Hi @alantsangmb,

I don't recall what happened with those antigen sequences, but I think I may have given up on trying to assign them O-antigens based on the serovar of the genome these sequences were derived from.

For serogroups E1 and E4, the sequences of different wzx or wzy alleles tend to be >99% similar, so confidently assigning a genome to either E1 or E4 based strictly on the wzx/wzy gene results is difficult.
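The sketch below illustrates why >99% similarity makes the call hard. With a best-hit approach, a query can match both an E1 and an E4 reference almost equally well, so the winning margin is tiny. This is a hypothetical best-hit call with an ambiguity margin, not SISTR's actual logic. `percent_identity`, `call_serogroup`, the `min_margin` value, and the toy sequences are all assumptions for illustration.

```python
from typing import Dict, Optional


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (gaps count as characters)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def call_serogroup(identities: Dict[str, float], min_margin: float = 0.5) -> Optional[str]:
    """Return the best-scoring serogroup, or None if the runner-up is within min_margin."""
    ranked = sorted(identities.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None  # ambiguous: the alleles are too similar to separate confidently
    return ranked[0][0]


# Toy 1,000 bp "alleles" that differ at only 5 positions (99.5% identical).
ref_e1 = "A" * 1000
ref_e4 = "A" * 995 + "C" * 5
query = "A" * 998 + "C" * 2

scores = {"E1": percent_identity(query, ref_e1), "E4": percent_identity(query, ref_e4)}
print(scores, call_serogroup(scores))
```

Here the query scores 99.8% against E1 and 99.7% against E4, so the sketch returns `None` instead of forcing a call. A handful of sequencing errors or novel variants could flip which reference wins.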

We're planning to update the antigen gene databases and add more genomic data in the next version of SISTR, so this curation issue should be fixed in that update. In the meantime, hopefully it won't cause any issues with serovar predictions.

alantsangmb commented 5 years ago

I see. Thank you.