Open sheikki opened 2 years ago
Hello,
I was able to reproduce your error with GCA_007463835.1 (Salmonella enterica subsp. salamae). It is caused by the cgMLST prediction being the antigenic formula II 1,9,12,46,27:z29:1,5, which is not WHO-listed and so cannot be mapped to a serovar.
Because no compatible WHO serovar exists for this antigenic formula, the results dataframe is empty. The crash occurs in the antigen_predictor.lookup_serovar_antigens(serovar_table(),cgmlst_serovar)
call:
Traceback (most recent call last):
  File "/sistr/sistr_cmd.py", line 275, in sistr_predict
    overall_serovar_call(prediction, serovar_predictor)
  File "/sistr/src/serovar_prediction/__init__.py", line 542, in overall_serovar_call
    cgmlst_serovar_antigens = antigen_predictor.lookup_serovar_antigens(serovar_table(),cgmlst_serovar)
  File "/sistr/src/serovar_prediction/__init__.py", line 390, in lookup_serovar_antigens
    spp = df_prediction['subspecies'].values.item(0)
IndexError: index 0 is out of bounds for axis 0 with size 0
Thank you for reporting this legitimate issue. We will need to address this edge case in the code in the next release.
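For illustration, here is a minimal sketch of the kind of guard that would avoid the IndexError. It is not the actual SISTR fix. Only the 'subspecies' column and the df_prediction name come from the traceback above. The function name lookup_serovar_antigens_safe, the 'Serovar', 'O_antigen', 'H1' and 'H2' column names, and the toy table are all assumptions made for this example.

```python
import pandas as pd


def lookup_serovar_antigens_safe(df_serovar, serovar):
    """Return (subspecies, O, H1, H2) for `serovar`, or four Nones if it is not in the table.

    Hypothetical helper: the column names other than 'subspecies' are assumed.
    """
    # Select the rows of the serovar table that match the predicted serovar.
    df_prediction = df_serovar[df_serovar["Serovar"] == serovar]

    # Guard: an antigenic formula with no matching entry gives an empty
    # selection, and indexing element 0 of it raises IndexError.
    if df_prediction.empty:
        return None, None, None, None

    row = df_prediction.iloc[0]
    return row["subspecies"], row["O_antigen"], row["H1"], row["H2"]


# Toy stand-in for the serovar table (assumed layout, a single entry).
toy_table = pd.DataFrame(
    {
        "Serovar": ["Typhimurium"],
        "subspecies": ["enterica"],
        "O_antigen": ["1,4,[5],12"],
        "H1": ["i"],
        "H2": ["1,2"],
    }
)

# The formula from this issue has no entry in the toy table, so the guard
# returns Nones instead of crashing.
missing = lookup_serovar_antigens_safe(toy_table, "II 1,9,12,46,27:z29:1,5")
found = lookup_serovar_antigens_safe(toy_table, "Typhimurium")
```

The caller would then need to treat the all-None result as "no serovar-level antigens available" rather than comparing it against the antigen predictions.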
Hi,
Here are two more assemblies which fail similarly:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/017/826/655/GCA_017826655.1_PDT001000035.1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/017/845/815/GCA_017845815.1_PDT001001406.1_genomic.fna.gz
How to reproduce:
As far as I can tell, there is nothing weird about this putative Salmonella genome assembly.