phac-nml / sistr_cmd

SISTR (Salmonella In Silico Typing Resource) command-line tool
Apache License 2.0

IndexError: index 0 is out of bounds for axis 0 with size 0 #53

Open sheikki opened 2 years ago

sheikki commented 2 years ago

How to reproduce:

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/007/463/835/GCA_007463835.1_PDT000542330.1/GCA_007463835.1_PDT000542330.1_genomic.fna.gz
gunzip GCA_007463835.1_PDT000542330.1_genomic.fna.gz
sistr -m --qc -f tab -t 2 -i GCA_007463835.1_PDT000542330.1_genomic.fna GCA_007463835.1_PDT000542330.1 -o GCA_007463835.1_PDT000542330.1
Traceback (most recent call last):
  File "/usr/local/bin/sistr", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/sistr/sistr_cmd.py", line 410, in main
    outputs = [x.get() for x in res]
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
IndexError: index 0 is out of bounds for axis 0 with size 0

As far as I can tell, there is nothing weird about this putative Salmonella genome assembly.

kbessonov1984 commented 2 years ago

Hello, I was able to reproduce your error for GCA_007463835.1 (Salmonella enterica subsp. salamae). It is caused by the antigenic formula II 1,9,12,46,27:z29:1,5, which is not WHO listed, so the cgMLST prediction cannot be mapped to a serovar. The results dataframe ends up empty, because there is no compatible WHO serovar to map this antigenic formula to. The crash occurs in the antigen_predictor.lookup_serovar_antigens(serovar_table(),cgmlst_serovar) call:

Traceback (most recent call last):
  File "/sistr/sistr_cmd.py", line 275, in sistr_predict
    overall_serovar_call(prediction, serovar_predictor)
  File "/sistr/src/serovar_prediction/__init__.py", line 542, in overall_serovar_call
    cgmlst_serovar_antigens = antigen_predictor.lookup_serovar_antigens(serovar_table(),cgmlst_serovar)
  File "/sistr/src/serovar_prediction/__init__.py", line 390, in lookup_serovar_antigens
    spp = df_prediction['subspecies'].values.item(0)
IndexError: index 0 is out of bounds for axis 0 with size 0

Thank you for reporting this legitimate issue. We will definitely need to address this edge case in the code in the next release.
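
For illustration, the failure mode in the traceback above can be sketched in isolation: filtering a table for a serovar that is not present yields an empty dataframe, and calling `.values.item(0)` on an empty column raises the IndexError. The snippet below is only a minimal sketch and not the actual sistr code or its eventual fix. The table contents, the `Serovar` and `O_antigen` column names, and both function names are hypothetical; only the `subspecies` column and the `.values.item(0)` call come from the traceback.

```python
import pandas as pd

# Hypothetical stand-in for the WHO serovar table. Only the 'subspecies'
# column name appears in the traceback; everything else is assumed.
SEROVAR_TABLE = pd.DataFrame(
    {
        "Serovar": ["Enteritidis", "Typhimurium"],
        "subspecies": ["I", "I"],
        "O_antigen": ["1,9,12", "1,4,[5],12"],
    }
)


def lookup_subspecies_unguarded(df, serovar):
    """Mirror the failing pattern: index into the result without checking it."""
    df_prediction = df[df["Serovar"] == serovar]
    # Raises IndexError when no row matched, because the array is empty.
    return df_prediction["subspecies"].values.item(0)


def lookup_subspecies_guarded(df, serovar):
    """Return None instead of crashing when the serovar is not in the table."""
    df_prediction = df[df["Serovar"] == serovar]
    if df_prediction.empty:
        return None
    return df_prediction["subspecies"].values.item(0)


if __name__ == "__main__":
    unlisted = "II 1,9,12,46,27:z29:1,5"
    print(lookup_subspecies_guarded(SEROVAR_TABLE, "Enteritidis"))
    print(lookup_subspecies_guarded(SEROVAR_TABLE, unlisted))
    try:
        lookup_subspecies_unguarded(SEROVAR_TABLE, unlisted)
    except IndexError as err:
        print("unguarded lookup failed:", err)
```

Returning None is just one possible design. A real fix would also have to decide what the caller reports for the serovar when no WHO entry matches.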

sheikki commented 2 years ago

Hi,

Here are two more assemblies which fail similarly:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/017/826/655/GCA_017826655.1_PDT001000035.1_genomic.fna.gz
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/017/845/815/GCA_017845815.1_PDT001001406.1_genomic.fna.gz