rcs333 / VAPiD

VAPiD: Viral Annotation and Identification Pipeline
MIT License

running VAPiD throws IndexError: list index out of range - v2 #18

Open ammaraziz opened 10 months ago

ammaraziz commented 10 months ago

Hi,

Thanks for creating VAPiD. I'm having a problem similar to issue #14. However, it's on a different line of code:

Searching local blast database at ref_seq_vir
Traceback (most recent call last):
  File "../VAPiD/vapid3.py", line 969, in <module>
    strain2species[virus_strain_list[x]] = annotate_a_virus(virus_strain_list[x], virus_genome_list[x],
  File ".../VAPiD/vapid3.py", line 635, in annotate_a_virus
    name_of_virus, our_seq, ref_seq, ref_accession, need_to_rc = blast_n_stuff(strain, strain + SLASH + strain + '.fasta')
  File "../VAPiD/vapid3.py", line 182, in blast_n_stuff
    ref_seq_gb = line.split('|')[3]

I am using the default ref_seq_vir database. This is the command I used:

python vapid3.py my.fasta example.sbt --metadata_loc meta.csv

I have confirmed that all dependencies (blast, mafft) are installed, run correctly, and are on the PATH. What's interesting is this line:

for line in open(strain + SLASH + strain + '.blastresults'):
    ref_seq_gb = line.split('|')[3]

This is the content of the output .blastresults file:

yfv NC_002031.1 99.880  10861   13  0   2   10862   2   10862   0.0 19994

From reading the code, the blast output is written with -outfmt 6, which is tab-separated. The code above splits each line on | but the file is actually tab-delimited, so split('|') returns a single-element list and indexing it with [3] raises the IndexError. Another odd thing is that the code extracts the fourth element. I have to confess I'm confused here: why does the above code work for others?
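To make the failure concrete, here is a minimal reproduction using the .blastresults line shown above. It assumes the fields in the real file are separated by single tab characters (as -outfmt 6 produces), so the whitespace is written out as \t here.

```python
# The hit line from the .blastresults file above, with tabs written explicitly.
line = "yfv\tNC_002031.1\t99.880\t10861\t13\t0\t2\t10862\t2\t10862\t0.0\t19994\n"

# Splitting on '|' finds no separator, so the whole line comes back as one element.
pipe_fields = line.split('|')
print(len(pipe_fields))  # 1

# Splitting on tabs gives the twelve -outfmt 6 columns; column 2 is the subject ID.
tab_fields = line.split('\t')
print(tab_fields[1])  # NC_002031.1

# Indexing the one-element list at position 3 reproduces the reported error.
try:
    ref_seq_gb = pipe_fields[3]
except IndexError as err:
    print(err)  # list index out of range
```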

Anyway, I changed the offending line(s) to this:

ref_seq_gb = line.split('\t')[1]

There are two places in the code where the blast results file is parsed.
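Since the parsing happens in two places, one option is to put it in a small helper. The sketch below is only an illustration: extract_ref_accession is a hypothetical name, not something in vapid3.py. It also assumes, as a possible explanation for why the original code works for others, that some databases report the subject ID in the older pipe-delimited form gi|<number>|ref|<accession>|, in which case splitting the whole line on | does put the accession at index 3.

```python
def extract_ref_accession(line):
    """Return the subject accession from one line of BLAST -outfmt 6 output.

    Hypothetical helper for illustration; not part of vapid3.py.
    """
    fields = line.rstrip('\n').split('\t')
    if len(fields) < 2:
        raise ValueError('not a tab-delimited BLAST hit: ' + repr(line))

    subject_id = fields[1]  # second column of -outfmt 6 is the subject ID

    # Assumed legacy ID form: gi|<number>|ref|<accession>|
    if '|' in subject_id:
        parts = subject_id.split('|')
        if len(parts) > 3 and parts[3]:
            return parts[3]

    # Otherwise the subject ID is already a bare accession.
    return subject_id


# Bare accession, as in my .blastresults file:
print(extract_ref_accession("yfv\tNC_002031.1\t99.880\t10861\n"))  # NC_002031.1
```

Both call sites could then use ref_seq_gb = extract_ref_accession(line) instead of indexing the split result directly.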

I am running on an ARM-based Mac (macOS), and the OS is detected correctly as Darwin. The Python version is 3.9. All dependencies were installed via conda (including blast).