transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Error in DIAMOND_analysis_counter.py #57

Open mweberr opened 3 years ago

mweberr commented 3 years ago

Hi, I have compiled a DIAMOND database from the current RefSeq database, but the script DIAMOND_analysis_counter.py fails at one line.

Do you have any idea whether I need to preprocess the database before running DIAMOND_analysis_counter.py?

Traceback (most recent call last):
  File "samsa2/python_scripts/DIAMOND_analysis_counter.py", line 151, in <module>
    if split_db_org[1] == "sp.":
IndexError: list index out of range

line 162, in <module>
    db_org = split_db_org[1] + " " + split_db_org[2]
IndexError: list index out of range

Best, Michael

transcript commented 3 years ago

Hey Michael,

Could you share the command you're running to call DIAMOND_analysis_counter.py? What are you specifying as inputs?

My guess is that something's funky with the database file you're supplying, and seeing the command may help a bit.

Best, Sam

mweberr commented 3 years ago

Hi Sam, I started debugging the DIAMOND_analysis_counter.py run, and it apparently exits with the error on the following database entry:

>ADN03191.1 VP4, partial [Rotavirus pig/2B/IRL/2005/P[13]/[22]]

The split that extracts the db_org variable probably needs to be extended. I will first check whether other lines cause similar problems.
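A check like this can be sketched in a few lines of Python. This is a minimal sketch, not part of SAMSA2: the helper names `has_nested_brackets` and `find_nested_headers` are made up for illustration, and it assumes the problematic entries are those where one pair of square brackets sits inside another, as in the entry above.

```python
def has_nested_brackets(header: str) -> bool:
    """Return True if a '[' opens while another bracket is still open."""
    depth = 0
    for ch in header:
        if ch == "[":
            depth += 1
            if depth > 1:
                return True
        elif ch == "]" and depth > 0:
            depth -= 1
    return False


def find_nested_headers(lines):
    """Yield FASTA header lines (starting with '>') that contain nested brackets."""
    for line in lines:
        if line.startswith(">") and has_nested_brackets(line):
            yield line.rstrip("\n")


# Hypothetical usage; "refseq.faa" is a placeholder file name:
# with open("refseq.faa") as handle:
#     for header in find_nested_headers(handle):
#         print(header)
```

Headers with several separate bracket groups but no nesting are not flagged by this sketch, so it may miss other formats that trip up the parser.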

transcript commented 3 years ago

Ah, yes, the parsing script doesn't do well when there are multiple sets of square brackets in the line. I've noticed that when extra brackets appear, most of them are in the function description rather than the organism name, so this section (lines 146-162) parses out the organism name by assuming that it is whatever sits in the last set of brackets.

The issue is actually with line 147, where it's selecting 22] as the organism name, as this is what's inside the last set of brackets.

If this is the only database entry where you hit this error, you could try running a command on your database to replace this line with one that uses parentheses instead of brackets. Otherwise, this may take some regex work that will be a bit tougher for me to work out. Did you find other lines causing issues?
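One possible alternative to a regex is to scan backward from the final closing bracket and track bracket depth, so that nested brackets inside the organism name stay together. The following is a minimal sketch, not the code in DIAMOND_analysis_counter.py: `extract_organism` is a hypothetical helper, and it assumes the organism name is the last top-level bracketed group in the header.

```python
def extract_organism(header: str) -> str:
    """Return the contents of the last top-level [...] group, or "" if none."""
    end = header.rfind("]")
    if end == -1:
        return ""  # no brackets at all
    depth = 0
    # Walk backward from the final ']' until its matching '[' is found.
    for i in range(end, -1, -1):
        ch = header[i]
        if ch == "]":
            depth += 1
        elif ch == "[":
            depth -= 1
            if depth == 0:
                return header[i + 1:end]
    return ""  # unbalanced brackets


header = ">ADN03191.1 VP4, partial [Rotavirus pig/2B/IRL/2005/P[13]/[22]]"
print(extract_organism(header))
```

For the entry above this returns the full bracketed name, Rotavirus pig/2B/IRL/2005/P[13]/[22], rather than the fragment 22]. Any later splitting of that name into genus and species words would still need a length check before indexing.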