transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Error running DIAMOND_subsystems_analysis_counter.py #65

Open eKariuki-sleepy opened 3 years ago

eKariuki-sleepy commented 3 years ago

Hi, I am trying to run the python script DIAMOND_subsystems_analysis_counter.py and I keep getting this error:

python_scripts/DIAMOND_subsystems_analysis_counter.py", line 133, in <module>
IndexError: list index out of range 

Unfortunately, I cannot figure it out. All I understand is that the script splits each line on the tab separator ("\t") and indexes the second field, with the exception of line 1, which has NO HIERARCHY. I have also shared the head section of my subsys_db.fa database here. Might there be a problem with my database?
Kindly assist.

transcript commented 3 years ago

Hello,

In the screenshot you shared of the head of the subsys_db.fa database, it looks like there may be an erroneous newline repeated in the entries. If you look at lines 5 and 6, there seems to be a newline (return) between "Tricarballylate_Utilization" and "Carbohydrates".

Can you check to see if those are on the same line, separated by tabs? It should look like:

>fig|1085.1.peg.628     TcuB: works with TcuA to oxidize tricarballylate to cis-aconitate       Tricarballylate_Utilization     Carbohydrates   Organic acids   cmr|NT02RR3166,gb|AAN75034.1,gb|ABC23782.1,gi|25989730,gi|83577231,gi|83594317,gnl|md5|4a7b7dbab4b1cb4f1eb179ee,img|637827101,kegg|rru:Rru_A2987,ref|YP_428069.1,tr|Q2RQ13,tr|Q8GDD2
MFDPCDLPPPPAPAPGASAAEAEARRVLALCTVCGYCTGLCDVFRAAERRPALTSGDLGHLAHLCHGCQACWHACQYTPPHVFAIVVPATLARVRAESYARHAWPRPLKGPAVLALALAATLVVPLLTVLLVPSQDLFAANAAPGAFYGVIPWGVMTPIALLTLGWAALAVGLGVARFWREGAQGPPAAPLARVWGRALADIVSLRNLKGGGRGCFETDDRPSHRRRWLHHALAGGFLLCLGSTLAATVYHHGLGREAPYPLTSLPVLLGLVGGCLMVGGASGLAWLKRHADPEPQAAETLGADRCLLAMLIAVALSGLVLLALRDTAAMGLLLALHLGTVLGFFITLPYGKFVHGAYRAAALLRSAAERRTDPRAPLAERPGVDRDLP

This might be what's occurring. Did you make any changes to the subsys_db, or when did you download it?
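A text viewer can wrap long lines and hide this, so a short script may be a more reliable check. The following is a minimal sketch, not part of SAMSA2: the function name `find_malformed_headers` and the `min_fields` threshold are assumptions made for illustration. It flags header lines with fewer tab-separated fields than expected, and non-header lines that contain a tab (which is what a header broken by a stray newline would leave behind). Entries marked NO HIERARCHY may legitimately have fewer fields, so treat flagged lines as candidates to inspect rather than confirmed errors.

```python
def find_malformed_headers(fasta_path, min_fields=4):
    """Return (line_number, reason) pairs for lines worth inspecting.

    min_fields is an assumed threshold, not a value taken from SAMSA2.
    """
    problems = []
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # Header line: count the tab-separated annotation fields.
                field_count = len(line.split("\t"))
                if field_count < min_fields:
                    problems.append(
                        (number, "header has %d tab-separated field(s)" % field_count)
                    )
            elif "\t" in line:
                # Sequence lines should not contain tabs; a tab here suggests
                # the tail of a header that was split by a stray newline.
                problems.append((number, "non-header line contains a tab"))
    return problems
```

Calling `find_malformed_headers("subsys_db.fa")` returns an empty list when no line looks suspicious under these assumptions.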

eKariuki-sleepy commented 3 years ago

I did not make any changes to the database. I have used a better text viewer (here), and it appears to be in the same format as the one you shared above.

Thank you.

transcript commented 3 years ago

Got it, thanks. I just re-downloaded using the command:

wget "https://zenodo.org/record/5022377/files/subsys_db.fa.bz2" --no-check-certificate
bunzip2 subsys_db.fa.bz2

and I see the proper lines (no extra line breaks). Can you try re-downloading this database with the link here and see if you still get the same error?

eKariuki-sleepy commented 3 years ago

Thank you. Let me do so and give you feedback.


eKariuki-sleepy commented 3 years ago

Hello Sam, I downloaded the file and ran the script against it, and I am experiencing the same error. I thought it might be something to do with Python 2, since I am working on an HPC with py2 installed in a conda environment. I also tried py3, and the error persists. Also, immediately after I run the DIAMOND_subsystems_analysis_counter.py script, it deletes itself and I have to copy or re-download it, something I have never experienced before. Could this be related to why it is not working? Thanks.


cbeekman commented 1 year ago

I am getting the same error as at the top of this thread, also at line 133, though I do not have the issue with the script getting deleted.

transcript commented 1 year ago

@cbeekman Are you also using the default downloaded Subsystems database with no modifications?

One quick test: if you modify line 133 of DIAMOND_subsystems_analysis_counter.py to be:

if "NO HIERARCHY" in splitline:

Do you still get the same error?

cbeekman commented 1 year ago

Hi,

Thanks for the quick reply. You can disregard my post; I have since realized I made a mistake. The issue was that I was pointing the script at the DIAMOND-indexed version of the database instead of the FASTA file version.

Thanks, Chapman

transcript commented 1 year ago

Okay, great, glad to hear!
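For anyone hitting the same error, the mix-up described above (passing the DIAMOND-indexed database instead of the FASTA file) can be caught before running the counter script. This is a rough sketch under stated assumptions: `looks_like_fasta` is a hypothetical helper, not part of SAMSA2, and it only checks that the file starts with a `>` header and has no NUL bytes in its first few kilobytes. It does not parse the DIAMOND index format.

```python
def looks_like_fasta(path, probe_bytes=4096):
    """Heuristic check that a file is plain-text FASTA, not a binary index.

    Hypothetical helper for illustration; probe_bytes is an arbitrary choice.
    """
    with open(path, "rb") as handle:
        chunk = handle.read(probe_bytes)
    # A FASTA file begins with a '>' header (after optional whitespace);
    # NUL bytes in the probe suggest a binary file such as an index.
    return chunk.lstrip().startswith(b">") and b"\x00" not in chunk
```

If `looks_like_fasta("subsys_db.fa")` returns False, the path most likely points at something other than the uncompressed FASTA database.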