Closed Javier-Munoz-Briones closed 4 years ago
Hi there,
I had a chance to take a look. What I think is happening is that Entrez is returning at least one genome UID that doesn't actually exist, so the data being requested comes back short by at least one genome, which raises the uncaught IndexError. I can do a deeper dive in a couple of days and provide a better fix, but if you are in a time crunch and don't mind losing a genome or two, you could catch the IndexError and skip the missing genome data by modifying the for loop like so:
for index in range(0,len(IDs)):
    try:
        Acc_num = data[index].split(" ")[0][1:]
        filename = current_dir+"{0}downloaded_genomes/".format(prefix) + Acc_num.split(".")[0] + ".fasta"
        fastanames[Acc_num] = [filename, "lookup","complete"]
        with open(filename, "w") as output:
            output.write(data[index])
        num_downloaded += 1
    except IndexError:
        break
Unless there is a more systemic issue, this is the most likely solution I'll use if my hunch is correct.
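For anyone wanting to try the idea outside of STSS, here is a minimal standalone sketch of the same pattern. It is not STSS code: `save_records`, `ids`, `records`, and `out_dir` are hypothetical names, and the sketch assumes each record is a FASTA string whose header starts with ">" followed by the accession. It stops at the first missing record and reports how many were lost, so the shortfall is at least visible.

```python
import os


def save_records(ids, records, out_dir):
    """Write each FASTA record to its own file, tolerating a short `records` list.

    Hypothetical helper for illustration only (not part of STSS).
    Returns (written, missing): a dict of accession -> filename, and the
    number of requested IDs for which no record was available.
    """
    written = {}
    for index in range(len(ids)):
        try:
            record = records[index]
        except IndexError:
            # Fewer records came back than IDs were requested; every later
            # index would fail the same way, so stop here.
            break
        # First space-delimited token of the header, minus the leading ">".
        acc_num = record.split(" ")[0][1:]
        filename = os.path.join(out_dir, acc_num.split(".")[0] + ".fasta")
        with open(filename, "w") as output:
            output.write(record)
        written[acc_num] = filename
    missing = len(ids) - len(written)
    return written, missing
```

Calling `save_records(ids, records, some_dir)` with three IDs and only two records would write two files and return a `missing` count of 1, which could be printed as a warning rather than silently dropped.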
@javobio8
I've been unable to replicate your error on my end. I ran a search for 'Listeria monocytogenes' using:
STSS.py --search "Listeria monocytogenes" -n
Still working, linking group 10 of 327 to nucleotide...
Still working, linking group 20 of 327 to nucleotide...
...
...
Found 3268 genome(s) searching for 'Listeria monocytogenes'.
Downloading unfragmented genome records 1 to 5 of 226
Downloading unfragmented genome records 6 to 10 of 226
....
Downloading fragmented genome records 1 to 10 of 3042
Downloading fragmented genome records 11 to 20 of 3042
...
...
Downloaded 3268 new sequences to analyze for self-targeting sequences.
Searching for CRISPR spacer-repeats...
...
My search returned a smaller number of genomes (3268 vs. 4421 mentioned above), and all of them downloaded without an interrupting error. Can you please copy-paste the exact commands you used and the version of STSS (with STSS.py -v) you are using? Please also mention whether you modified any other lines (search default variables, etc.).
Last, please try two additional things:
Dear Kyle,
Thank you again for your support, and I apologize for the late response. I was reviewing why I got the aforementioned error for Listeria monocytogenes when I ran the script for the first time on my workstation. As you mentioned, there is no problem with this species; STSS runs well.
With regard to Pseudomonas aeruginosa, following your suggestion, after including the lines "try" and "except" in the STSS script (line 815), it ran perfectly for Pseudomonas aeruginosa.
Thank you for taking the time to read my comments.
Kind regards, Javier Munoz.
@javobio8
Glad to hear it's working! I'll keep that exception handling in mind to include in the future if I need to patch anything else.
Dear Kyle,
Thank you so much for your last valuable answer. I have used STSS.py to search for anti-CRISPR in Listeria monocytogenes and Pseudomonas aeruginosa, but it crashes when STSS.py tries to download the genomes. It gave me the following message: "IndexError: list index out of range" (line 816 of the original code). I have attached the complete message that I got from the script. I added some print statements to identify which specific part of line 816 might not be working, but I still cannot fix it.
Line 816: Acc_num = data[index].split(" ")[0][1:]
With the test case "Moraxella bovoculi", STSS.py runs perfectly.
Thank you again. Best regards, Javier Munoz
Line 818 and the numbers (5/0, 5/1, 5/2, 5/3) appear because I added print statements to the original code to identify which part of line 816 (original code) might not be working.
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Found 4421 genome(s) searching for 'Listeria monocytogenes'.
The number of files to download likely exceeds 200 Mb, would you like to continue? (y/n): y
Fetching...
Downloading unfragmented genome records 1 to 5 of 1429
5 0
5 1
5 2
5 3
Traceback (most recent call last):
  File "STSS.py", line 2848, in
    sys.exit(main())
  File "STSS.py", line 2839, in main
    protein_list = self_target_search(provided_dir,input_list_file,search,num_limit,E_value_limit,CRT_params,pad_locus,complete_only,skip_PHASTER,percent_reject,default_limit,redownload,current_dir,bin_path,Cas_gene_distance,protein_HMM_file,repeat_HMM_file,prefix,CDD,ask)
  File "STSS.py", line 2564, in self_target_search
    fastanames,Acc_convert_to_GI = download_genomes(total,num_limit,num_genomes,found_complete,search,redownload,provided_dir,current_dir,found_WGS,complete_IDs,WGS_IDs,wgs_master_GIs,fastanames,ask,prefix)
  File "STSS.py", line 818, in download_genomes
    print(data[index].split(" ")[0])
IndexError: list index out of range
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
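A note on reading this traceback, with a small sketch that is not STSS code. In Python, `str.split(" ")` always returns at least one element (even for an empty string), and slicing with `[1:]` never raises, so the `.split(" ")[0][1:]` part of line 816 cannot produce an IndexError by itself. The error therefore has to come from `data[index]`, meaning `data` holds fewer records than the loop expects. The helper names and the sample records below are made up for illustration.

```python
def accession_from_record(record):
    """Mirror the parsing on line 816: first space-delimited token, minus '>'."""
    token = record.split(" ")[0]  # split(" ") always yields at least one item
    return token[1:]              # slicing never raises, even on ""


def first_missing_index(ids, data):
    """Return the first index where `data` has no record for `ids`, else None.

    Hypothetical diagnostic helper (not part of STSS).
    """
    for index in range(len(ids)):
        try:
            data[index]
        except IndexError:
            return index
    return None


# Made-up example: three IDs requested, only two records returned.
example_ids = ["uid_a", "uid_b", "uid_c"]
example_data = [">ACC0001.1 made-up one\nACGT\n", ">ACC0002.1 made-up two\nTTGA\n"]

print(accession_from_record(example_data[0]))
print(first_missing_index(example_ids, example_data))
```

Printing `len(IDs)` next to `len(data)` just before the loop would show the same mismatch directly.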