antonkulaga closed this issue 5 years ago.
Hi, the problem here is that these entries have no supplementary files, i.e. the list of supplementary files consists of a single entry: 'None'. I see you do not want to download SRA, so this is not an issue for you.
@guma44 in that case the behavior is not consistent. If a GSM has no supplementary files, it should download nothing and return an empty list of paths. My use case is using GEOparse to download all available files from a GSE for my RNA-Seq pipeline. I do not think that wrapping everything in try-except blocks or adding lots of if-else checks for whether there are any supplementary files is efficient. Returning an empty list seems much more logical...
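A minimal sketch of the behavior requested above, assuming (as the "Downloading NONE" log line further down suggests) that a sample without files stores the literal placeholder NONE under metadata keys containing "supplementary_file". Both supplementary_urls and download_all are hypothetical helpers, not GEOparse API, and the actual download is stubbed out:

```python
from typing import Dict, List

# Hypothetical stand-in for GSM metadata: each key maps to a list of strings
# (assumed here to mirror how GEOparse stores SOFT fields).
Metadata = Dict[str, List[str]]


def supplementary_urls(metadata: Metadata) -> List[str]:
    """Collect real supplementary-file URLs, skipping NONE placeholders."""
    urls = []
    for key, values in metadata.items():
        if "supplementary_file" not in key:
            continue
        for value in values:
            value = value.strip()
            # GEO writes the literal string NONE when a sample has no files.
            if value and value.upper() != "NONE":
                urls.append(value)
    return urls


def download_all(metadata: Metadata, directory: str) -> List[str]:
    """Hypothetical downloader that returns the local paths it would write.

    With no real supplementary files it returns an empty list instead of
    raising, so callers need no try/except or if/else around it.
    """
    paths = []
    for url in supplementary_urls(metadata):
        filename = url.rstrip("/").rsplit("/", 1)[-1]
        # A real implementation would fetch `url` here; this sketch only
        # computes the destination path.
        paths.append(f"{directory}/{filename}")
    return paths
```

With {"supplementary_file_1": ["NONE"]} as input, download_all returns [], so a pipeline can loop over every GSM in a GSE without special-casing samples that have no files.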
And this is exactly what the try-except block does. As this case shows, no one can guarantee that a user will not put nonsense into the metadata. So the code tries to download each file, and if it cannot, it adds nothing to the returned dictionary. The result is an error log message and an empty dictionary. I could change Exception to ValueError to be stricter about which errors are caught.
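For comparison, a sketch of the try-except approach described above, with the narrower ValueError in place of Exception. The names download_supplementary and fake_fetch are hypothetical and used only for illustration; fake_fetch merely simulates a download:

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("supp_sketch")


def download_supplementary(urls: List[str],
                           fetch: Callable[[str], str]) -> Dict[str, str]:
    """Map each URL to a local path; failed downloads are logged and skipped.

    `fetch` is a hypothetical callable that downloads one URL and returns
    its local path, raising ValueError for unusable entries such as NONE.
    """
    downloaded: Dict[str, str] = {}
    for url in urls:
        try:
            downloaded[url] = fetch(url)
        except ValueError as err:  # narrower than catching Exception
            logger.error("Cannot download %s: %s", url, err)
    return downloaded


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher: rejects non-URLs, otherwise returns a path."""
    if not url.startswith(("ftp://", "http://", "https://")):
        raise ValueError("not a downloadable URL")
    return "/tmp/" + url.rsplit("/", 1)[-1]
```

Catching only ValueError means unrelated bugs (a TypeError, say) still surface instead of being swallowed; a NONE entry yields an empty dictionary plus one error log line.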
It is not clear why it does not at least download the SRA data when I ask it to download the supplementary files. SRA data does exist for GSM1944808:
```python
import os
from typing import cast

import GEOparse
from GEOparse.GEOTypes import GSM

# Fetch the sample's SOFT record into /tmp.
gsm = cast(GSM, GEOparse.get_GEO("GSM1944808", destdir="/tmp"))

filetype = 'sra'
keep_sra = True
# fastq-dump options; a value of None passes the flag alone.
fastq_dump_options = {
    'skip-technical': None,
    'clip': None,
    'split-files': None,
    'readids': None,
    'read-filter': 'pass',
    'dumpbase': None,
    'gzip': None
}
sra_kwargs = {
    "keep_sra": keep_sra,
    'filetype': filetype,
    "fastq_dump_options": fastq_dump_options
}
# Expand "~" explicitly; os.makedirs and os.path.join do not expand it.
directory_path = os.path.expanduser("~/test")
gsm.download_supplementary_files(
    directory=directory_path,
    download_sra=True,
    email="antonkulaga@gmail.com",
    sra_kwargs=sra_kwargs,
)
```
I get:

```
Downloading NONE to /pipelines/text/Supp_GSM1944808_MG_UKJ_15_190214_1HS_brain/NONE
```
whereas I would at least expect the SRA data to be downloaded, since I set download_sra=True.
In this case, SRA is not listed among the supplementary files. It is only listed as a relation, not as a supplementary file. That is a separate issue, and it would be worth checking the relations for SRA links, but for now this functionality does not exist. That is also why there is no message explaining why SRA was not downloaded: unfortunately, there is no SRA among the supplementary files :/. Anyway, I think I know how to solve it.
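Checking the relations could look roughly like the sketch below. It assumes the metadata is a dict of string lists with a "relation" key whose entries follow GEO's "SRA: <url>?term=<accession>" form; sra_accession is a hypothetical helper, and SRX0000000 in the usage note is a placeholder, not the real accession for GSM1944808:

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse


def sra_accession(metadata: Dict[str, List[str]]) -> Optional[str]:
    """Return the SRA accession listed under 'relation', if any.

    Assumes relation entries look like 'SRA: https://...sra?term=SRX...';
    returns None when the sample has no SRA relation.
    """
    for relation in metadata.get("relation", []):
        # Split "SRA: https://..." into the relation type and the link.
        kind, _, link = relation.partition(":")
        if kind.strip() != "SRA":
            continue
        terms = parse_qs(urlparse(link.strip()).query).get("term")
        if terms:
            return terms[0]
    return None
```

For {"relation": ["SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX0000000"]} this returns "SRX0000000"; a sample whose only supplementary entry is NONE and which has no SRA relation yields None.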
I then tried downloading the SRA data directly with:

```python
path = gsm.download_SRA("antonkulaga@gmail.com", "/pipelines/test", **sra_kwargs)
```
The SRA data was downloaded (and saved as Supp_GSM1944808_MG_UKJ_15_190214_1HS_brain), but path is []. I think the function should return the path(s) of the downloaded file(s) instead of an empty list.
Yes, that is the bug: download_SRA downloads the data but does not return the paths as expected. I will fix it.
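One way for download_SRA to report what it wrote is to compare the files under the target directory before and after the download. This is only a sketch: download_and_collect and its download callable are hypothetical, and a real fix would more naturally record each path as it is written. The walk is recursive because the data above ended up in a per-sample Supp_ subdirectory:

```python
import os
from typing import Callable, List, Set


def _files_under(directory: str) -> Set[str]:
    """All file paths below `directory`, searched recursively."""
    return {
        os.path.join(root, name)
        for root, _dirs, names in os.walk(directory)
        for name in names
    }


def download_and_collect(directory: str,
                         download: Callable[[str], None]) -> List[str]:
    """Run `download` into `directory` and return the new file paths.

    `download` is a hypothetical stand-in for the SRA download step.
    """
    os.makedirs(directory, exist_ok=True)
    before = _files_under(directory)
    download(directory)
    # Only files that appeared during the download are reported.
    return sorted(_files_under(directory) - before)
```

Diffing works regardless of the downloader's naming scheme, at the cost of also picking up unrelated files written into the same directory at the same time.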
When I download GSM supplementary files with:
I get the following error: