Closed tyasird closed 3 years ago
Hi, this option is valid only for GSM and GPL files. GSE files are fetched directly from their URL path. I think I can quickly check and implement this for GSE too.
Hi, with the change you would only have access to metadata, i.e. summary, descriptions etc., but the gsms
attribute would be empty. Would that be OK?
Hi,
I changed your code like this and it worked for me.
geoparse.py line 183
elif geotype == "GSE":
    if scope == "self":
        gseurl = (
            "ftp://ftp.ncbi.nlm.nih.gov/geo/"
            "{root}/{range_subdir}/{record}/soft/{record_file}"
        )
        url = gseurl.format(
            root="series",
            range_subdir=range_subdir,
            record=geo,
            record_file="%s_family.soft.gz" % geo,
        )
        filepath = path.join(tmpdir, "{record}_family.soft.gz".format(record=geo))
    elif scope == "gsm":
        gseurl = (
            "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
            "?targ=gsm&acc={record}&form=text&view={how}"
        )
        url = gseurl.format(record=geo, how=how)
        filepath = path.join(tmpdir, "{record}.txt".format(record=geo))
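For anyone who wants to try the URL logic outside of geoparse.py, here is a minimal standalone sketch of the same two branches. The function name build_gse_url is made up for illustration and is not part of GEOparse, and the way range_subdir is derived (last one to three digits replaced with "nnn") is my assumption about how the FTP directories are laid out.

```python
import re


def build_gse_url(accession, scope="self", how="brief"):
    """Return the download URL for a GSE record (hypothetical helper).

    scope="self" points at the full family SOFT file on the FTP server;
    scope="gsm" points at the text view of the series' samples.
    """
    if scope == "self":
        # Assumption: GSE19826 lives under GSE19nnn, GSE1 under GSEnnn.
        range_subdir = re.sub(r"\d{1,3}$", "nnn", accession)
        return (
            "ftp://ftp.ncbi.nlm.nih.gov/geo/series/"
            f"{range_subdir}/{accession}/soft/{accession}_family.soft.gz"
        )
    if scope == "gsm":
        return (
            "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
            f"?targ=gsm&acc={accession}&form=text&view={how}"
        )
    raise ValueError(f"unsupported scope: {scope!r}")


if __name__ == "__main__":
    print(build_gse_url("GSE19826"))
    print(build_gse_url("GSE19826", scope="gsm", how="brief"))
```

It only builds the URL strings, so it can be checked without touching the network.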
I get sample information like this
sample_file = geo.get_GEO(geo=gse, how='brief', scope="gsm", destdir="./tmp/")
samples = []
for gsm_name, gsm in sample_file.gsms.items():
    samples.append([
        gsm_name,
        gsm.get_metadata_attribute('title'),
        gsm.get_metadata_attribute('characteristics_ch1')[0],
    ])
output
['GSM495051', 'CB2008210-1N', 'tissue type: noncancer tissue']
['GSM495052', 'CB2008210-1T', 'tissue type: gastric cancer tissue']
['GSM495053', 'CB2008210-2N', 'tissue type: noncancer tissue']
['GSM495054', 'CB2008210-2T', 'tissue type: gastric cancer tissue']
['GSM495055', 'CB2008210-3N', 'tissue type: noncancer tissue']
['GSM495056', 'CB2008210-3T', 'tissue type: gastric cancer tissue']
['GSM495057', 'CB2008210-4N', 'tissue type: noncancer tissue']
['GSM495058', 'CB2008210-4T', 'tissue type: gastric cancer tissue']
['GSM495059', 'CB2008210-5N', 'tissue type: noncancer tissue']
['GSM495060', 'CB2008210-5T', 'tissue type: gastric cancer tissue']
['GSM495061', 'CB2008210-6N', 'tissue type: noncancer tissue']
['GSM495062', 'CB2008210-6T', 'tissue type: gastric cancer tissue']
['GSM495063', 'CB2008210-7N', 'tissue type: noncancer tissue']
['GSM495064', 'CB2008210-7T', 'tissue type: gastric cancer tissue']
['GSM495065', 'CB2008210-9N', 'tissue type: noncancer tissue']
['GSM495066', 'CB2008210-9T', 'tissue type: gastric cancer tissue']
['GSM495067', 'CB2008210-12N', 'tissue type: noncancer tissue']
['GSM495068', 'CB2008210-12T', 'tissue type: gastric cancer tissue']
['GSM495069', 'CB2008210-13N', 'tissue type: noncancer tissue']
['GSM495070', 'CB2008210-13T', 'tissue type: gastric cancer tissue']
['GSM495071', 'CB2008210-14N', 'tissue type: noncancer tissue']
['GSM495072', 'CB2008210-14T', 'tissue type: gastric cancer tissue']
['GSM495073', 'CB2008210-15N', 'tissue type: noncancer tissue']
['GSM495074', 'CB2008210-15T', 'tissue type: gastric cancer tissue']
['GSM495075', 'CB2008210-3C', 'tissue type: normal gastric tissue']
['GSM495076', 'CB2008210-5C', 'tissue type: normal gastric tissue']
['GSM495077', 'CB2008210-9C', 'tissue type: normal gastric tissue']
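If someone only needs these three fields and already has the SOFT text on disk, a rough sketch of pulling them out by hand could look like the following. parse_samples is a hypothetical helper, not a GEOparse function, and EXAMPLE is a hand-written fragment in SOFT layout built from the first two rows above, not an actual download.

```python
def parse_samples(soft_text):
    """Collect [name, title, first characteristics_ch1] per ^SAMPLE block."""
    samples = []
    current = None
    for raw_line in soft_text.splitlines():
        line = raw_line.strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "^SAMPLE":
            # A new sample block starts; later attribute lines belong to it.
            current = {"name": value, "title": None, "characteristics": []}
            samples.append(current)
        elif current is None:
            continue  # series-level lines before the first sample
        elif key == "!Sample_title":
            current["title"] = value
        elif key == "!Sample_characteristics_ch1":
            current["characteristics"].append(value)
    return [
        [s["name"], s["title"], s["characteristics"][0] if s["characteristics"] else None]
        for s in samples
    ]


# Hand-written fragment in SOFT layout (illustrative, not real downloaded data).
EXAMPLE = """\
^SAMPLE = GSM495051
!Sample_title = CB2008210-1N
!Sample_characteristics_ch1 = tissue type: noncancer tissue
^SAMPLE = GSM495052
!Sample_title = CB2008210-1T
!Sample_characteristics_ch1 = tissue type: gastric cancer tissue
"""

if __name__ == "__main__":
    for row in parse_samples(EXAMPLE):
        print(row)
```

Samples without a characteristics_ch1 line get None in the third column instead of raising an IndexError, which the [0] lookup in the loop above would do.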
I can commit if you want? Thanks for helping me btw.
Hi, I have a similar change now and I am testing it, so no need to commit :). I will let you know when I have a new version.
I am happy to hear that, many thanks.
For instance, I want to get the sample names for GSE19826. In GEOparse.get_GEO(geo='GSE19826', how='quick') I changed the how argument to quick, but it still downloads the full dataset files, which takes a long time for large datasets. Is there any way to download only the sample names and descriptions?