guma44 / GEOparse

Python library to access Gene Expression Omnibus Database (GEO)
BSD 3-Clause "New" or "Revised" License

Is there any way to get GSM sample names without a full download? #66

Closed tyasird closed 3 years ago

tyasird commented 3 years ago

For instance, I want to get the sample names for GSE19826. I call GEOparse.get_GEO(geo='GSE19826', how='quick'), with the how argument changed to quick, but it still downloads the full dataset files, which takes a long time for large datasets. Is there any way to download only the sample names and descriptions?

guma44 commented 3 years ago

Hi, this option is valid only for GSM and GPL files. GSE files are fetched directly by URL path. I think I can quickly check and implement this for GSE too.

guma44 commented 3 years ago

Hi, with the change you would only have access to the metadata, i.e. summary, descriptions, etc., but the gsms attribute would be empty. Would that be OK?

tyasird commented 3 years ago

Hi,

I have changed your code like this and it worked for me.

geoparse.py line 183

    elif geotype == "GSE":
        if scope == "self":
            gseurl = (
                "ftp://ftp.ncbi.nlm.nih.gov/geo/"
                "{root}/{range_subdir}/{record}/soft/{record_file}"
            )
            url = gseurl.format(
                root="series",
                range_subdir=range_subdir,
                record=geo,
                record_file="%s_family.soft.gz" % geo,
            )
            filepath = path.join(tmpdir, "{record}_family.soft.gz".format(record=geo))
        elif scope == "gsm":
            gseurl = (
                "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
                "?targ=gsm&acc={record}&form=text&view={how}"
            )
            url = gseurl.format(record=geo, how=how)
            filepath = path.join(tmpdir, "{record}.txt".format(record=geo))
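
The two branches above differ only in which URL and local file name they produce. A minimal standalone sketch of that logic follows, for readers who want to inspect the URLs without touching the library. The helper names `range_subdir` and `build_gse_request` are hypothetical and not part of GEOparse, and the rule for deriving the range directory (replace the last one to three digits of the accession with "nnn") is an assumption about how the surrounding code computes it. Nothing is downloaded.

```python
import re


def range_subdir(accession):
    # Assumption: the range directory is the accession with its last
    # one to three digits replaced by "nnn" (GSE19826 -> GSE19nnn).
    return re.sub(r"\d{1,3}$", "nnn", accession)


def build_gse_request(accession, scope="self", how="brief"):
    """Return (url, filename) for a GSE accession.

    Hypothetical helper mirroring the two branches of the patch above.
    It only formats strings and performs no network access.
    """
    if scope == "self":
        # Full family file served from the FTP tree.
        filename = "{0}_family.soft.gz".format(accession)
        url = "ftp://ftp.ncbi.nlm.nih.gov/geo/series/{0}/{1}/soft/{2}".format(
            range_subdir(accession), accession, filename
        )
    elif scope == "gsm":
        # Sample records only, as plain text, at the requested detail level.
        filename = "{0}.txt".format(accession)
        url = (
            "http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
            "?targ=gsm&acc={0}&form=text&view={1}"
        ).format(accession, how)
    else:
        raise ValueError("unsupported scope: {0!r}".format(scope))
    return url, filename


full_url, full_name = build_gse_request("GSE19826")
brief_url, brief_name = build_gse_request("GSE19826", scope="gsm", how="brief")
print(full_url)
print(brief_url)
```

With scope="gsm" the request goes to the acc.cgi query endpoint rather than the FTP family file, which is what avoids the large download.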

I get sample information like this


    # geo is the patched GEOparse module; gse is an accession such as "GSE19826"
    sample_file = geo.get_GEO(geo=gse, how="brief", scope="gsm", destdir="./tmp/")
    samples = []
    for gsm_name, gsm in sample_file.gsms.items():
        title = gsm.get_metadata_attribute("title")
        tissue = gsm.get_metadata_attribute("characteristics_ch1")[0]
        samples.append([gsm_name, title, tissue])

output

    ['GSM495051', 'CB2008210-1N', 'tissue type: noncancer tissue']
    ['GSM495052', 'CB2008210-1T', 'tissue type: gastric cancer tissue']
    ['GSM495053', 'CB2008210-2N', 'tissue type: noncancer tissue']
    ['GSM495054', 'CB2008210-2T', 'tissue type: gastric cancer tissue']
    ['GSM495055', 'CB2008210-3N', 'tissue type: noncancer tissue']
    ['GSM495056', 'CB2008210-3T', 'tissue type: gastric cancer tissue']
    ['GSM495057', 'CB2008210-4N', 'tissue type: noncancer tissue']
    ['GSM495058', 'CB2008210-4T', 'tissue type: gastric cancer tissue']
    ['GSM495059', 'CB2008210-5N', 'tissue type: noncancer tissue']
    ['GSM495060', 'CB2008210-5T', 'tissue type: gastric cancer tissue']
    ['GSM495061', 'CB2008210-6N', 'tissue type: noncancer tissue']
    ['GSM495062', 'CB2008210-6T', 'tissue type: gastric cancer tissue']
    ['GSM495063', 'CB2008210-7N', 'tissue type: noncancer tissue']
    ['GSM495064', 'CB2008210-7T', 'tissue type: gastric cancer tissue']
    ['GSM495065', 'CB2008210-9N', 'tissue type: noncancer tissue']
    ['GSM495066', 'CB2008210-9T', 'tissue type: gastric cancer tissue']
    ['GSM495067', 'CB2008210-12N', 'tissue type: noncancer tissue']
    ['GSM495068', 'CB2008210-12T', 'tissue type: gastric cancer tissue']
    ['GSM495069', 'CB2008210-13N', 'tissue type: noncancer tissue']
    ['GSM495070', 'CB2008210-13T', 'tissue type: gastric cancer tissue']
    ['GSM495071', 'CB2008210-14N', 'tissue type: noncancer tissue']
    ['GSM495072', 'CB2008210-14T', 'tissue type: gastric cancer tissue']
    ['GSM495073', 'CB2008210-15N', 'tissue type: noncancer tissue']
    ['GSM495074', 'CB2008210-15T', 'tissue type: gastric cancer tissue']
    ['GSM495075', 'CB2008210-3C', 'tissue type: normal gastric tissue']
    ['GSM495076', 'CB2008210-5C', 'tissue type: normal gastric tissue']
    ['GSM495077', 'CB2008210-9C', 'tissue type: normal gastric tissue']
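
For anyone who cannot patch the library, the same three fields can in principle be pulled straight out of the text that the acc.cgi request returns. The sketch below assumes that text uses the usual SOFT layout (a `^SAMPLE = ...` line starting each record, followed by `!Sample_<field> = <value>` lines). `SOFT_EXCERPT` is a hand-written excerpt built from the first two rows of the output above, not a real download, and `parse_samples` is a hypothetical helper, not a GEOparse function.

```python
# Hand-written excerpt in SOFT layout (assumption), using values from the
# output shown in this thread. It is not an actual GEO response.
SOFT_EXCERPT = """\
^SAMPLE = GSM495051
!Sample_title = CB2008210-1N
!Sample_characteristics_ch1 = tissue type: noncancer tissue
^SAMPLE = GSM495052
!Sample_title = CB2008210-1T
!Sample_characteristics_ch1 = tissue type: gastric cancer tissue
"""


def parse_samples(soft_text):
    """Collect name, title and characteristics for each ^SAMPLE record."""
    samples = []
    current = None
    for line in soft_text.splitlines():
        if line.startswith("^SAMPLE"):
            # A new record starts; the accession follows the "=" sign.
            name = line.split("=", 1)[1].strip()
            current = {"name": name, "title": None, "characteristics": []}
            samples.append(current)
        elif current is not None and line.startswith("!Sample_"):
            key, _, value = line[len("!Sample_"):].partition("=")
            key, value = key.strip(), value.strip()
            if key == "title":
                current["title"] = value
            elif key == "characteristics_ch1":
                # A sample may carry several characteristics lines.
                current["characteristics"].append(value)
    return samples


rows = [
    [s["name"], s["title"], s["characteristics"][0] if s["characteristics"] else None]
    for s in parse_samples(SOFT_EXCERPT)
]
for row in rows:
    print(row)
```

Keeping the characteristics as a list avoids guessing how many such lines a sample has; only the first one is reported here, to match the rows above.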

I can commit this if you want. Thanks for helping me, by the way.

guma44 commented 3 years ago

Hi, I have a similar change now and am testing it, so no need to commit :). I will let you know when I have a new version.

tyasird commented 3 years ago

I am happy to hear that, many thanks.