guma44 / GEOparse

Python library to access Gene Expression Omnibus Database (GEO)
BSD 3-Clause "New" or "Revised" License

using merge_and_average with annotation #68

Closed tyasird closed 3 years ago

tyasird commented 3 years ago

Hi,

I want to annotate all samples. The pivot_and_annotate function works well for datasets with a single platform.

import GEOparse as geo  # assumed alias behind the `geo` name used here
gse = geo.get_GEO(geo='GSE17907', how='full', destdir=download_dir)
gse.pivot_and_annotate('VALUE', gse.gpls[list(gse.gpls)[0]], 'Gene Symbol')

However, some datasets, such as GSE17907, have multiple platforms. For those I use the merge_and_average function because it can filter by platform, which lets me get the samples for each platform separately. Unfortunately, merge_and_average does not annotate the samples.

eset = gse.merge_and_average(platform='GPL570', expression_column='VALUE', group_by_column='ID_REF', gsm_on='ID_REF', gpl_on='ID')

Is there a feature for annotating samples across multiple platforms? Maybe I missed something, so I just wanted to ask.

By the way, this is how I annotate the samples manually:

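import pandas as pd

# Probe annotation table of the first platform in the series
soft = gse.gpls[list(gse.gpls)[0]].table
if soft.columns[0] == 'ID' and 'Gene Symbol' in soft.columns and eset.index.name == 'ID_REF':
    soft = soft[['ID', 'Gene Symbol']]
    # Attach gene symbols to the expression values by probe ID
    annotated = pd.merge(left=soft, right=eset, left_on='ID', right_on='ID_REF').drop(['ID'], axis=1)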
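
For reference, a minimal sketch of the same manual annotation applied once per platform. It uses only pandas and small toy tables in place of the real `gse.gpls[name].table` and the per-platform expression tables, so the helper `annotate_expression` and the platform names `GPL_A` and `GPL_B` are hypothetical, and the table layouts (an `ID` column in the platform table, an `ID_REF` index in the expression table) are assumptions carried over from the snippet above.

```python
import pandas as pd


def annotate_expression(eset, gpl_table, symbol_column="Gene Symbol"):
    """Attach a gene-symbol column to a probe-indexed expression table.

    Assumes `eset` is indexed by probe ID (index name 'ID_REF') and that
    `gpl_table` has an 'ID' column holding the same probe IDs.
    """
    if "ID" not in gpl_table.columns or symbol_column not in gpl_table.columns:
        raise ValueError("platform table lacks 'ID' or %r" % symbol_column)
    symbols = gpl_table[["ID", symbol_column]].set_index("ID")
    # Inner join on the probe ID index: probes without an annotation row are dropped.
    return symbols.join(eset, how="inner")


# Toy stand-ins for the platform tables and the per-platform expression tables.
gpl_tables = {
    "GPL_A": pd.DataFrame({"ID": ["p1", "p2"], "Gene Symbol": ["TP53", "BRCA1"]}),
    "GPL_B": pd.DataFrame({"ID": ["q1"], "Gene Symbol": ["EGFR"]}),
}
esets = {
    "GPL_A": pd.DataFrame({"GSM1": [1.0, 2.0]}, index=pd.Index(["p1", "p2"], name="ID_REF")),
    "GPL_B": pd.DataFrame({"GSM2": [3.0]}, index=pd.Index(["q1"], name="ID_REF")),
}

# One annotated table per platform, keyed by platform name.
annotated = {name: annotate_expression(esets[name], gpl_tables[name]) for name in gpl_tables}
```

Keeping one table per platform avoids mixing probe IDs from different arrays; with real data, each expression table would presumably come from one merge_and_average call per platform.
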

tyasird commented 3 years ago

I ended up using a different approach, thanks.