gyorilab / indra_cogex

INDRA Context Graph Extension
BSD 2-Clause "Simplified" License

Add cBioPortal/TCGA processor #8

Open bgyori opened 3 years ago

bgyori commented 3 years ago

One approach is to process the raw data into summary statistics of interest. For instance, define a list of disease types and pool all the studies for each disease. Then calculate the mutation frequency of each gene across the pooled studies for that disease, and create gene-mutated_in (frequency: x%)->disease relations to capture the data.
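
A rough sketch of the pooling step, to make the idea concrete. Everything here is an assumption for illustration: the function name `mutation_frequencies`, the `(study_id, sample_id, mutated_genes)` input tuples, the `study_to_disease` mapping, and the choice to define frequency as the percentage of pooled samples carrying at least one mutation in the gene. It does not call cBioPortal; it only shows how already-downloaded data might be summarized.

```python
from collections import defaultdict


def mutation_frequencies(samples, study_to_disease):
    """Pool samples by disease and compute per-gene mutation frequencies.

    samples: iterable of (study_id, sample_id, mutated_genes) tuples
        (hypothetical input shape, not a cBioPortal format).
    study_to_disease: dict mapping study_id to a disease label
        (hypothetical, manually curated mapping).
    Returns a sorted list of (gene, "mutated_in", disease, {"frequency": pct}).
    """
    sample_counts = defaultdict(int)  # disease -> number of pooled samples
    mutated_counts = defaultdict(lambda: defaultdict(int))  # disease -> gene -> n

    for study_id, _sample_id, genes in samples:
        disease = study_to_disease.get(study_id)
        if disease is None:
            # Skip studies that are not assigned to a disease of interest
            continue
        sample_counts[disease] += 1
        # Count each gene at most once per sample
        for gene in set(genes):
            mutated_counts[disease][gene] += 1

    relations = []
    for disease in sorted(mutated_counts):
        total = sample_counts[disease]
        for gene in sorted(mutated_counts[disease]):
            pct = 100.0 * mutated_counts[disease][gene] / total
            relations.append((gene, "mutated_in", disease, {"frequency": pct}))
    return relations


# Toy data: two studies pooled under one disease label
example_samples = [
    ("study_a", "s1", ["TP53", "PIK3CA"]),
    ("study_a", "s2", ["TP53"]),
    ("study_b", "s3", ["TP53"]),
    ("study_b", "s4", ["PIK3CA"]),
    ("study_c", "s5", ["KRAS"]),  # not mapped to a disease, so ignored
]
example_mapping = {"study_a": "breast cancer", "study_b": "breast cancer"}
example_relations = mutation_frequencies(example_samples, example_mapping)
```

Counting each gene once per sample keeps the frequency interpretable as "fraction of samples with the gene mutated"; a different definition (e.g., per-study averages) would need a different aggregation.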

bgyori commented 3 years ago

cBioPortal also contains the CCLE cell line data set, which could be used to add expression relations between genes and cell lines; see, e.g., https://github.com/sorgerlab/indra/blob/master/indra/databases/context_client.py.
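
As a minimal sketch of what turning expression data into relations could look like: the nested `{cell_line: {gene: value}}` input, the `expressed_in` relation name, and the `threshold` parameter are all assumptions made for illustration, not an existing API or schema in this repository. Fetching the values from CCLE is left out.

```python
def expression_relations(expression, threshold=0.0):
    """Turn per-cell-line expression values into gene-cell line relations.

    expression: dict of {cell_line: {gene: value or None}}
        (hypothetical input shape assumed for this sketch).
    threshold: only values strictly above this cutoff yield a relation
        (hypothetical parameter; a sensible cutoff depends on the data type).
    Returns a sorted list of (gene, "expressed_in", cell_line, {"value": v}).
    """
    relations = []
    for cell_line in sorted(expression):
        for gene, value in sorted(expression[cell_line].items()):
            # Missing measurements and values at or below the cutoff are skipped
            if value is None or value <= threshold:
                continue
            relations.append((gene, "expressed_in", cell_line, {"value": value}))
    return relations


# Toy values, not real CCLE measurements
example_expression = {
    "A375_SKIN": {"BRAF": 5.2, "EGFR": None},
    "HT29_LARGE_INTESTINE": {"BRAF": 0.0, "EGFR": 3.1},
}
example_expr_relations = expression_relations(example_expression)
```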

bgyori commented 2 years ago

Parts of this were done in #32, but the original idea is not yet integrated.