bgyori opened this issue 3 years ago
cBioPortal also hosts the CCLE (Cancer Cell Line Encyclopedia) cell line dataset, which could be used to add expression relations between genes and cell lines; see, e.g., https://github.com/sorgerlab/indra/blob/master/indra/databases/context_client.py.
Parts of this were done in #32, but the original idea has not yet been integrated.
One approach is to process the raw data into summary statistics of interest. For instance, define a list of disease types and pool all the studies for each disease. Then calculate the mutation frequency of each gene across the pooled samples for that disease, and create gene-mutated_in (frequency: x%)->disease relations to capture the data.
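A minimal Python sketch of the pooling step follows. It assumes each study has already been fetched from cBioPortal and reduced to a hypothetical `Study` record holding its assigned disease type, the set of sample IDs profiled for mutations, and a list of (sample, gene) mutation calls. The `Study`, `MutatedIn` and `mutation_frequencies` names and the `min_frequency` cutoff are illustrative only, not part of INDRA or the cBioPortal API. "Frequency" is taken here as the fraction of pooled samples carrying at least one mutation in the gene, with samples deduplicated by ID across studies:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical minimal view of one cBioPortal study (assumption)."""
    study_id: str
    disease: str  # disease type assigned from a curated list (assumption)
    sample_ids: set[str]  # samples profiled for mutations
    mutations: list[tuple[str, str]] = field(default_factory=list)  # (sample_id, gene)


@dataclass(frozen=True)
class MutatedIn:
    """A gene-mutated_in->disease relation annotated with a frequency."""
    gene: str
    disease: str
    frequency: float  # fraction of pooled samples with >= 1 mutation in gene


def mutation_frequencies(studies, diseases, min_frequency=0.0):
    """Pool studies per disease and return MutatedIn relations.

    Hypothetical helper: samples are pooled by ID, so a sample shared by
    two studies of the same disease is counted once.
    """
    samples = defaultdict(set)  # disease -> pooled profiled sample IDs
    mutated = defaultdict(lambda: defaultdict(set))  # disease -> gene -> sample IDs
    for study in studies:
        if study.disease not in diseases:
            continue
        samples[study.disease] |= study.sample_ids
        for sample_id, gene in study.mutations:
            # Ignore calls for samples outside the profiled set.
            if sample_id in study.sample_ids:
                mutated[study.disease][gene].add(sample_id)

    relations = []
    for disease, pooled in samples.items():
        if not pooled:
            continue
        for gene, hit in mutated[disease].items():
            freq = len(hit) / len(pooled)
            if freq >= min_frequency:
                relations.append(MutatedIn(gene, disease, freq))
    # Most frequently mutated genes first within each disease.
    return sorted(relations, key=lambda r: (r.disease, -r.frequency, r.gene))


# Toy input (made-up sample IDs, not real cBioPortal data).
studies = [
    Study("brca_a", "breast cancer", {"s1", "s2", "s3", "s4"},
          [("s1", "TP53"), ("s2", "TP53"), ("s1", "PIK3CA")]),
    Study("brca_b", "breast cancer", {"s5", "s6", "s7", "s8"},
          [("s5", "PIK3CA"), ("s6", "TP53")]),
    Study("luad_a", "lung adenocarcinoma", {"l1", "l2"}, [("l1", "KRAS")]),
]

rels = mutation_frequencies(studies, {"breast cancer"})
for r in rels:
    print(f"{r.gene} mutated_in (frequency: {r.frequency:.1%}) -> {r.disease}")
# TP53 mutated_in (frequency: 37.5%) -> breast cancer
# PIK3CA mutated_in (frequency: 25.0%) -> breast cancer
```

Counting samples rather than individual mutation calls keeps a gene with several mutations in one tumor from being over-counted. Whether deduplicating shared sample IDs is the right way to handle overlapping studies (e.g., several studies drawing on the same cohort) would need checking against the actual study list.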