Numerous pathway methods have been developed to quantify the signaling state of a cell from gene expression data, usually from the abundance of transcripts of pathway members, and are hence unable to take into account post-translational control of signal transduction. Gene expression signatures of pathway perturbations can capture this, but they are closely tied to the experimental conditions that they were derived from. We overcome both limitations by leveraging a large compendium of publicly available perturbation experiments to define consensus signatures for pathway activity. We find that although individual expression signatures are heterogeneous, there is a common core of responsive genes that describe pathway activation in a wide range of conditions. These signaling footprints better recover pathway activity than existing methods and provide more meaningful associations with (i) known driver mutations in primary tumors, (ii) drug response in cell lines, and (iii) survival in cancer patients, making them more suitable to assess the activity status of signaling pathways.
The corresponding article for this project is available on bioRxiv (pdf).
@article {Schubert-PRGs,
author = {Schubert, Michael and Klinger, Bertram and Kl{\"u}nemann, Martina and
Garnett, Mathew J and Bl{\"u}thgen, Nils and Saez-Rodriguez, Julio},
title = {Perturbation-response genes reveal signaling footprints in cancer gene expression},
year = {2016},
doi = {10.1101/065672},
publisher = {Cold Spring Harbor Labs Journals},
URL = {http://biorxiv.org/content/early/2016/07/25/065672},
eprint = {http://biorxiv.org/content/early/2016/07/25/065672.full.pdf},
journal = {bioRxiv}
}
The index covers 11 pathways, comprising 208 submissions to ArrayExpress with a total of 580 experiments, and is available in the index directory in YAML format. We consider the following pathways:
Search terms are supplied in files named query.txt. Files and experiments that we excluded (because they failed QC or because we were not sure that the perturbation corresponds to the phenotype we want to observe) are marked with the .excluded suffix or with a commented-out entry in the file. The reason for each exclusion is given in the files.
Scripts to download and transform gene expression data, and to generate z-scores for the perturbation experiments are in the data directory.
We normalized and QC'd each series as a whole (normalize_data.r), then assembled the relevant expression data (expr.r) and computed z-scores for each perturbation experiment (zscores.r).
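As a rough illustration of the last step, the sketch below computes a per-gene z-score for one perturbation experiment as the difference between the mean perturbed and mean basal expression, divided by the basal standard deviation. This definition is an assumption made for illustration; zscores.r may estimate the variance differently, and the function name, data layout, and gene names are hypothetical.

```python
from statistics import mean, stdev

def perturbation_zscores(basal, perturbed):
    """Per-gene z-scores for one perturbation experiment.

    basal, perturbed: dicts mapping gene -> list of expression values
    (hypothetical layout; needs at least two basal samples per gene).
    """
    zscores = {}
    for gene, control_values in basal.items():
        sd = stdev(control_values)
        if sd == 0:
            continue  # z-score is undefined without variability in the controls
        zscores[gene] = (mean(perturbed[gene]) - mean(control_values)) / sd
    return zscores

# Toy data: GENE_A goes up on perturbation, GENE_B has constant controls.
example = perturbation_zscores(
    basal={"GENE_A": [5.0, 6.0, 7.0], "GENE_B": [3.0, 3.0, 3.0]},
    perturbed={"GENE_A": [9.0, 9.0], "GENE_B": [4.0, 4.0]},
)
```

Genes whose basal samples show no variance are skipped here rather than assigned an infinite score; a real pipeline would more likely borrow variance information across genes.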
The models we built are available in the model directory. The one we used in the publication is called speed_matrix.
For this, we fit a linear model on the z-scores with a binary matrix indicating pathway perturbations (including perturbations of multiple pathways) as the independent variable. We then selected the 100 most significant genes for each pathway and used their z-scores as coefficients in the model.
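The model-building step can be sketched as follows. This is a minimal pure-Python illustration, not the code in the model directory: it assumes ordinary least squares without an intercept, ranks genes by the absolute t-statistic of their pathway coefficient (equivalent to ranking by p-value when all genes share the same degrees of freedom), and takes the fitted estimate as the coefficient. All function names, gene names, and the toy data are hypothetical.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perturbation matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_gene(design, y, xtx_inv):
    """OLS for one gene: return (estimates, t-statistics), one per pathway."""
    n, p = len(design), len(design[0])
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = [sum(xtx_inv[j][k] * xty[k] for k in range(p)) for j in range(p)]
    rss = sum((y[i] - sum(design[i][j] * beta[j] for j in range(p))) ** 2
              for i in range(n))
    sigma2 = rss / (n - p)  # residual variance
    t_stats = []
    for j, b in enumerate(beta):
        se = math.sqrt(sigma2 * xtx_inv[j][j])
        t_stats.append(b / se if se > 0 else 0.0)  # degenerate fits get t = 0 here
    return beta, t_stats

def build_model(design, pathways, zscores, n_genes=100):
    """Return pathway -> {gene: coefficient} for the top n_genes per pathway."""
    p = len(pathways)
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xtx_inv = invert(xtx)
    fits = {gene: fit_gene(design, y, xtx_inv) for gene, y in zscores.items()}
    model = {}
    for j, pathway in enumerate(pathways):
        ranked = sorted(fits, key=lambda g: abs(fits[g][1][j]), reverse=True)
        model[pathway] = {g: fits[g][0][j] for g in ranked[:n_genes]}
    return model

# Toy data: five experiments (rows), the last one perturbs both pathways.
design = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
zscores = {
    "GENE_A": [2.0, 2.2, 0.1, -0.1, 2.1],   # responds to the first pathway
    "GENE_B": [0.0, 0.2, 3.0, 3.1, 3.2],    # responds to the second pathway
    "GENE_C": [0.5, -0.5, 0.4, -0.4, 0.0],  # no consistent response
}
model = build_model(design, ["EGFR", "TNFa"], zscores, n_genes=1)
```

With n_genes=1 the toy model keeps GENE_A for the first pathway and GENE_B for the second. Because the binary matrix can mark several pathways in one experiment, the coefficients are estimated jointly rather than by averaging the z-scores of each pathway separately.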
We computed pathway scores with the PRGs and, for the corresponding pathways, with Gene Ontology and Reactome gene sets (using GSVA), Signaling Pathway Impact Analysis (SPIA; article and R package), Pathifier (article, R package), and PARADIGM (article, tool using the TCGA signaling network).
For the pathway scores derived from perturbations, we used the fold changes (PRGs, Gene Ontology, Reactome) or the basal samples and perturbed samples as control and perturbed, respectively (SPIA, Pathifier).
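For the PRG-based scores, a minimal sketch is given below. It assumes that a pathway score is the weighted sum of a sample's gene-level values (here fold changes), with the model coefficients as weights; this is an assumption about how the model matrix is applied, and the function name, gene names, and numbers are hypothetical.

```python
def pathway_scores(sample_values, model):
    """Score one sample: sum of coefficient * value over each pathway's genes.

    sample_values: gene -> fold change (or expression value) for one sample.
    model: pathway -> {gene: coefficient}.
    Genes missing from the sample contribute nothing to the score.
    """
    return {
        pathway: sum(coef * sample_values.get(gene, 0.0)
                     for gene, coef in genes.items())
        for pathway, genes in model.items()
    }

# Toy model with made-up coefficients.
toy_model = {"EGFR": {"GENE_A": 2.0, "GENE_B": -1.0}, "TNFa": {"GENE_C": 3.0}}
scores = pathway_scores({"GENE_A": 1.5, "GENE_B": 0.5}, toy_model)
```

In matrix terms this is the product of a samples-by-genes matrix with a genes-by-pathways coefficient matrix, which keeps scoring cheap even for large cohorts.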
We took TCGA data from firehose.org and computed all pathway scores for primary tumors (01A in barcode) where a tissue-matched normal (11A) was available (required for SPIA and Pathifier).
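The sample selection can be sketched as follows. The sample code is the fourth hyphen-separated field of a TCGA barcode. The sketch reads "tissue-matched" as "from the same cohort", which is an assumption; if matching per patient is intended, the check would compare the first three barcode fields instead. The function names, the cohort grouping, and the barcodes shown are made up.

```python
def sample_code(barcode):
    """Return the fourth barcode field, e.g. '01A' in 'TCGA-XX-0001-01A'."""
    return barcode.split("-")[3]

def select_tumors(barcodes_by_cohort):
    """Keep primary tumors (01A) of cohorts that also contain a normal (11A)."""
    selected = {}
    for cohort, barcodes in barcodes_by_cohort.items():
        has_normal = any(sample_code(b) == "11A" for b in barcodes)
        tumors = [b for b in barcodes if sample_code(b) == "01A"]
        if has_normal and tumors:
            selected[cohort] = tumors
    return selected

# Made-up barcodes: only the first cohort has a normal sample.
selected = select_tumors({
    "BRCA": ["TCGA-XX-0001-01A", "TCGA-XX-0002-01A", "TCGA-XX-0002-11A"],
    "LAML": ["TCGA-YY-0003-03A"],
    "SKCM": ["TCGA-ZZ-0004-01A", "TCGA-ZZ-0005-06A"],
})
```

Cohorts without any normal sample are dropped entirely, because methods that need a reference group cannot be run on them.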
We took the GDSC data from cancerrxgene.org/gdsc1000 (article), computing pathway scores for each cell line (for SPIA and Pathifier compared to all other cell lines of the same TCGA label).
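For the methods that need a reference group, the comparison set for each cell line can be sketched as below: every other cell line carrying the same TCGA label. The function name, cell line names, and labels are hypothetical.

```python
from collections import defaultdict

def reference_sets(tcga_label_by_line):
    """Map each cell line to all other cell lines with the same TCGA label."""
    by_label = defaultdict(list)
    for line, label in tcga_label_by_line.items():
        by_label[label].append(line)
    return {
        line: [other for other in by_label[label] if other != line]
        for line, label in tcga_label_by_line.items()
    }

refs = reference_sets({
    "LINE_1": "BRCA",
    "LINE_2": "BRCA",
    "LINE_3": "BRCA",
    "LINE_4": "LUAD",
})
```

A cell line that is the only representative of its label ends up with an empty reference set and would have to be skipped for those methods.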
Analyses available in analyses/perturbation_recall.
Analyses available in analyses/tcga_mutation.
Analyses available in analyses/drug_assocs.
Analyses available in analyses/tcga_survival.
Auto-generated using knitr; scripts are available in the report directory.