saezlab / footprints

Analysis code for: "Perturbation-response genes reveal signaling footprints in cancer gene expression"

Numerous pathway methods have been developed to quantify the signaling state of a cell from gene expression data, usually from the abundance of transcripts of pathway members, and are hence unable to take into account post-translational control of signal transduction. Gene expression signatures of pathway perturbations can capture this, but they are closely tied to the experimental conditions that they were derived from. We overcome both limitations by leveraging a large compendium of publicly available perturbation experiments to define consensus signatures for pathway activity. We find that although individual expression signatures are heterogeneous, there is a common core of responsive genes that describe pathway activation in a wide range of conditions. These signaling footprints better recover pathway activity than existing methods and provide more meaningful associations with (i) known driver mutations in primary tumors, (ii) drug response in cell lines, and (iii) survival in cancer patients, making them more suitable to assess the activity status of signaling pathways.

The corresponding article for this project is available on bioRxiv (pdf).

@article {Schubert-PRGs,
    author = {Schubert, Michael and Klinger, Bertram and Kl{\"u}nemann, Martina and 
              Garnett, Mathew J and Bl{\"u}thgen, Nils and Saez-Rodriguez, Julio},
    title = {Perturbation-response genes reveal signaling footprints in cancer gene expression},
    year = {2016},
    doi = {10.1101/065672},
    publisher = {Cold Spring Harbor Labs Journals},
    URL = {http://biorxiv.org/content/early/2016/07/25/065672},
    eprint = {http://biorxiv.org/content/early/2016/07/25/065672.full.pdf},
    journal = {bioRxiv}
}

Perturbation experiments

The perturbation experiments for the 11 pathways, comprising 208 submissions to ArrayExpress with a total of 580 experiments, are available in the index directory in YAML format. We consider the following pathways:

Search terms are supplied in files named query.txt. Files and experiments that we excluded (because they failed QC or because we were not sure whether the perturbation corresponds to the phenotype we want to observe) are marked with the .excluded suffix or a commented-out entry in the file. The reason for exclusion is given in the files.

Z-scores from gene expression data

Scripts to download and transform gene expression data, and to generate z-scores for the perturbation experiments, are in the data directory.

We normalized and QC'd each series as a whole (normalize_data.r), then assembled the relevant expression data (expr.r) and computed z-scores for each perturbation experiment (zscores.r).
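As an illustration of the last step, the sketch below computes a per-gene z-score for one perturbation experiment. It is written in Python rather than the repository's R, and it assumes the simplest possible definition (perturbed mean minus control mean, divided by the control standard deviation); zscores.r may use a different variance estimate. The function name, gene names, and values are hypothetical.

```python
from statistics import mean, stdev

def perturbation_zscores(control, perturbed):
    """Per-gene z-score of the perturbed mean against the control samples.

    control, perturbed: dict mapping gene -> list of expression values.
    Assumption: z = (mean(perturbed) - mean(control)) / sd(control).
    """
    zscores = {}
    for gene, ctrl_values in control.items():
        if gene not in perturbed or len(ctrl_values) < 2:
            continue  # need both conditions and a variance estimate
        sd = stdev(ctrl_values)
        if sd == 0:
            continue  # constant genes give no usable z-score
        zscores[gene] = (mean(perturbed[gene]) - mean(ctrl_values)) / sd
    return zscores

# Hypothetical toy data: three control arrays, two perturbed arrays.
control = {"GENE_A": [5.0, 5.2, 4.8], "GENE_B": [7.0, 7.0, 7.0]}
perturbed = {"GENE_A": [6.0, 6.4], "GENE_B": [7.5, 7.1]}
z = perturbation_zscores(control, perturbed)
```

GENE_B is dropped here because its control values have zero variance.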

Building the model

The models we built are available in the model directory. The one we used in the publication is called speed_matrix.

For this, we fit a linear model on the z-scores with a binary matrix indicating pathway perturbations (incl. perturbations of multiple pathways) as the independent variable. We select the 100 most significant genes and use their z-scores as coefficients in the model.
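The following Python sketch shows one way such a model could be assembled; it is not the code in the model directory. It simplifies the description above to a single pathway at a time: each gene's z-scores are regressed on a 0/1 indicator with an intercept, genes are ranked by the absolute t-statistic of the slope (equivalent to ranking by p-value at fixed degrees of freedom), and the fitted slope is kept as the coefficient. The intercept, the use of the slope as coefficient, the weighted-sum scoring function, and all names and values are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def fit_gene(z, active):
    """Regress one gene's z-scores on a 0/1 pathway indicator (with intercept).

    Returns (slope, t_statistic). For a binary regressor the slope equals the
    difference in mean z-score between perturbed and unperturbed experiments.
    """
    on = [v for v, a in zip(z, active) if a]
    off = [v for v, a in zip(z, active) if not a]
    mean_on, mean_off = mean(on), mean(off)
    slope = mean_on - mean_off
    rss = sum((v - mean_on) ** 2 for v in on) + sum((v - mean_off) ** 2 for v in off)
    se = sqrt(rss / (len(z) - 2) * (1 / len(on) + 1 / len(off)))
    return slope, slope / se

def top_genes(zscores, active, n_top=100):
    """Keep the n_top genes with the largest |t|; return gene -> coefficient."""
    fits = {gene: fit_gene(z, active) for gene, z in zscores.items()}
    ranked = sorted(fits, key=lambda g: abs(fits[g][1]), reverse=True)
    return {gene: fits[gene][0] for gene in ranked[:n_top]}

def pathway_score(sample, coefficients):
    """Assumed scoring rule: coefficient-weighted sum over the model genes."""
    return sum(coef * sample.get(gene, 0.0) for gene, coef in coefficients.items())

# Hypothetical z-scores for four experiments; the first two perturb the pathway.
zscores = {
    "GENE_A": [4.0, 5.0, 0.5, -0.5],
    "GENE_B": [1.0, -1.0, 0.5, -0.5],
    "GENE_C": [-3.0, -2.0, 0.0, 1.0],
}
active = [1, 1, 0, 0]
model = top_genes(zscores, active, n_top=2)
score = pathway_score({"GENE_A": 2.0, "GENE_B": 9.0, "GENE_C": -1.0}, model)
```

With n_top=2 the unresponsive GENE_B is excluded from the model, so its expression value does not contribute to the score.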

Computing pathway scores

Different pathway methods considered

We computed pathway scores for the perturbation-response genes (PRGs), for the corresponding pathways in Gene Ontology and Reactome gene sets (using GSVA), and with Signaling Pathway Impact Analysis (SPIA; article and R package), Pathifier (article, R package), and PARADIGM (article, tool using the TCGA signaling network).

On the perturbation experiments

For the pathway scores derived from perturbations, we either used the fold changes (PRGs, Gene Ontology, Reactome) or treated the basal and perturbed samples as the control and perturbed groups, respectively (SPIA, Pathifier).

On primary tumors of TCGA (The Cancer Genome Atlas)

We took TCGA data from firehose.org and computed all pathway scores for primary tumors (01A in barcode) where a tissue-matched normal (11A) was available (required for SPIA and Pathifier).
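The sample selection can be sketched in Python as follows. The sketch reads the sample code (for example 01A) from the fourth dash-separated field of a TCGA barcode and interprets "tissue-matched" as "the same cohort contains at least one 11A sample"; that interpretation, the barcode-to-cohort mapping, and the barcodes themselves are assumptions made for illustration.

```python
def sample_code(barcode):
    """Return the sample type plus vial field, e.g. '01A', of a TCGA barcode."""
    return barcode.split("-")[3]

def select_primary_tumors(cohort_by_barcode):
    """Keep primary tumors (01A) from cohorts that also have a normal (11A)."""
    cohorts_with_normal = {
        cohort
        for barcode, cohort in cohort_by_barcode.items()
        if sample_code(barcode) == "11A"
    }
    return sorted(
        barcode
        for barcode, cohort in cohort_by_barcode.items()
        if sample_code(barcode) == "01A" and cohort in cohorts_with_normal
    )

# Hypothetical barcodes mapped to their cohort label.
cohort_by_barcode = {
    "TCGA-AA-0001-01A-11R-0001-07": "BRCA",  # primary tumor, cohort has a normal
    "TCGA-AA-0002-11A-11R-0001-07": "BRCA",  # solid tissue normal
    "TCGA-BB-0003-01A-11R-0001-07": "SKCM",  # primary tumor, no normal in cohort
    "TCGA-AA-0004-01B-11R-0001-07": "BRCA",  # different vial, not 01A
}
selected = select_primary_tumors(cohort_by_barcode)
```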

On cell lines of the GDSC (Genomics of Drug Sensitivity in Cancer)

We took the GDSC data from cancerrxgene.org/gdsc1000 (article) and computed pathway scores for each cell line (for SPIA and Pathifier, each cell line was compared to all other cell lines with the same TCGA label).
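For the methods that need a reference group, the comparison set for each cell line can be built as in this Python sketch. The function name, cell line identifiers, and labels are hypothetical, and the sketch only assembles the groups; it does not run SPIA or Pathifier.

```python
from collections import defaultdict

def reference_sets(label_by_cell_line):
    """Map each cell line to all other cell lines sharing its TCGA label."""
    by_label = defaultdict(list)
    for cell_line, label in label_by_cell_line.items():
        by_label[label].append(cell_line)
    return {
        cell_line: [other for other in by_label[label] if other != cell_line]
        for cell_line, label in label_by_cell_line.items()
    }

# Hypothetical cell lines annotated with a TCGA label.
labels = {"line_1": "LUAD", "line_2": "LUAD", "line_3": "LUAD", "line_4": "BRCA"}
refs = reference_sets(labels)
```

A cell line that is the only member of its label (line_4 here) ends up with an empty reference set, so it would have to be skipped or handled separately.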

Analyses

Recall of perturbations (Fig. 2)

Analyses available in analyses/perturbation_recall.

Functional impact of driver mutations (Fig. 3)

Analyses available in analyses/tcga_mutation.

Explaining drug sensitivity (Fig. 4)

Analyses available in analyses/drug_assocs.

Effect on patient survival (Fig. 5)

Analyses available in analyses/tcga_survival.

Supplementary Figures

Auto-generated using knitr; the scripts are available in the report directory.