看着14个肿瘤相关通路活性就够了 by 生信技能树

Title：PROGENy: Pathway RespOnsive GENes for activity inference; 通路反应基因活性推断工具

1.引入

在上篇推送介绍的R包中，我们用到了PROGENy工具分析14个通路的活性，简单提了一下PROGENy的github 地址，但并没有做过多探索，在后续的学习中，发现PROGENy的学习文档写的也很……就不是很能懂。由此，本篇就PROGENy 进行简要探索。

2. 介绍

PROGENy （Pathway RespOnsive GENes for activity inference）是2018年发表在Nature Communication的R包[1]。一般认为，基因表达与通路的信号活性相关，在以往的通路富集分析中，基本上都以此为依据，认为在某个通路中高表达的基因越多，则通路激活的可能性越大，然而，却忽略了翻译后修饰的影响，基于此，作者团队开发了PROGENy, 它可以使用公开可用的信号扰动实验(perturbation experiments)数据来推断出人类的样本通路响应过程中的共同核心基因。这些可与任何统计方法相结合，可用于从bulk RNAseq学中推断通路活性。在作者的Abstract中，其将PROGENy的功能主要总结为以下几点：

(i) Recover the effect of known driver mutations
(ii) Provide or improve strong markers for drug indications
(iii) Distinguish between oncogenic and tumor suppressor pathways for patient survival

PROGENy计算的14种通路简要介绍如下：

Androgen: involved in the growth and development of the male reproductive organs.
EGFR: regulates growth, survival, migration, apoptosis, proliferation, and differentiation in mammalian cells
Estrogen: promotes the growth and development of the female reproductive organs.
Hypoxia: promotes angiogenesis and metabolic reprogramming when O2 levels are low.
JAK-STAT: involved in immunity, cell division, cell death, and tumor formation.
MAPK: integrates external signals and promotes cell growth and proliferation.
NFkB: regulates immune response, cytokine production and cell survival.
p53: regulates cell cycle, apoptosis, DNA repair and tumor suppression.
PI3K: promotes growth and proliferation.
TGFb: involved in development, homeostasis, and repair of most tissues.
TNFa: mediates haematopoiesis, immune surveillance, tumour regression and protection from infection.
Trail: induces apoptosis.
VEGF: mediates angiogenesis, vascular permeability, and cell migration.
WNT: regulates organ morphogenesis during development and tissue repair.

在后续几年内，Christian H. Holland 等人不断拓展PROGENy的使用范围，2019年将PROGENy拓展至小鼠的研究[2]。2020年PROGENy可支持单细胞转录数据的分析[3]。目前PROGENy的原始文献引用量已达154。这些信息表明，PROGENy还是较为被研究者们可以的。

图1 Web of science 检索页面

[1] Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. 2018. Perturbation-response genes reveal signaling footprints in cancer gene expression. Nature Communications: 10.1038/s41467-017-02391-6
[2] Holland CH, Szalai B, Saez-Rodriguez J. Transfer of regulatory knowledge from human to mouse for functional genomics analysis. Biochim Biophys Acta Gene Regul Mech. 2020;1863(6):194431. doi:10.1016/j.bbagrm.2019.194431
[3] Holland, C.H., Tanevski, J., Perales-Patón, J. et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol 21, 36 (2020). https://doi.org/10.1186/s13059-020-1949-z

3. PROGENy安装及使用

3.1 安装

截止到本文发布前，PROGENy存储在Bioconductor的版本已经是最新的v1.20版本，和github的版本是一致的。因此，可以通过Bioconductor进行安装，也可以通过github 安装。

##通过Bioconductor安装
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("progeny")

## 通过github安装
devtools::install_github("saezlab/progeny")

3.2 使用

一般而言，作者开发R包都希望其他研究者来使用他的软件，因此都会配套一个说明文档或者说使用文档，告诉我们这个新东西应该怎么去使用。在github 的描述页面会给说明文档的链接，或者就直接写在描述页面。如果存储在bioconductor ，网页中也会有相关的帮助文档链接地址。PROGENy的更新后的操作文档写的好像不是能够很好的理解。以下是Bioconductor（https://www.bioconductor.org/packages/release/bioc/vignettes/progeny/inst/doc/progeny.html）及github（https://github.com/saezlab/progeny/blob/master/vignettes/progeny.Rmd）上帮助文档提供的操作代码

library(progeny)
library(ggplot2)
library(dplyr)
model <- progeny::model_human_full
head(model)
#>       gene pathway      weight     p.value
#> 1     RFC2    EGFR  1.47064662 0.001655275
#> 2    ESRRA    EGFR  0.17858956 0.211837604
#> 3   HNRNPK    EGFR  0.30669860 0.084560061
#> 4     CBX6    EGFR -0.67550734 0.017641119
#> 5   ASRGL1    EGFR -0.25232814 0.295429969
#> 6 FLJ30679    MAPK -0.06047373 0.628461747
# Get top 100 significant genes per pathway
model_100 <- model %>%
  group_by(pathway) %>%
  slice_min(order_by = p.value, n = 200)

# Plot
ggplot(data=model_100, aes(x=weight, color=pathway, fill=pathway)) +
  geom_density() +
  theme(text = element_text(size=12)) +
  facet_wrap(~ pathway, scales='free') +
  xlab('scores') +
  ylab('densities') +
  theme_bw() +
  theme(legend.position = "none")

图2 帮助文档结果展示图

看这个文档，实际上看不大懂这个PROGENy到底应该怎么用的。Google检索发现以前也有人写过相关的操作流程，例如PROGENy 推断通路活性（https://www.jianshu.com/p/6e14680036ad）、单细胞之富集分析-6: PROGENY（https://www.jianshu.com/p/89491b688c80）。前者提供的操作链接已经不可用了，后者的原始文档在目前的bioconductor/github 上也消失了。

而检索较早版本的bioconductor progeny发现，在v1.4版本的说明文档链接还能看到后者的部分操作，但单细胞流程的部分直接丢失了网页，至少现在找不到了，PROGENy pathway signatures（https://www.bioconductor.org/packages/3.8/bioc/vignettes/progeny/inst/doc/progeny.html），而使用v1.4版本的说明文档又发现biomaRt的ID转换因为网络问题，也不大好用。而browseVignettes函数也不能调用PROGENy的demo 文档。基于此，本篇主要基于v1.20 附带的函数帮助文档结合前期的一些操作流程进行介绍。

browseVignettes("progeny")
#> No vignettes found by browseVignettes("progeny")

3.3 Bulk RNAseq数据分析

3.3.1 信号通路活性分析

相较于最初的版本，v1.20 已经附加了demo 的演示数据，因此，在本篇的演示文档中，我们只需要加载PROGENy包中附带的数据即可。如下：

library(progeny)
### 获取包内自带数据
gene_expression <- as.matrix(read.csv(system.file("extdata",
                                                  "human_input.csv", package = "progeny"), row.names = 1))
head(gene_expression)
#>         SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520 SRR1039521
#>  ANKRD37    7.108502   7.318073   6.501028   6.603712   6.550726   6.767772   6.720333   6.862744
#>  ANKZF1     9.397687   9.637948   9.134391   9.405355   9.345567   9.548367   9.185797   9.405191
#>  ATP6V1G2   5.302948   5.302948   5.302948   5.302948   5.302948   5.302948   5.302948   5.302948
#>  BMP4      11.543845  12.103400  11.630489  11.380581  10.901069  11.171775  11.205271  11.235885
#>  BMPR2     11.541010  11.537694  11.695743  11.653518  11.769146  11.893831  11.738505  11.764539
#>  BNIP3L    12.847510  12.889814  12.882546  13.116758  12.923894  13.563225  12.734816  13.108777

Bulk RNAseq 的数据只需提供归一化之后的表达矩阵，以行为基因名，列为样本名即可，

# 计算通路活性
pathways <- progeny(gene_expression, scale=TRUE,
                    organism="Human", #如为小鼠，填 "Mouse"
                    top = 100, perm = 1)
head(pathways)[1:5,1:5]
#>               Androgen       EGFR   Estrogen    Hypoxia    JAK-STAT
#>  SRR1039508 -1.2111870 -0.2213137  0.9492413  0.4971532  1.70999798
#>  SRR1039509  0.5193128 -1.1128063 -1.0750417 -0.0351343 -1.37369841
#>  SRR1039512 -0.7562443 -0.1584680  0.7905036 -0.7592593  0.78242868
#>  SRR1039513  0.9281783 -0.7618999 -1.0702726 -0.9091296  0.54820602
#>  SRR1039516 -0.7653795  0.0702440  1.1708203 -0.6315846 -0.02282687

由此，得到了每个样本的各通路活性值，后续参考zip文件附带code，进一步处理。

nrow(pathways) 
#> 8
controls =rep(c(TRUE,FALSE),4)
ctl_mean = apply(pathways[controls,], 2, mean)
ctl_sd = apply(pathways[controls,], 2, sd)
pathways = t(apply(pathways, 1, function(x) x - ctl_mean))
pathways = apply(pathways, 1, function(x) x / ctl_sd)
library(pheatmap)
myColor = colorRampPalette(c("Darkblue", "white","red"))(100)
pheatmap(t(pathways),fontsize=14, show_rownames = F,
         color=myColor, main = "PROGENy", angle_col = 45, treeheight_col = 0,  
         border_color = NA)

图3. Bulk RNAseq分组通路活性热图

3.3.2 两组间通路活性差异分析

在最初的版本的说明中，Bulk RNAseq的文档里面还提供了计算两组间通路活性差异的代码。
简单来说，按上述的demo文件共有8个样本，分成两组，一组为对照（TRUE），一组为实验组（FALSE），假如我要探究某个实验条件对样本的通路活性影响，我可以先计算各样本的通路活性，再利用下面的代码，找到差异最为明显的通路。

library(dplyr)
result = apply(pathways, 1, function(x) {
    broom::tidy(lm(x ~ !controls)) %>%
        filter(term == "!controlsTRUE") %>%
        dplyr::select(-term)
})
res_lm<- mutate(bind_rows(result), pathway=names(result))
res_lm
#>  # A tibble: 14 × 5
#>     estimate std.error statistic    p.value pathway 
#>        <dbl>     <dbl>     <dbl>      <dbl> <chr>   
#>   1   8.45       0.962    8.79   0.000120   Androgen
#>   2  -0.210      2.58    -0.0814 0.938      EGFR    
#>   3 -10.0        0.719  -13.9    0.00000854 Estrogen
#>   4   1.18       1.25     0.944  0.382      Hypoxia 
#>   5  -1.45       0.700   -2.07   0.0844     JAK-STAT
#>   6   1.28       1.68     0.760  0.476      MAPK    
#>   7   0.221      0.689    0.321  0.759      NFkB    
#>   8  -5.43       0.960   -5.66   0.00130    p53     
#>   9  -2.58       3.22    -0.798  0.455      PI3K    
#>  10   2.80       0.878    3.19   0.0188     TGFb    
#>  11   0.336      0.665    0.506  0.631      TNFa    
#>  12   0.840      0.810    1.04   0.340      Trail   
#>  13  -0.0104     0.648   -0.0160 0.988      VEGF    
#>  14  -0.964      0.609   -1.58   0.164      WNT

结果显示，在实验组中 p53 在处理后确实不如处理前活跃。源文档没有提供可视化的方法，但我们可以自行用ggplot2画图.

library(ggplot2)
ggplot(as.data.frame(res_lm),aes(x=pathway,y=statistic))+
  geom_point(aes(size=-log10(p.value),color=pathway))+
  theme_bw()+ylim(c(-15,10))+
  scale_color_manual(values = paletteer::paletteer_d('ggsci::category20_d3',n=14))+
  ggplot2::coord_flip()+theme(legend.position = 'none')

3.3.3 与药物作用相关性分析

在PROGENy以往的说明文档中，还提供了与药物作用的相关性分析，Code很简单，但需要一些数据自行下载。
本部分用到的数据存储在 the Genomics of Drug Sensitivity in Cancer（GDSC）数据库中，简单来说，通过分析细胞系基因表达及药物处理细胞后的表达谱变化，来推测药物的敏感性。这几个文件长什么样可以自己下载探索一下。

# set up a file cache so we download only once
library(BiocFileCache)
bfc = BiocFileCache(".")
# gene expression and drug response
base = "http://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Data/"
paths = bfcrpath(bfc, paste0(base, c("suppData/TableS4A.xlsx",
            "preprocessed/Cell_line_RMA_proc_basalExp.txt.zip")))
# load the downloaded files
drug_table <- readxl::read_excel(paths[1], skip=5, na="NA")
drug_table <- replace(drug_table, is.na(drug_table), 0)
gene_table <- readr::read_tsv(paths[2])

# we need drug response with COSMIC IDs
drug_response <- data.matrix(drug_table[,3:ncol(drug_table)])
rownames(drug_response) <- drug_table[[1]]

# we need genes in rows and samples in columns
gene_expr <- data.matrix(gene_table[,3:ncol(gene_table)])
colnames(gene_expr) <- sub("DATA.", "", colnames(gene_expr), fixed=TRUE)
rownames(gene_expr) <- gene_table$GENE_SYMBOLS

手动下载：
Processed gene expression matrix（http://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip）
Drug sensitivities（http://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Data/suppData/TableS4A.xlsx）
随后进行PROGENy 运算：

library(progeny)
pathways <- progeny(gene_expr,scale = TRUE, organism = "Human", top = 100,
    perm = 1, verbose = FALSE)

可视化结果：

library(pheatmap)
myColor = colorRampPalette(c("Darkblue", "white","red"))(100)
pheatmap(pathways,fontsize=14, show_rownames = FALSE,
    color=myColor, main = "PROGENy", angle_col = 45, treeheight_col = 0,  
    border_color = NA)

图5 细胞系PROGENy 分析结果示意

假设这里我们要讨论Trametinib与MAPK 通路活性的相关性，我们可以先找到匹配的使用Trametinib处理的细胞及其对照组，再找到MAPK通路，随后进行一个相关性分析。

cell_lines = intersect(rownames(pathways), rownames(drug_response))
trametinib = drug_response[cell_lines, "Trametinib"]

mapk = pathways[cell_lines, "MAPK"]

associations = lm(trametinib ~ mapk)
summary(associations)
#> Call:
#> lm(formula = trametinib ~ mapk)

#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -6.548 -1.255  0.338  1.338  6.819 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.94703    0.06442  -14.70   <2e-16 ***
#> mapk        -1.35978    0.06449  -21.09   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#> Residual standard error: 1.998 on 960 degrees of freedom
#> Multiple R-squared:  0.3165, Adjusted R-squared:  0.3158 
#> F-statistic: 444.6 on 1 and 960 DF,  p-value: < 2.2e-16

根据结果来看，我们发现 MAPK 活性与对 Trametinib 的敏感性密切相关：Pr(>|t|)远小于阈值0.05。

4. Bulk RNAseq 小结

总体上而言，PROGENy 的操作方式很简单，只需要提供一个基因表达矩阵即可，其功能还是很强大的，提供的药物作用分析也可以用在数据挖掘相关的文章中，用以挖掘药物可能作用的靶点。在本篇简要介绍中，并没有过多描述他的算法原理，感兴趣的可以去看原文。对于当前的PROGENy bioconductor版本，初学者找到演示文档是比较困难的，可能由于其更新了新的文档之后将前次的文档覆盖，进而导致之前版本的文档。具体原因不做深究。下一篇推文继续简要介绍PROGENy在单细胞数据中的应用。

接上篇，PROGENy: Pathway RespOnsive GENes for activity inference（一）

Title：PROGENy: Pathway RespOnsive GENes for activity inference; 通路反应基因活性推断工具

在上篇中，主要介绍了PROGENy在Bulk RNA seq数据中的应用，以及PROGENy如何联系药物与信号通路。这篇继续探讨PROGENy在单细胞数据中的应用。

4.0 PROGENy在单细胞转录组中的应用

要运行PROGENy的单细胞版本，首先要获得演示数据。10X genomics 存储的pbmc3k的数据可以直接获得。

因此，我们先对该数据进行前期Seurat 处理 10x genomics pbmc3k存储的链接是https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz如果在服务器上使用，可以直接用wget下载：

wget -b -c https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz

而后进行Seurat流程：

library(Seurat)
library(ggplot2)
library(tidyr)
library(readr)
library(pheatmap)
library(tibble)
pbmc.data <- 
  Read10X(data.dir = "./filtered_gene_bc_matrices/hg19/")
## Initialize the Seurat object with the raw (non-normalized data).
pbmc <- 
  CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, 
                     min.features = 200)
## Identification of mithocondrial genes
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

## Filtering cells following standard QC criteria.
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & 
    percent.mt < 5)

## Normalizing the data
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", 
    scale.factor = 10000)

pbmc <- NormalizeData(pbmc)

## Identify the 2000 most highly variable genes
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

## In addition we scale the data
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:10,  umap.method = 'uwot', metric='cosine')
DimPlot(pbmc, reduction = "umap")

## Finding differentially expressed features (cluster biomarkers)
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, 
    logfc.threshold = 0.25, verbose = FALSE)

## Assigning cell type identity to clusters
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", 
    "FCGR3A+ Mono", "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)

## We create a data frame with the specification of the cells that belong to 
## each cluster to match with the Progeny scores.
CellsClusters <- data.frame(Cell = names(Idents(pbmc)), 
    CellType = as.character(Idents(pbmc)),
    stringsAsFactors = FALSE)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

图2 pbmc3k 注释后 umap

在完成了细胞注释之后，可对pbmc3k 的样本进行PROGENy 分析。

## We compute the Progeny activity scores and add them to our Seurat object
## as a new assay called Progeny. 
pbmc <- progeny(pbmc, scale=FALSE, organism="Human", top=500, perm=1, 
    return_assay = TRUE)

## We can now directly apply Seurat functions in our Progeny scores. 
## For instance, we scale the pathway activity scores. 
pbmc <- Seurat::ScaleData(pbmc, assay = "progeny") 

## We transform Progeny scores into a data frame to better handling the results
progeny_scores_df <- 
    as.data.frame(t(GetAssayData(pbmc, slot = "scale.data", 
        assay = "progeny"))) %>%
    rownames_to_column("Cell") %>%
    gather(Pathway, Activity, -Cell) 

## We match Progeny scores with the cell clusters.
progeny_scores_df <- inner_join(progeny_scores_df, CellsClusters)

## We summarize the Progeny scores by cellpopulation
summarized_progeny_scores <- progeny_scores_df %>% 
    group_by(Pathway, CellType) %>%
    summarise(avg = mean(Activity), std = sd(Activity))

## We prepare the data for the plot
summarized_progeny_scores_df <- summarized_progeny_scores %>%
    dplyr::select(-std) %>%   
    spread(Pathway, avg) %>%
    data.frame(row.names = 1, check.names = FALSE, stringsAsFactors = FALSE) 

paletteLength = 100
myColor = colorRampPalette(c("Darkblue", "white","red"))(paletteLength)

progenyBreaks = c(seq(min(summarized_progeny_scores_df), 0, 
                      length.out=ceiling(paletteLength/2) + 1),
                  seq(max(summarized_progeny_scores_df)/paletteLength, 
                      max(summarized_progeny_scores_df), 
                      length.out=floor(paletteLength/2)))
progeny_hmap = pheatmap(t(summarized_progeny_scores_df[,-1]),fontsize=14, 
                        fontsize_row = 10, 
                        color=myColor, breaks = progenyBreaks, 
                        main = "PROGENy (500)", angle_col = 45,
                        treeheight_col = 0,  border_color = NA)

图3 PROGENy 单细胞结果展示

5. 一些函数的使用

在v1.20版本里面，除progeny 函数之外，PROGENy包也存在其他函数，如下：

但我们在使用过程中并没有太多的用到这些函数，progeny函数是这个包的核心，那么其他的函数是用来干嘛的呢？

查看progeny 的源码，我们发现，getModel函数是从作者提供的Human 及Mouse 中取出目标通路的基因集,

progeny = function(expr, scale=TRUE, organism="Human", top = 100, perm = 1, 
    verbose = FALSE, z_scores = FALSE, get_nulldist = FALSE, assay_name = "RNA", 
    return_assay = FALSE, ...) {
    UseMethod("progeny")
}
methods("progeny")
#> [1] progeny.default*              progeny.ExpressionSet*        progeny.matrix*               progeny.Seurat*               progeny.SingleCellExperiment*
#> see '?methods' for accessing help and source code
getAnywhere(progeny.matrix)
#> A single object matching ‘progeny.matrix’ was found
#> It was found in the following places
#>   registered S3 method for progeny from namespace progeny
#>   namespace:progeny

其他函数可自行在帮助文档查阅各自功能。

附：当我们找不到说明文档怎么办？

在本软件学习过程中，遇到最明显的bug 就是作者在更新PROGENy之后，以往的帮助文档网页丢失了，这给学习使用R包的我们带来麻烦，这里简要分享我如何寻找这篇文章的帮助文档。

Bioconductor

一般情况下，Bioconductor R包在这个部分会提供说明文档的链接，点开HTML/R_Script 或者查看PDF都可以看到帮助文档，在这里，HTML（https://www.bioconductor.org/packages/release/bioc/vignettes/progeny/inst/doc/progeny.html）并没有太多使用信息，只是介绍了getModel 是怎么用的，而PDF（https://www.bioconductor.org/packages/release/bioc/manuals/progeny/man/progeny.pdf）也只是提供了各个函数的介绍。也就是说，从这里并没有获取到很有价值的信息。图4. PROGENy Bioconductor 界面

图4. PROGENy Bioconductor 界面

参照Bioconductor的获取帮助文档的方式也失败了。

browseVignettes("progeny")
#> No vignettes found by browseVignettes("progeny")

那其实也可以检索一下旧版本的progeny 有没有配套的使用说明。但如果你也进行了这一步，你会发现，这个链接点开啥也没有，而在其他版本的HTML文件中，使用信息也不是那么全。

Github

常见的说明文档也会提供在Github （https://github.com/saezlab/progeny）的首页，但在这里并没有。

因此，下一步查看他的各个文件夹中里面有没有帮助文档。一般来说vignettes 文件夹是储存帮助文档的，感兴趣的可以试着点开看一下

我这里直接说结果，在docs 的下级目录中找到了这两个文件。如果你在github中打开，要是像我一样小白的话，可能看不懂它网页的源代码格式的数据，因此，就直接把他download下来，用本地浏览器打开，你会发现这就是一份操作说明。

结语

总体而言，PROGENy 是个很优秀的R包，但也有其不足的地方，比如，它只涵盖了14条通路。上篇写完才发现，原来已经有其他公众号上已经推送过PROGENy，还引用了文献的例子来介绍PROGENy的使用，读者可以互相参考借鉴，由于来源一致，大部分内容可能会比较趋同。如果对R包原理感兴趣，还是要移步作者原文仔细详读才行哦。

[1] Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. 2018. Perturbation-response genes reveal signaling footprints in cancer gene expression. *Nature Communications*: [10.1038/s41467-017-02391-6](https://doi.org/10.1038/s41467-017-02391-6)

[2] Holland CH, Szalai B, Saez-Rodriguez J. Transfer of regulatory knowledge from human to mouse for functional genomics analysis. Biochim Biophys Acta Gene Regul Mech. 2020;1863(6):194431. doi:10.1016/j.bbagrm.2019.194431 

[3] Holland, C.H., Tanevski, J., Perales-Patón, J. et al. Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data. Genome Biol 21, 36 (2020). https://doi.org/10.1186/s13059-020-1949-z

ixxmu / mp_duty

看着14个肿瘤相关通路活性就够了 #3048