Closed ixxmu closed 6 months ago
这里有豆豆和花花的学习历程,从新手到进阶,生信路上有你有我!
转录调控网络 (TRN) 由一系列受调控的靶基因和转录因子 (TF) 组成。TF识别特定的DNA序列并影响整个基因组的基因表达,激活或抑制靶基因的表达。
同样是使用browseVignettes("RTN")
可以召唤作者写的网页教程啦。
本文主要介绍的是如何使用RTN包计算给定转录因子的活性分数,可以用于比较不同分组、不同亚型,反应它们之间的差别。
这里使用的是TCGA的CHOLtpm矩阵作为输入数据,可以从xena下载或者使用TCGAbiolinks下载。获取到的数据是ensemblID为行名,用我的trans_exp_new函数即可一键转换为symbol为行名的矩阵,主打一个方便
rm(list = ls())
library(tinyarray)
library(RTN)
load("D:/TCGA_RNA_seq/tpm/chol_tpm.Rdata")
chol_tpm[1:4,1:4]
## TCGA-W5-AA31-11A-11R-A41I-07 TCGA-W5-AA2I-11A-11R-A41I-07
## ENSG00000000003.15 52.4879900 50.755962
## ENSG00000000005.6 0.2202611 0.000000
## ENSG00000000419.13 38.5619360 26.416471
## ENSG00000000457.14 3.5424822 2.108877
## TCGA-W5-AA2T-01A-12R-A41I-07 TCGA-ZH-A8Y5-01A-11R-A41I-07
## ENSG00000000003.15 195.20640279 151.57279
## ENSG00000000005.6 0.05815264 0.00000
## ENSG00000000419.13 106.74049022 132.05190
## ENSG00000000457.14 16.58554100 21.98301
exp = trans_exp_new(chol_tpm)
exp = log2(exp+1)
exp[1:4,1:4]
## TCGA-W5-AA31-11A-11R-A41I-07 TCGA-W5-AA2I-11A-11R-A41I-07
## DDX11L1 0.0000000 0.0000000
## WASH7P 0.3110852 0.4480656
## MIR6859-1 0.0000000 0.0000000
## MIR1302-2HG 0.0000000 0.0000000
## TCGA-W5-AA2T-01A-12R-A41I-07 TCGA-ZH-A8Y5-01A-11R-A41I-07
## DDX11L1 0.000000 0.000000
## WASH7P 2.748028 2.116400
## MIR6859-1 2.870122 2.979347
## MIR1302-2HG 0.000000 0.000000
Group = make_tcga_group(exp)
table(Group)
## Group
## normal tumor
## 9 35
有点单细胞的气质,直接组织成对象。
tfs是自己指定的转录因子,可以查自文献或者转录因子数据库。这里使用示例里的5个。
创建对象需要两个参数,一个是表达矩阵,一个是转录因子
tfs <- c("FOXM1","E2F2","E2F3","RUNX2","PTTG1")
rtni <- tni.constructor(expData = exp,
regulatoryElements = tfs)
## -Preprocessing for input data...
## --Checking 'regulatoryElements' in 'rowAnnotation'...
## --Checking 'expData'...
## --Removing inconsistent data: standard deviation is zero for 10261 gene(s)!
## -Preprocessing complete!
str(rtni,max.level = 2)
## Formal class 'TNI' [package "RTN"] with 10 slots
## ..@ gexp : num [1:50272, 1:44] 0 0.3111 0 0 0.0208 ...
## .. ..- attr(*, "dimnames")=List of 2
## ..@ regulatoryElements: Named chr [1:5] "E2F2" "E2F3" "FOXM1" "PTTG1" ...
## .. ..- attr(*, "names")= chr [1:5] "E2F2" "E2F3" "FOXM1" "PTTG1" ...
## ..@ targetElements : chr [1:50272] "DDX11L1" "WASH7P" "MIR6859-1" "MIR1302-2HG" ...
## ..@ modulators : chr(0)
## ..@ rowAnnotation :'data.frame': 50272 obs. of 1 variable:
## ..@ colAnnotation :'data.frame': 44 obs. of 1 variable:
## ..@ para : list()
## ..@ results : list()
## ..@ summary :List of 4
## ..@ status : Named chr [1:6] "[x]" "[ ]" "[ ]" "[ ]" ...
## .. ..- attr(*, "names")= chr [1:6] "Preprocess" "Permutation" "Bootstrap" "DPI.filter" ...
可以看到gexp里存储了表达矩阵,regulatoryElements里面存储了转录因子
下面两个步骤超级慢所以设置一个存在即跳过
tmpf = "rtni.tmp.Rdata"
if(!file.exists(tmpf)){
# 并返回通过排列分析推断的 TRN(对多重假设检验进行校正)
rtni <- tni.permutation(rtni, nPermutations = 100)
# Please set nPermutations >= 1000,运行超级慢,所以写个100,实战自己改动
# 用bootstrap分析移除不稳定的互作关系,创建参考网络(refnet)
rtni <- tni.bootstrap(rtni) #也是超级慢
save(rtni,file = tmpf)
}
load(tmpf)
names(rtni@results)
## [1] "tn.ref"
这两步运行完可以看到results里面有了refnet
tni.dpi.filter是过滤,保留显性TF-target 对,去除冗余,生成tnet(转录网络)
rtni <- tni.dpi.filter(rtni)
## -Applying dpi filter...
## -DPI filter complete!
names(rtni@results)
## [1] "tn.ref" "tn.dpi"
# 查看网络摘要
tni.regulon.summary(rtni)
## Regulatory network comprised of 5 regulons.
## -- DPI-filtered network:
## regulatoryElements Targets Edges
## 5 7981 9684
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 158 402 1081 1937 3192 4851
## -- Reference network:
## regulatoryElements Targets Edges
## 5 7981 10216
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 323 662 1188 2043 3192 4851
## ---
rtni <- tni.gsea2(rtni)
## -Checking log space... OK!
## -Performing two-tailed GSEA...
## --For 5 regulon(s) and 44 sample(s)...
##
|======================================================================| 100%
## -GSEA2 complete!
str(rtni@results,max.level = 2)
## List of 3
## $ tn.ref : num [1:50272, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## $ tn.dpi : num [1:50272, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## $ regulonActivity:List of 7
## ..$ differential : num [1:44, 1:5] -1.573 -1.583 1.471 1.151 0.263 ...
## .. ..- attr(*, "dimnames")=List of 2
## ..$ positive : num [1:44, 1:5] -0.656 -0.642 0.684 0.38 -0.37 ...
## .. ..- attr(*, "dimnames")=List of 2
## ..$ negative : num [1:44, 1:5] 0.918 0.94 -0.787 -0.771 -0.633 ...
## .. ..- attr(*, "dimnames")=List of 2
## ..$ status : num [1:44, 1:5] -1 -1 1 1 0 -1 0 -1 -1 1 ...
## .. ..- attr(*, "dimnames")=List of 2
## ..$ sections : num 1
## ..$ center : num 0
## ..$ regulatoryElements: Named chr [1:5] "E2F2" "E2F3" "FOXM1" "PTTG1" ...
## .. ..- attr(*, "names")= chr [1:5] "E2F2" "E2F3" "FOXM1" "PTTG1" ...
可以看到计算完后有了regulonActivity
The maximum distance of each running sum from the x-axis represents the enrichment score. GSEA-2T produces two per-phenotype enrichment scores (ES), whose difference (dES = ESpos - ESneg) represents the regulon activity.
differential就是上面说的dES。提取数据画热图即可
regulonActivity <- tni.get(rtni, what = "regulonActivity")
dat <- regulonActivity$differential
dat[1:4,1:4]
## E2F2 E2F3 FOXM1 PTTG1
## TCGA-W5-AA31-11A-11R-A41I-07 -1.5733 -1.6902 -1.6674 -1.6580
## TCGA-W5-AA2I-11A-11R-A41I-07 -1.5827 -1.7051 -1.6789 -1.6593
## TCGA-W5-AA2T-01A-12R-A41I-07 1.4712 0.0236 1.4665 1.5103
## TCGA-ZH-A8Y5-01A-11R-A41I-07 1.1513 0.2754 1.3606 0.7002
draw_heatmap(t(dat),Group,show_rownames = T)
draw_boxplot(t(dat),Group)
如果因为代码看不懂,而跟不上正文的节奏,可以来找我,相当于来一个新手保护期。我的课程都是循环开课。下一期的时间,点进去咨询微信咯 生信分析直播课程 生信新手保护学习小组
预测转录因子活性
https://mp.weixin.qq.com/s/NGBTa-p_pdjdBtwotWdwuQ