Closed ixxmu closed 3 years ago
这段话摘自gsva包的官方文档
We calculate now GSV A enrichment scores for these gene sets using first the microarray data and then the RNA-seq integer count data. Note that the only requirement to do the latter is to set the argument kcdf=“Poisson” which is “Gaussian” by default. Note, however, that if our RNA-seq derived expression levels would be continous, such as log-CPMs, log-RPKMs or log-TPMs, the the default value of the kcdf argument should remain unchanged.
如果是芯片数据,log2即可用;如果是转录组数据,则count,RPKM,FPKM,TPM皆可用,使用count时须指明参数kcdf=“Poisson”。
我们使用芯片数据GSE42872的表达矩阵咯。
if(!require(tinyarray))devtools::install_github("xjsun1221/tinyarray")
library(tinyarray)
es = geo_download("GSE42872")
split_list(es)
#exp = log2(exp+1)
ids = idmap(gpl)
ids = ids[!duplicated(ids$symbol),]
exp = trans_array(exp,ids)
exp[1:4,1:4]
## GSM1052615 GSM1052616 GSM1052617 GSM1052618
## LINC01128 8.75126 8.61650 8.81149 8.32067
## SAMD11 8.39069 8.52617 8.43338 9.17284
## KLHL17 8.20228 8.30886 8.18518 8.13322
## PLEKHN1 8.41004 8.37679 8.27521 8.34524
library(stringr)
Group = ifelse(str_detect(pd$title,"Control"),"control","treat")
Group = factor(Group,levels = c("control","treat"))
table(Group)
## Group
## control treat
## 3 3
出自这篇文章的TableS6:https://www.sciencedirect.com/science/article/pii/S2211124716317090
里面有28种免疫细胞的基因集。
geneset = rio::import("mmc3.xlsx",skip = 2)
geneset = split(geneset$Metagene,geneset$`Cell type`)
lapply(geneset[1:3], head)
## $`Activated B cell`
## [1] "ADAM28" "CD180" "CD79B" "BLK" "CD19" "MS4A1"
##
## $`Activated CD4 T cell`
## [1] "AIM2" "BIRC3" "BRIP1" "CCL20" "CCL4" "CCL5"
##
## $`Activated CD8 T cell`
## [1] "ADRM1" "AHSA1" "C1GALT1C1" "CCT6B" "CD37" "CD3D"
library(GSVA)
re <- gsva(exp, geneset, method="ssgsea",
mx.diff=FALSE, verbose=FALSE
)
library(pheatmap)
re2 <- as.data.frame(re)
an = data.frame(group = Group,
row.names = colnames(exp))
pheatmap(re2,scale = "row",
show_colnames = F,
annotation_col = an,
color = colorRampPalette(c("navy", "white", "firebrick3"))(100))
我的个人文章从简书全部迁移到了语雀,搜小洁忘了怎么分身即可找到,点击阅读原文可以跳转。如果因为不会R语言导致代码完全看不懂,可以看看下面的几个课程⏬都是滚动开班的。
插个小广告! 生信零基础入门学习小组 生信入门班(四周线上直播课,长期开班) 数据挖掘班(医生/医学生首选,三周线上直播课,长期开班) 21年5月起,数据挖掘线下重启(广州/成都/长沙),欢迎咨询 一起来学单细胞吗? 生信星球答疑公告
https://mp.weixin.qq.com/s/bBfzDHcebzNU9lENBSA2yA