Closed ixxmu closed 1 year ago
ClusterGVis 输入你的表达矩阵,便可以为你聚类,富集分析,最后绘图。对于 bulk-RNA 的表达矩阵比较友好,但是对于单细胞数据来说,还得自己费点功夫去整理表达矩阵以及相关数据。对于大多数单细胞的分析人员就已经望而却步了。不如用 seurat 自带的 doHeatmap 画算了。为了解决这个问题,于是增加了一些修改,使其更方便的对接单细胞的数据。此外还增加了可以改变 cluster 顺序的功能。
增加了 prepareDataFromscRNA 函数来整理准备单细胞的数据,作为 ClusterGVis 下游的输入。
github 链接:
https://github.com/junjunlab/ClusterGVis
测试数据链接:
链接:https://pan.baidu.com/s/1i6aOIGsDKxkmYGcPls9YVg提取码:7htg
重新安装获取新功能:
# Note: please update your ComplexHeatmap to the latest version!
# install.packages("devtools")
devtools::install_github("junjunlab/ClusterGVis")
先给单细胞注释一下 (上游参考自己的或者网上教程,不再赘述) ,然后找个 marker 基因:
library(ClusterGVis)
library(org.Hs.eg.db)
# add cell type
new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono",
"NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
# find markers for every cluster compared to all remaining cells
# report only the positive ones
pbmc.markers.all <- Seurat::FindAllMarkers(pbmc,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25)
# get top 10 genes
pbmc.markers <- pbmc.markers.all %>%
dplyr::group_by(cluster) %>%
dplyr::top_n(n = 20, wt = avg_log2FC)
# check
head(pbmc.markers)
# # A tibble: 6 × 7
# # Groups: cluster [1]
# p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
# <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
# 1 1.81e-144 0.735 1 0.991 2.48e-140 Naive CD4 T RPS12
# 2 5.51e-123 0.744 0.997 0.975 7.55e-119 Naive CD4 T RPS25
# 3 3.66e-117 0.744 0.994 0.971 5.02e-113 Naive CD4 T RPL9
# 4 2.75e-116 0.740 0.996 0.964 3.77e-112 Naive CD4 T RPL31
# 5 3.86e-110 0.779 0.99 0.977 5.29e-106 Naive CD4 T RPS3A
# 6 1.74e-109 1.07 0.897 0.593 2.39e-105 Naive CD4 T LDHB
准备数据:
prepareDataFromscRNA 需要数据你的 seurat 对象及差异结果即可。showAverage 参数设为 TRUE 则表示对 基因细胞亚群一样的细胞取均值进行绘图,否则就是所有细胞进行绘图。默认使用 seurat 对象的 RNA assay 的 data 数据。
# prepare data from seurat object
st.data <- prepareDataFromscRNA(object = pbmc,
diffData = pbmc.markers,
showAverage = TRUE)
# check
str(st.data)
# List of 5
# $ wide.res:'data.frame': 147 obs. of 11 variables:
# ..$ gene : chr [1:147] "RPS12" "RPS25" "RPL9" "RPL31" ...
# ..$ Naive CD4 T : num [1:147] 1.64 1.57 1.48 1.63 1.57 ...
# ..$ CD14+ Mono : num [1:147] -0.458 -0.536 -0.746 -0.712 -0.872 ...
# ..$ Memory CD4 T: num [1:147] 1.128 1.135 0.822 0.983 0.759 ...
# ..$ B : num [1:147] 0.517 0.671 0.979 0.671 0.936 ...
# ..$ CD8 T : num [1:147] 0.426 0.417 0.557 0.445 0.397 ...
# ..$ FCGR3A+ Mono: num [1:147] -0.592 -0.627 -0.679 -0.836 -0.776 ...
# ..$ NK : num [1:147] -0.897 -0.768 -0.478 -0.403 -0.251 ...
# ..$ DC : num [1:147] -0.3 -0.371 -0.359 -0.265 -0.2 ...
# ..$ Platelet : num [1:147] -1.46 -1.49 -1.58 -1.51 -1.56 ...
# ..$ cluster : chr [1:147] "1" "1" "1" "1" ...
# $ long.res:'data.frame': 1323 obs. of 5 variables:
# ..$ cluster : chr [1:1323] "1" "1" "1" "1" ...
# ..$ gene : chr [1:1323] "RPS12" "RPS25" "RPL9" "RPL31" ...
# ..$ cell_type : Factor w/ 9 levels "Naive CD4 T",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ norm_value : num [1:1323] 1.64 1.57 1.48 1.63 1.57 ...
# ..$ cluster_name: Factor w/ 9 levels "cluster 1 (20)",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ type : chr "scRNAdata"
# $ geneMode: chr "average"
# $ geneType: chr "unique|_"
可以看到整理成 clusterData 函数输出的结果一样。包括一个长数据和一个宽数据的数据框。你也可以拿这个数据去自己绘图。
顺便对 marker 基因做个富集分析,我们 p 值取大一点:
# enrich for clusters
enrich <- enrichCluster(object = st.data,
OrgDb = org.Hs.eg.db,
type = "BP",
organism = "hsa",
pvalueCutoff = 0.5,
topn = 5,
seed = 5201314)
# check
head(enrich)
# group Description pvalue ratio
# GO:0002181 C1 cytoplasmic translation 8.388814e-11 38.88889
# GO:0030217 C1 T cell differentiation 2.220845e-07 33.33333
# GO:0045061 C1 thymic T cell selection 1.103890e-06 16.66667
# GO:0033077 C1 T cell differentiation in thymus 1.163815e-06 22.22222
# GO:0030098 C1 lymphocyte differentiation 1.694887e-06 33.33333
# GO:0032103 C2 positive regulation of response to external stimulus 3.957596e-10 45.00000
画个折线图:
# add gene name
markGenes = unique(pbmc.markers$gene)[sample(1:length(unique(pbmc.markers$gene)),40,
replace = F)]
# line plot
visCluster(object = st.data,
plot.type = "line")
热图:
cluster.order 参数调整聚类顺序。
# heatmap plot
pdf('sc1.pdf',height = 10,width = 6,onefile = F)
visCluster(object = st.data,
plot.type = "heatmap",
column_names_rot = 45,
markGenes = markGenes,
cluster.order = c(1:9))
dev.off()
添加富集注释:
# add bar plot
pdf('sc2.pdf',height = 10,width = 14,onefile = F)
visCluster(object = st.data,
plot.type = "both",
column_names_rot = 45,
show_row_dend = F,
markGenes = markGenes,
markGenes.side = "left",
annoTerm.data = enrich,
line.side = "left",
cluster.order = c(1:9),
go.col = rep(jjAnno::useMyCol("stallion",n = 9),each = 5),
add.bar = T)
dev.off()
你会发现我们每个亚群取了 20,marker 基因绘图,但是从图里看到有些亚群并没有 20 个基因,原因是差异分析时,有多个一样的基因在多个亚群同时是表达差异的。默认会对其进行去重。可能富集的时候会少基因,当然你也可以使用 keep.uniqGene=FALSE 来保留重复基因。
# retain duplicate diff gene in multiple clusters
st.data <- prepareDataFromscRNA(object = pbmc,
diffData = pbmc.markers,
showAverage = TRUE,
keep.uniqGene = FALSE,
sep = "_")
# check
df <- st.data$wide.res
比如 CD3D 这个基因有三个重复,表示在三个亚群都有显著性差异。
# line plot
visCluster(object = st.data,
plot.type = "line")
我们再来看折线图数量就都是 20 了。
这些基因在热图上也都会展示:
# heatmap plot
pdf('sc3.pdf',height = 10,width = 6,onefile = F)
visCluster(object = st.data,
plot.type = "heatmap",
column_names_rot = 45,
markGenes = c("CD3D","CD3D_1","CD3D_2"),
cluster.order = c(1:9))
dev.off()
当然你也可以展示所有细胞基因的表达热图。可能绘图需要一点时间。
# no average cells
pbmc.markers1 <- pbmc.markers.all %>%
dplyr::group_by(cluster) %>%
dplyr::top_n(n = 6, wt = avg_log2FC)
# retain duplicate diff gene in multiple clusters
st.data <- prepareDataFromscRNA(object = pbmc,
diffData = pbmc.markers1,
showAverage = FALSE)
# check
str(st.data)
# List of 4
# $ wide.res:'data.frame': 51 obs. of 2640 variables:
# ..$ gene : chr [1:51] "LEF1" "PRKCQ-AS1" "CD3D" "LDHB" ...
# ..$ AAACGCTGTAGCCA-1|Naive CD4 T : num [1:51] -0.434 -0.439 0.718 -1.314 -0.479 ...
# ..$ AAACTTGATCCAGA-1|Naive CD4 T : num [1:51] -0.434 -0.439 0.763 1.225 2.914 ...
# ..$ AAAGAGACGAGATA-1|Naive CD4 T : num [1:51] -0.434 1.927 1.67 1.044 -0.479 ...
# .. [list output truncated]
# $ long.res:'data.frame': 134538 obs. of 5 variables:
# ..$ cluster : chr [1:134538] "1" "1" "1" "1" ...
# ..$ gene : chr [1:134538] "LEF1" "PRKCQ-AS1" "CD3D" "LDHB" ...
# ..$ cell_type : chr [1:134538] "Naive CD4 T" "Naive CD4 T" "Naive CD4 T" "Naive CD4 T" ...
# ..$ norm_value : num [1:134538] -0.434 -0.439 0.718 -1.314 -0.479 ...
# ..$ cluster_name: Factor w/ 9 levels "cluster 1 (6)",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ type : chr "scRNAdata"
# $ geneMode: chr "all"
# $ geneType: chr "unique|_"
热图:
记得设置 show_column_names = F。
# heatmap plot
pdf('sc4.pdf',height = 10,width = 8,onefile = F)
visCluster(object = st.data,
plot.type = "heatmap",
markGenes = unique(pbmc.markers1$gene),
column_title_rot = 45,
cluster.order = 1:9,
show_column_names = F)
dev.off()
改变亚群顺序和修改注释颜色:
# change celltype order and color
pdf('sc5.pdf',height = 10,width = 8,onefile = F)
visCluster(object = st.data,
plot.type = "heatmap",
markGenes = unique(pbmc.markers1$gene),
column_title_rot = 45,
cluster.order = 1:9,
show_column_names = F,
sample.cell.order = rev(new.cluster.ids),
sample.col = jjAnno::useMyCol("paired",n = 9))
dev.off()
添加富集注释:
# add GO annotation
pdf('sc6.pdf',height = 12,width = 16,onefile = F)
visCluster(object = st.data,
plot.type = "both",
column_title_rot = 45,
markGenes = unique(pbmc.markers1$gene),
markGenes.side = "left",
annoTerm.data = enrich,
show_column_names = F,
line.side = "left",
cluster.order = c(1:9),
add.bar = T)
dev.off()
如果有任何疑问或者建议,欢迎在 github 留言!
欢迎加入生信交流群。加我微信我也拉你进 微信群聊 老俊俊生信交流群 (微信交流群需收取 20 元入群费用,一旦交费,拒不退还!(防止骗子和便于管理)) 。QQ 群可免费加入, 记得进群按格式修改备注哦。
老俊俊微信:
知识星球:
About BioSeqUtils 基因特征序列提取 R 包 BioSeqUtils 关于 chatGPT 无法连接的问题 问问 chatGPT 来提取最长转录本及基因序列 GseaVis 一键对接 GSEA 软件结果并可视化 同学你又在画 GSEA? R 语言中的 S3/S4 类 transPlotR 展示代表性转录本 云雨一下? bugs 报告和修复
https://mp.weixin.qq.com/s/FN1hlOWRWJ6XsmB_LxZpfA