ixxmu / mp_duty

Scrapes web articles and saves them as GitHub issues
https://archives.duty-machine.now.sh/

Pure ggplot2 version | Comparing a gene's expression across 33 cancer types and GTEx controls #117


ixxmu commented 4 years ago

https://mp.weixin.qq.com/s/nkObm5gM7GC-9H1rcGi71A

github-actions[bot] commented 4 years ago

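Pure ggplot2 version | Comparing a gene's expression across 33 cancer types and GTEx controls, by 小丫画图

A reader asks: Thanks to 小丫 for opening the Xena magic box for us. One question: can the figure from FigureYa55pancancer_violin be drawn as a box plot?

小丫 answers: See how FigureYa12box draws it; working through it can take your R skills up a level.


Q: I'm lazy. Can I use its output file easy_input.csv as the input and draw the box plot directly? I'd like a seamless hand-off.

A: Yes. Thanks to Chris Lou for helping with the update. Last time we shared a ggpubr version; this time we share a pure ggplot2 solution.


We recommend downloading the archive (code, input, and output files) from https://www.yuque.com/figureya/figureyaplus/figureya55p; the code is easier to run from there.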




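Drawing a box plot with * marks and scatter points in ggplot2

[Pros] Freedom: you can draw it however you like

[Cons] Looks complicated


Input file

easy_input.csv, one sample per row. The first column is the tissue, the second is the group (tumor or control), and the third is the gene expression value.

For each tissue, a p value for tumor vs. control is computed and marked on the plot with *.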
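
As a minimal sketch of that per-tissue test and the * annotation, the snippet below runs on a made-up data frame in the same three-column layout. The tissue names T1 and T2 and all expression values are hypothetical, chosen only so that the result is easy to check by hand; the real data comes from easy_input.csv.

```r
# Toy data in the same layout as easy_input.csv (all values are made up)
toy <- data.frame(
  Tissues = rep(c("T1", "T2"), each = 10),
  Groups  = rep(rep(c("tumor", "normal"), each = 5), times = 2),
  Gene    = c(5, 6, 7, 8, 9,  1, 2, 3, 4, 4.5,   # T1: groups fully separated
              1, 4, 5, 8, 9,  2, 3, 6, 7, 10)    # T2: groups interleaved
)

# One Wilcoxon rank-sum test per tissue, tumor vs. normal
toy_tissues <- unique(toy$Tissues)
toy_p <- sapply(toy_tissues, function(x) {
  wilcox.test(Gene ~ Groups, data = subset(toy, Tissues == x))$p.value
})

# Bin the p values into significance codes.
# cut() intervals are right-closed: (0, 1e-4], (1e-4, 1e-3], ..., (0.05, 1]
toy_sig <- cut(toy_p, breaks = c(0, 0.0001, 0.001, 0.01, 0.05, 1),
               labels = c("****", "***", "**", "*", "ns"))
print(data.frame(gene = toy_tissues, pvalue = toy_p, sigcode = toy_sig))
```

With five fully separated samples per group, the exact two-sided p value for T1 is 2/252, which falls in the ** bin; T2 is not significant.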

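How do I get this input file?

FigureYa55pancancer_violin walks you through everything from downloading the data from Xena to extracting a gene's expression in 33 cancer types. TCGA has few normal samples, so it uses GTEx normal tissue as the control and writes the result to easy_input.csv.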

# as.is = FALSE reads the text columns (tissue, type2) as factors
tcga_gtex <- read.csv("easy_input.csv", row.names = 1, header = TRUE, as.is = FALSE)
head(tcga_gtex)
##   tissue type2    tpm
## 1    ACC tumor 4.1327
## 2    ACC tumor 4.9519
## 3    ACC tumor 3.0619
## 4    ACC tumor 2.7051
## 5    ACC tumor 1.9749
## 6    ACC tumor 3.1079

Start plotting

library(ggplot2)

ylabname <- paste("TP53", "expression")
colnames(tcga_gtex) <- c("Tissues", "Groups", "Gene")

# Set aside the tissues that have no normal samples
tcga_gtex_MESO <- tcga_gtex[tcga_gtex$Tissues=="MESO",]
tcga_gtex_UVM <- tcga_gtex[tcga_gtex$Tissues=="UVM",]
tcga_gtex_withNormal <- tcga_gtex[tcga_gtex$Tissues != "MESO" & tcga_gtex$Tissues != "UVM",]

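# Compute one p value per tissue (iterate over unique tissues, not over every sample row)
tissues <- unique(as.character(tcga_gtex_withNormal$Tissues))
pvalues <- sapply(tissues, function(x) {
  # Two groups: wilcox.test or t.test; more than two groups: kruskal.test or aov (one-way ANOVA)
  res <- wilcox.test(Gene ~ Groups, data = subset(tcga_gtex_withNormal, Tissues == x))
  res$p.value
})
pv <- data.frame(gene = tissues, pvalue = pvalues)
# Map p values to significance codes; include.lowest = TRUE keeps p == 0 from becoming NA
pv$sigcode <- cut(pv$pvalue, c(0, 0.0001, 0.001, 0.01, 0.05, 1),
                  labels = c('****', '***', '**', '*', 'ns'), include.lowest = TRUE)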
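
The comment in the p value step mentions kruskal.test for more than two groups. A minimal sketch of that variant on made-up data follows; the group labels (normal, primary, metastatic) and the values are hypothetical and do not come from easy_input.csv, which has only two groups.

```r
# Toy data: one tissue with three groups (all values are made up)
toy3 <- data.frame(
  Groups = rep(c("normal", "primary", "metastatic"), each = 3),
  Gene   = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Kruskal-Wallis rank-sum test across the three groups
res3 <- kruskal.test(Gene ~ Groups, data = toy3)
p3 <- res3$p.value

# Same binning into significance codes as for the two-group case
sig3 <- cut(p3, breaks = c(0, 0.0001, 0.001, 0.01, 0.05, 1),
            labels = c("****", "***", "**", "*", "ns"))
print(data.frame(pvalue = p3, sigcode = sig3))
```

Only the test function changes; the binning into * codes stays the same.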

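# Draw the box plot
ymax <- max(tcga_gtex_withNormal$Gene) * 1.1 # height at which the significance codes are drawn
p.box <- ggplot(tcga_gtex_withNormal, aes(x = Tissues, y = Gene, color = Groups, fill = Groups)) +
  geom_boxplot(alpha = .5) + # semi-transparent
  theme_classic() + # or theme_bw()
  scale_fill_brewer(palette = "Set1") + # fill color by group
  scale_color_brewer(palette = "Set1") + # outline color by group
  # the cancer names are crowded, so rotate them 45 degrees
  theme(axis.text.x = element_text(colour = "black", size = 11,
                                   angle = 45, hjust = .5, vjust = .5)) +
  # refer to the columns of pv by name instead of using pv$sigcode inside aes()
  geom_text(aes(x = gene, y = ymax, label = sigcode),
            data = pv,
            inherit.aes = FALSE) +
  ylab(ylabname)
p.box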

# Draw the box plot with scatter points
p.box.dot <- p.box + geom_point(shape = 21, size = .5, # point shape and size
                                position = position_jitterdodge(), # spread the points out
                                alpha = .5) # semi-transparent
p.box.dot

# Add the tissues that have no normal samples
p.box.dot +
  # MESO
  geom_boxplot(alpha = .5, data = tcga_gtex_MESO,
               mapping = aes(x = Tissues, y = Gene, fill = Groups)) +
  geom_point(data = tcga_gtex_MESO,
             mapping = aes(x = Tissues, y = Gene, fill = Groups),
             shape = 21, size = .5,
             position = position_jitterdodge(),
             alpha = .5) +
  # UVM
  geom_boxplot(alpha = .5, data = tcga_gtex_UVM,
               mapping = aes(x = Tissues, y = Gene, fill = Groups)) +
  geom_point(data = tcga_gtex_UVM,
             mapping = aes(x = Tissues, y = Gene, fill = Groups),
             shape = 21, size = .5,
             position = position_jitterdodge(),
             alpha = .5) +
  theme_classic() +
  theme(axis.text.x = element_text(colour = "black", size = 11,
                                   angle = 45, hjust = .5, vjust = .5))
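
The code above hard-codes MESO and UVM as the tissues without normal samples. As a hedged sketch, such tissues could also be detected from the data by counting samples per tissue and group. The toy data frame below is made up (T1, T2, and T3 are hypothetical tissue names), and this is an illustration under that assumption, not part of the original FigureYa code.

```r
# Toy data: T3 has tumor samples only (all values are made up)
toy <- data.frame(
  Tissues = c("T1", "T1", "T2", "T2", "T3", "T3"),
  Groups  = c("tumor", "normal", "tumor", "normal", "tumor", "tumor"),
  Gene    = c(4.1, 3.2, 5.0, 4.4, 2.7, 3.1)
)

# Cross-tabulate the number of samples per tissue and group
counts <- table(toy$Tissues, toy$Groups)

# A tissue cannot be tested if any group has zero samples
untestable <- rownames(counts)[apply(counts == 0, 1, any)]

# Split the data the same way the MESO/UVM filter does
toy_withNormal <- toy[!(toy$Tissues %in% untestable), ]
toy_tumorOnly  <- toy[toy$Tissues %in% untestable, ]
print(untestable)
```

With real data, the same table() call on the Tissues and Groups columns would show which tissues lack a control group before any test is run.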


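Extension: drawing expression onto an anatomy diagram

The same easy_input.csv file also plugs seamlessly into FigureYa78gganatogram, which draws the gene's expression in each tissue and organ onto a human anatomy diagram.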

