ixxmu / mp_duty

抓取网络文章到github issues保存
https://archives.duty-machine.now.sh/
122 stars 30 forks source link

没想到修个火山图这么麻烦 #3661

Closed ixxmu closed 1 year ago

ixxmu commented 1 year ago

https://mp.weixin.qq.com/s/NaOL9hvshwufVLkiy1aZAw

ixxmu commented 1 year ago

没想到修个火山图这么麻烦 by 生信菜鸟团

看到了曾老师于20年布置的学徒作业~


 

就是需要复现上图~ 草草一看应该是提取原始数据,取差异基因然后绘图吧。

这个文章的补充材料里面给的 是counts和P值...

先做了除法,就是两个实验组标准化counts分别除以除以对照组标准化counts

rm(list=ls())
options(warn = -1)
##结果发现人家可能只是用了除法
library(dplyr)
author.matrix=read.delim(file="nanostring_author_gene.txt", header = T,sep = "\t",quote = "\"",
                         row.names = 1)
colnames(author.matrix)
colnames(author.matrix)=c("Cont","STING.activated","STING.deficient","Cont.vs.activated_Pvalue","Cont.vs.deficient_Pvalue")
##添加除法列,判断谁除谁
author.matrix["Cdh5",] #"Cdh5""STING.activated"高表达,"STING.deficient"低表达
# Cont STING.activated STING.deficient Cont.vs.activated_Pvalue Cont.vs.deficient_Pvalue
# Cdh5 50.954         143.587          34.814                    0.045                    0.014
# Cont作为分母
author.matrix$de.vs.Cont=author.matrix$STING.deficient / author.matrix$Cont
author.matrix$acti.vs.Cont=author.matrix$STING.activated / author.matrix$Cont
author.matrix["Cdh5",]#检验一下
# Cont STING.activated STING.deficient Cont.vs.activated_Pvalue Cont.vs.deficient_Pvalue
# Cdh5 50.954         143.587          34.814                    0.045                    0.014
# de.vs.Cont acti.vs.Cont
# Cdh5  0.6832437     2.817973
 

9.2 整理矩阵

再把两部分数据结合,按照数据来源进行分组

##整理矩阵
draw.data=data.frame(gene=c(rownames(author.matrix),rownames(author.matrix)),
                     fold=c(author.matrix$acti.vs.Cont,author.matrix$de.vs.Cont),
                     p_value=c(author.matrix$Cont.vs.activated_Pvalue,
                               author.matrix$Cont.vs.deficient_Pvalue),
                     group=factor(c(rep(1,750),rep(2,750))))

draw.data[750:751,] # 核对
# gene      fold p_value group
# 750 Map2k2 1.0014670    0.98     1
# 751    Itk 0.6636043    0.19     2
 author.matrix[c(750,1),]
# Cont STING.activated STING.deficient Cont.vs.activated_Pvalue
# Map2k2 481.946         482.653         462.779                     0.98
# Itk      6.531          23.041           4.334                     0.00
# Cont.vs.deficient_Pvalue de.vs.Cont acti.vs.Cont
# Map2k2                    0.395  0.9602300     1.001467
# Itk                       0.190  0.6636043     3.527944
# draw.data$Pvalue_log10=log10(draw.data$p_value)
# draw.data=draw.data[-(1:10),]
# draw.data=draw.data[draw.data$Pvalue_log10 != "-Inf",]
# #save(draw.data,file = "drawdata.Rdata")

最后得到用于绘图的矩阵如下~找到图示基因与计算结果核对,发现能对的上。

10 绘图

10.1 先绘一张基础图

emmmm,外部链接难以复制到公众号内部,请有需要的各位自行检索

问了师弟在网上找了和这个图长的差不多的图像, 

《MAplot转录组差异基因表达展示》

 MAplot转录组差异基因表达展示_maplot r语言_TS的美梦的博客-CSDN博客自己也顺着这线索另外找了教程 

RNA-Seq分析中(二)——用R画带基因名标签的MA图》

RNA-Seq分析中(二)——用R画带基因名标签的MA图 - 知乎 (zhihu.com)另外

上面算是模板吧。

其实引起我最大注意的还是,看到横坐标是科学计数法显示的 

于是就搜索的教程 

《如何使用ggplot更改轴上数字的格式?》

如何使用ggplot更改轴上数字的格式?- 问答 - 腾讯云开发者社区-腾讯云 (tencent.com)发现作者给了个函数

#####科学记数法显示x轴坐标
 fancy_scientific <- function(l) {
   # turn in to character string in scientific notation
   l <- format(l, scientific = TRUE)
   l <- gsub("0e\\+00","0",l)
   # quote the part before the exponent to keep all the digits
   l <- gsub("^(.*)e""'\\1'e", l)
   # remove "+" after exponent, if exists. E.g.: (3x10^+2 -> 3x10^2) 
   l <- gsub("e\\+","e",l)
   # turn the 'e+' into plotmath format
   l <- gsub("e""%*%10^", l)
   # convert 1x10^ or 1.000x10^ -> 10^ 
   l <- gsub("\\'1[\\.0]*\\'\\%\\*\\%""", l)
   # return this as an expression
   parse(text=l)
 }

结合一下, 另外使用scale_x_reverse函数使x轴从大到小,并且使用breaks和labels指定要显示的数值和对应的标签

## 绘图
 library(ggpubr)
 library(ggthemes)
 ggscatter(draw.data, 
           x = "p_value", y = "fold",
           color = "group") + 
   theme_base()+
   scale_x_reverse(breaks = c(1,0.1,0.01,0.001,0.0001,0.00001,0.000001,0.0000001),
                   labels=fancy_scientific)

10.2 加y轴拉伸

然后我发现目标图像的y轴比例是调整过的 于是坐标轴拉伸教程:《R画图y轴范围太大时,如何局部压缩坐标轴?》R画图y轴范围太大时,如何局部压缩坐标轴?- 知乎 (zhihu.com)

文章里作者给了个函数

 ###拉伸缩小坐标的函数
 library(scales)
 squash_axis <- function(from, to, factor) { 
   # Args:
   #   from: left end of the axis
   #   to: right end of the axis
   #   factor: the compression factor of the range [from, to]
   
   trans <- function(x) {    
     # get indices for the relevant regions
     isq <- x > from & x < to
     ito <- x >= to
     
     # apply transformation
     x[isq] <- from + (x[isq] - from)/factor
     x[ito] <- from + (to - from)/factor + (x[ito] - to)
     
     return(x)
   }
   
   inv <- function(x) {
     # get indices for the relevant regions
     isq <- x > from & x < from + (to - from)/factor
     ito <- x >= from + (to - from)/factor
     
     # apply transformation
     x[isq] <- from + (x[isq] - from) * factor
     x[ito] <- to + (x[ito] - (from + (to - from)/factor))
     
     return(x)
   }
   
   # return the transformation
   return(trans_new("squash_axis", trans, inv))
 }

于是我对y轴0.05-1.5范围的坐标轴拉伸了20倍,并且顺带指定了scale_y_continuous函数中的breaks和labels参数 绘图代码如下

 ggscatter(draw.data, 
           x = "p_value", y = "fold",
           color = "group") + 
   theme_base()+
   coord_trans(y = squash_axis(0.05, 1.5, 0.05))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50))+
   scale_x_reverse(breaks = c(1,0.1,0.01,0.001,0.0001,0.00001,0.000001,0.0000001),
                   labels=fancy_scientific)

图像如下~,能看到开始以1为轴进行分布了。很明显这张图的x轴和作者的图还有较大的差距因为人家的x轴长这样,是均匀分布的 我想到一种可能,就是作者自己先计算了log10然后再和横坐标的标记进行对应。虽然显示的是10^-1,但实际上对应的数值是经过计算之后的0.1。

10.3 计算log10,加一列

x轴可能需要的是经过log后的数据,只不过标记的数值用的是科学计数法,相当于加了个标签 那就,调整X轴数据,取个log10,然后标签加成科学计数法 给表达矩阵加一列

 draw.data$Pvalue_log10=log10(draw.data$p_value)
draw.data=draw.data[-(1:10),]
draw.data=draw.data[draw.data$Pvalue_log10 != "-Inf",]
#save(draw.data,file = "drawdata.Rdata")

用于绘图的表达矩阵长这样~看见表达矩阵里面有Inf值,测试了一下虽然加着也能画,但强迫症到最后的最后还是删掉了...

10.4 更换x轴数值,调整x、y轴显示范围

coord_trans函数的ylim、xlim参数

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group") + 
   theme_base()+
   #coord_cartesian(ylim = c(0.05, 50))+
   coord_trans(y = squash_axis(0.05, 1.4, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-5))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50))+
   scale_x_reverse(breaks = c(0,-1,-2,-3,-4,-5,-6),
                   labels=c(expression(10^-0),expression(10^-1),
                            expression(10^-2),expression(10^-3),
                            expression(10^-4),expression(10^-5),
                            expression(10^-5)))

这下X轴标签对了 看到整体上面有一团点 我觉得可能是因为excle的表格数值精度有限,所以计算出来的P值的log10值对应的也是非常密集。从10.3部分可以看到有一堆-0.3000000,所以我觉得是正常的。

10.5 调整Y轴,更换图像主题

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group") + 
   geom_point(alpha=0.1, size=0.5, shape=2,
              aes(color=group)) +
   #scale_color_manual(values=c("blue","red"))+
   theme_classic()+
   #coord_cartesian(ylim = c(0.05, 50))+
   coord_trans(y = squash_axis(0.05, 2.5, 0.008),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-5))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50))+
   scale_x_reverse(breaks = c(0,-1,-2,-3,-4,-5,-6),
                   labels=c(expression(10^-0),expression(10^-1),
                            expression(10^-2),expression(10^-3),
                            expression(10^-4),expression(10^-5),
                            expression(10^-5)))

差不多,加个注释线

10.6 添加注释线

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group") + 
   geom_point(alpha=0.1, size=0.5, shape=2,
              aes(color=group)) +
   #scale_color_manual(values=c("blue","red"))+
   theme_classic()+
   #coord_cartesian(ylim = c(0.05, 50))+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-5))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50))+
   scale_x_reverse(breaks = c(0,-1,-2,-3,-4,-5,-6),
                   labels=c(expression(10^-0),expression(10^-1),
                            expression(10^-2),expression(10^-3),
                            expression(10^-4),expression(10^-5),
                            expression(10^-5)))+
   geom_vline(xintercept=-1.5,lty=4,col="grey",linewidth=0.8)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)  ##加横线

好的注释线也加好了 继续观察图像 发现作者x\y轴的是相交的,不像我这个还有空余

10.7 让x轴的最小值和y轴相交

教程:《ggplot2中我如何让y轴与x轴相交0?- 优文库》 http://www.uwenku.com/question/p-cmjyklfm-uk.html 设置expand=c(0,0) 调整x\y轴

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group") + 
   geom_point(alpha=0.1, size=0.5, shape=2,
              aes(color=group)) +
   #scale_color_manual(values=c("blue","red"))+
   theme_classic()+
   #coord_cartesian(ylim = c(0.05, 50))+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-5))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50),expand=c(0,0))+
   scale_x_reverse(breaks = c(0,-1,-2,-3,-4,-5,-6),
                   labels=c(expression(10^-0),expression(10^-1),
                            expression(10^-2),expression(10^-3),
                            expression(10^-4),expression(10^-5),
                            expression(10^-5)),expand=c(0,0))+
   geom_vline(xintercept=-1.5,lty=4,col="grey",linewidth=0.8)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)  ##加横线

到目前已经让x轴和y轴中间没有保留了~ 嗯~长的已经比较像了,稍微美化一下吧

10.8 修改边框、图注、颜色...美化

用到的参考资料 《玩转数据可视化之R语言ggplot2:(七)对图形添加注释和标签(包含标题、坐标轴、参考线和高亮等注释方法)》 https://blog.csdn.net/weixin_45052363/article/details/124153241#74__446 找到了一个设置主题的代码

   theme(axis.ticks.y = element_blank(),
         axis.text.y = element_text(size = 15, colour = "black" ),
         axis.text.x = element_text(size = 15, colour = "black" ),
         axis.ticks=element_blank(),
         axis.title.x=element_blank(),
         axis.title.y=element_blank())

直接加入

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group",ylab="Fold change(vs. PBS)",rug = F,
           shape = 20,size = 3) +
     scale_color_manual(values=c( "#ff4757","#546de5"))+
   theme(axis.ticks.y = element_blank(),
         axis.text.y = element_text(size = 15, colour = "black" ),
         axis.text.x = element_text(size = 15, colour = "black" ),
         axis.ticks=element_blank(),
         axis.title.x=element_blank(),
         axis.title.y=element_blank())+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-4.1))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50),expand=c(0,0))+
   scale_x_reverse(breaks = c(0,-1,-2,-3),
                   labels=c(expression(10^-0),expression(10^-1),                      expression(10^-2),expression(10^-3)),expand=c(0,0))+geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)

好的,是时候加个基因标注了

10.9 加基因标注-geom_text_repel报错处理

需要标注哪些基因我得告诉R 所以先构建了新的矩阵

 library(ggstatsplot)
 library(ggrepel)
 up.article=c( "Angpt1","Pdgfrb","Cdh5","Sell","Ccr7","Cd86")
 down.article=c( "Cdh5","Mcam","Angpt1","Sell","Vegfa")
 up.gene=draw.data[draw.data$fold >1,][draw.data[draw.data$fold >1,]$gene %in% up.article,]
 down.gene=draw.data[draw.data$fold < 1,][draw.data[draw.data$fold < 1,]$gene %in% down.article,]
 diff.gene=rbind(up.gene,down.gene)
 diff.gene=diff.gene[!is.na(diff.gene),]
 diff.gene=na.omit(diff.gene)
 rownames(diff.gene)=1:11

其实之前在火山图里面看到过标注基因的代码 于是上网找教程R语言实战:使用ggplot2包绘制个性化火山图 - 知乎 (zhihu.com)其实找了好几个教程,大佬们给的模板都大差不差,就不太多展示了。模板如下:

geom_text_repel(data = filter(data, abs(logFC) > 1 & -log10(adj.P.Val) > 38),
                  max.overlaps = getOption("ggrepel.max.overlaps", default = 20),
                  aes(label = gene, 
                      color = change),
                  size = 2)

用的是geom_text_repel函数 但...报错了...

Error in `geom_label_repel()`:
! Problem while converting geom to grob.
ℹ Error occurred in the 2nd layer.
Caused by error in `x[isq] <- from + (x[isq] - from) / factor`:
! NAs are not allowed in subscripted assignments
Run `rlang::last_trace()` to see where the error occurred.

网上找了一圈,没有进展 去看geom_text_repel说明书,结合报错里面也有NA看到了这个参数里面有NA 于是就试了试,把xlim和ylim设置了一下

ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group",ylab="Fold change(vs. PBS)",rug = F,
           shape = 20,size = 3) +
     #geom_point(alpha=0.4, size=3.5) +
     scale_color_manual(values=c( "#ff4757","#546de5"))+
   ##加横线
   geom_label_repel(data = diff.gene, aes(x = diff.gene$Pvalue_log10,
                                          y = diff.gene$fold,
                                          label = gene),
                    size = 3, box.padding = unit(0.4, "lines"),
                    point.padding = unit(0.8, "lines"),
                    segment.color = "black",
                    show.legend = FALSE,
                    na.rm = T,
                    ylim = c(0.05,50),xlim = c(3,4),
                    direction = "y") +
   #directlabels::geom_dl(aes(label = group), method = "smart.grid")+
   #scale_colour_discrete(guide="none")
   theme(axis.ticks.y = element_blank(),
         axis.text.y = element_text(size = 15, colour = "black" ),
         axis.text.x = element_text(size = 15, colour = "black" ),
         axis.ticks=element_blank(),
         axis.title.x=element_blank(),
         axis.title.y=element_blank())+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-4.1))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50),expand=c(0,0))+
   scale_x_reverse(breaks = c(0,-1,-2,-3),
                   labels=c(expression(10^-0),expression(10^-1),
                   expression(10^-2),expression(10^-3)),expand=c(0,0))+geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)

??x和y轴的名称不知道什么时候没有了,原来是theme这里覆盖了.. 修改代码

ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group",ylab="Fold change(vs. PBS)",rug = F,
           shape = 20,size = 3) +
     #geom_point(alpha=0.4, size=3.5) +
     scale_color_manual(values=c( "#ff4757","#546de5"))+
   ##加横线
   geom_label_repel(data = diff.gene, aes(x = diff.gene$Pvalue_log10,
                                          y = diff.gene$fold,
                                          label = gene),
                    size = 3, box.padding = unit(0.4, "lines"),
                    point.padding = unit(0.8, "lines"),
                    segment.color = "black",
                    show.legend = FALSE,
                    na.rm = T,
                    ylim = c(0.05,50),xlim = c(3,4),
                    direction = "y") +
   #directlabels::geom_dl(aes(label = group), method = "smart.grid")+
   #scale_colour_discrete(guide="none")
   theme(axis.ticks.y = element_blank(),
         axis.text.y = element_text(size = 15, colour = "black" ),
         axis.text.x = element_text(size = 15, colour = "black" ),
         axis.ticks=element_blank())+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-4.1))+
   scale_y_continuous(breaks = c(0.05,0.5,1,5,50),
                      labels = c(0.05,0.5,1,5,50),expand=c(0,0))+
   scale_x_reverse(breaks = c(0,-1,-2,-3),
                   labels=c(expression(10^-0),expression(10^-1),
                   expression(10^-2),expression(10^-3)),expand=c(0,0))+geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)

但我发现我的刻度线也不见了,就上面这个家伙... 继续搜索教程~

10.10 修改坐标轴刻度

让我康康这样的刻度要怎么才能让它再度出现, 上网搜,教程如下 https://blog.csdn.net/qq_42458954/article/details/112604443 《6.7 坐标轴:移除刻度标签、刻度线和主网格线(关于刻度的逻辑)》 https://zhuanlan.zhihu.com/p/111896783 找到个具体名称介绍的但搜到的资料显示,通过主题的修改,刻度标签和刻度线总是需要同时出现的.. 那么,作者是如何做到部分有刻度线但没有刻度标签的.. 修改的时候发现自己设置的主题是没有刻度线的,于是修改了主题 难道...等等我有一个大胆的猜测在设置scale_y_continuous,scale_x_reverse两个函数的时候label的可以直接为空?试一下

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group",ylab="Fold change(vs. PBS)",xlab = "P-value",rug = F,
           shape = 20,size = 3) +
     scale_color_manual(values=c( "#ff4757","#546de5"))+
   geom_label_repel(data = diff.gene, aes(x = diff.gene$Pvalue_log10,
                                          y = diff.gene$fold,
                                          label = gene),
                    size = 3, box.padding = unit(0.4, "lines"),
                    point.padding = unit(0.8, "lines"),
                    segment.color = "black",
                    show.legend = FALSE,
                    na.rm = T,
                    ylim = c(0.05,50),xlim = c(3,4),
                    direction = "y") +
   theme_classic()+
   theme(
         axis.text.y = element_text(size = 10, colour = "black" ),
         axis.text.x = element_text(size = 10, colour = "black" ))+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-4.1))+
   scale_y_continuous(breaks = c(0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,
                                 1,5,50),
                      labels = c(0.05,"","","","",0.5,"","",'','',
                                 1,5,50),expand=c(0,0))+
   scale_x_reverse(breaks = c(0,-0.1,-0.2,-0.3,-0.4,-0.5,-0.6,-0.7,-0.8,
                              -1,-2,-3),
                   labels=c(expression(10^-0),'','','','','','','','',
                            expression(10^-1),
                            expression(10^-2),expression(10^-3)),expand=c(0,0))+geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)

好耶,刻度线这不就加上了嘛 是时候删除右边的图例了..

10.11 删图例,加注释(分组\P值)

删除图例教程:

https://blog.csdn.net/LeaningR/article/details/114576555 legend.title=element_blank() 

但是....没有发现不同的分组可以用不同颜色的字来表示。

感觉分组是作者另外加的注释 

找到的教程:《ggplot中的注释图层annotate》

ggplot中的注释图层

annotate_zoujiahui_2018的博客-CSDN博客《R语言ggplot2包之注释》

R语言ggplot2包之注释_r语言 ggplot annotate parse_zx403413599的博客-CSDN博客

P值的注释是竖的,所以,搜到的教程~ 

修改注释字的角度 

《如何在ggplot2中旋转图中的注释文本(附实例)》 https://juejin.cn/post/7117025583959113764 

而且这个P一看就是斜体,所以还要知道怎么样设置字体 

设置字体 

《ggplot2|theme主题设置,详解绘图优化-“精雕细琢”》

ggplot2|theme主题设置,详解绘图优化-“精雕细琢” - 知乎 (zhihu.com)

分组的注释是两行的,我该如何表示换行?

于是搜索的教程:双""中间表示换行 《 R中转义符》 https://www.douban.com/note/512515819/?_i=9842178-fjctlC

综合以上的信息 我设置了三个annotate函数

ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group",ylab="Fold change(vs. PBS)",xlab = "P-value",rug = F,
           shape = 20,size = 3) +
     scale_color_manual(values=c( "#ff4757","#546de5"))+
   geom_label_repel(data = diff.gene, aes(x = diff.gene$Pvalue_log10,
                                          y = diff.gene$fold,
                                          label = gene),
                    size = 3, box.padding = unit(0.4, "lines"),
                    point.padding = unit(0.8, "lines"),
                    segment.color = "black",
                    show.legend = FALSE,
                    na.rm = T,
                    ylim = c(0.05,50),xlim = c(3,4),
                    direction = "y") +
   theme_classic()+
   theme(axis.text.y = element_text(size = 10, colour = "black" ),
         axis.text.x = element_text(size = 10, colour = "black" ),
         #legend.title=element_blank(),
         legend.position = "none")+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-6.1))+
   scale_y_continuous(breaks = c(0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,
                                 1,5,50),
                      labels = c(0.05,"","","","",0.5,"","",'','',
                                 1,5,50),expand=c(0,0))+
   scale_x_reverse(breaks = c(0,-0.1,-0.2,-0.3,-0.4,-0.5,-0.6,-0.7,-0.8,
                              -1,-2,-3,-4,-5,-6),
                   labels=c(expression(10^-0),'','','','','','','','',
                            expression(10^-1),
                            expression(10^-2),
                            expression(10^-3),
                            expression(10^-4),
                            expression(10^-5),
                            expression(10^-6)),
                   expand=c(0,0))+geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)+
   annotate('text',x=-5,y=40,label="STING \n activated",color="#ff4757")+
   annotate('text',x=-5,y=0.2,label="STING \n deficient",color="#546de5")+
annotate('text',x=-1,y=34,label="P=0.05",color="black",angle=90,fontface="italic")

得到的图如下:

这个时候跟大家再提一嘴原图,不然可能都忘了..很明显,原图是没有x轴的

10.12 去x轴,加线段

于是又开始搜教程,如何去掉x轴,然后怎么样才能给注释线加上标尺... 找到的教程5.4 添加注释:添加线段(segment) - 知乎 (zhihu.com)《ggplot2作图:隐去坐标轴标签(xlab、ylab)》ggplot2作图:隐去坐标轴标签(xlab、ylab) - 知乎 (zhihu.com)《# ggplot2 line types : How to change line types of a graph in R software?》线型 http://www.sthda.com/english/wiki/ggplot2-line-types-how-to-change-line-types-of-a-graph-in-r-software 综合以上教程,以下代码我一共做了2件事 1-我把theme函数里面所有关于x轴的参数全部设置为空 2-然后我把线段标注当成坐标刻度绘制在注释线上..

 ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "group",ylab="Fold change(vs. PBS)",xlab = "",rug = F,
           shape = 20,size = 3,show.legend.text = T,ggtheme = theme_void()) +
     scale_color_manual(values=c( "#ff4757","#546de5"))+
   ##加标签
   geom_label_repel(data = diff.gene, aes(x = diff.gene$Pvalue_log10,
                                          y = diff.gene$fold,
                                          label = gene),
                    size = 3, box.padding = unit(0.4, "lines"),
                    point.padding = unit(0.8, "lines"),
                    segment.color = "black",
                    show.legend = FALSE,
                    na.rm = T,
                    ylim = c(0.05,50),xlim = c(3,4),
                    direction = "y") +
   theme_classic()+
   theme(#axis.text.y = element_text(size = 10, colour = "black" ),
         #axis.text.x = element_text(size = 10, colour = "black" ),
         #legend.title=element_blank(),
         legend.position = "none",
         axis.ticks.x = element_blank(),
         axis.text.x=element_blank(),
         axis.line.x=element_blank()
         )+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-6))+
   scale_y_continuous(breaks = c(0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,
                                 1,5,50),
                      labels = c(0.05,"","","","",0.5,"","",'','',
                                 1,5,50),expand=c(0,0))+
   scale_x_reverse(#breaks = c(0,-0.1,-0.2,-0.3,-0.4,-0.5,-0.6,-0.7,-0.8,-1,-2,-3,-4,-5,-6),
                   # labels=c(expression(10^-0),'','','','','','','','',
                   #          expression(10^-1),
                   #          expression(10^-2),
                   #          expression(10^-3),
                   #          expression(10^-4),
                   #          expression(10^-5),
                   #          expression(10^-6)),
                   expand=c(0,0),
                   breaks = NULL)+geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)+
   annotate('text',x=-4.3,y=40,label="STING \n activated",color="#ff4757")+
   annotate('text',x=-4.3,y=0.2,label="STING \n deficient",color="#546de5")+
   annotate('text',x=-1.2,y=34,label="P=0.05",color="black",angle=90,fontface="italic")+
   annotate('segment',x = -0.1, xend = -0.1, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.15, xend = -0.15, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.22, xend = -0.22, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.31, xend = -0.31, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.25, xend = -0.25, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.28, xend = -0.28, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-0.2,y=0.94,label=expression(10^-0),color="black",fontface="bold")+
   annotate('text',x=-1.1,y=0.94,label=expression(10^-1),color="black",fontface="bold")+
   annotate('segment',x = -1, xend = -1, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -1.2, xend = -1.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-2.1,y=0.94,label=expression(10^-2),color="black",fontface="bold")+
   annotate('segment',x = -2, xend = -2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -2.2, xend = -2.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-3.1,y=0.94,label=expression(10^-3),color="black",fontface="bold")+
   annotate('segment',x = -3, xend = -3, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -3.2, xend = -3.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-4.1,y=0.94,label=expression(10^-4),color="black",fontface="bold")+
   annotate('segment',x = -4, xend = -4, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -4.2, xend = -4.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-5.1,y=0.94,label=expression(10^-5),color="black",fontface="bold")+
   annotate('segment',x = -5, xend = -5, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -5.2, xend = -5.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-5.85,y=0.94,label=expression(10^-6),color="black",fontface="bold")+
   annotate('segment',x = -6, xend = -6, y = 0.98, yend = 1,color="black",size=0.001)

终于..标上了 再对照了一下作者的图,发现颜色是渐变的

10.13 加渐变色...

...看起来是渐变色,其实是4个颜色.. 于是就先先添加用于分组的列

 draw.data$color_4 = ifelse(draw.data$fold >= 1 & draw.data$p_value < 0.05,"goodup",
                             ifelse(draw.data$fold >= 1 & draw.data$p_value >= 0.05,"up",
                                    ifelse(draw.data$fold <= 1 & draw.data$p_value >= 0.05,"down",
                                           "gooddown")))
head(draw.data)
 # gene      fold p_value           group Pvalue_log10  color_4
 # 11   Klrd1 2.9291279   0.001 STING activated           -3   goodup
 # 12 Tnfsf11 2.2882951   0.001 STING activated           -3   goodup
 # 13    Gbp5 4.3578296   0.001 STING activated           -3   goodup
 # 14   Pdcd1 2.9417155   0.001 STING activated           -3   goodup
 # 15    Cd40 3.2617735   0.001 STING activated           -3   goodup
 # 16   Birc5 0.7692822   0.001 STING activated           -3 gooddown

修改分组列 修改scale_color_manual函数

ggscatter(draw.data, 
           x = "Pvalue_log10", y = "fold",
           color = "color_4",ylab="Fold change(vs. PBS)",xlab = "",rug = F,
           shape = 20,size = 3,show.legend.text = T,ggtheme = theme_void()) +
     scale_color_manual(values=c( "#96a6ef","#546de5","#ff4757","#ffbdc2"))+
   ##加标签
   geom_label_repel(data = diff.gene, aes(x = diff.gene$Pvalue_log10,
                                          y = diff.gene$fold,
                                          label = gene),
                    size = 3, box.padding = unit(0.4, "lines"),
                    point.padding = unit(0.8, "lines"),
                    segment.color = "black",
                    show.legend = FALSE,
                    na.rm = T,
                    ylim = c(0.05,50),xlim = c(3,4),
                    direction = "y") +
   theme_classic()+
   theme(#axis.text.y = element_text(size = 10, colour = "black" ),
         #axis.text.x = element_text(size = 10, colour = "black" ),
         #legend.title=element_blank(),
         legend.position = "none",
         axis.ticks.x = element_blank(),
         axis.text.x=element_blank(),
         axis.line.x=element_blank()
         )+
   coord_trans(y = squash_axis(0.05, 2, 0.01),x=squash_axis(0,3,1),
               ylim = c(0.05,50),xlim = c(0,-6))+
   scale_y_continuous(breaks = c(0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,
                                 1,5,50),
                      labels = c(0.05,"","","","",0.5,"","",'','',
                                 1,5,50),expand=c(0,0))+
   scale_x_reverse(#breaks = c(0,-0.1,-0.2,-0.3,-0.4,-0.5,-0.6,-0.7,-0.8,-1,-2,-3,-4,-5,-6),
                   # labels=c(expression(10^-0),'','','','','','','','',
                   #          expression(10^-1),
                   #          expression(10^-2),
                   #          expression(10^-3),
                   #          expression(10^-4),
                   #          expression(10^-5),
                   #          expression(10^-6)),
                   expand=c(0,0),
                   breaks = NULL)+
   geom_vline(xintercept=-1.301029995663981,lty=4,col="grey",linewidth=1)+
   geom_hline(yintercept = 1,lty=1,col="black",linewidth=0.4)+
   annotate('text',x=-4.3,y=40,label="STING \n activated",color="#ff4757")+
   annotate('text',x=-4.3,y=0.2,label="STING \n deficient",color="#546de5")+
   annotate('text',x=-1.2,y=34,label="P=0.05",color="black",angle=90,fontface="italic")+
   annotate('text',x=-5.5,y=1.099,label="P-value",color="black",fontface="italic")+
   annotate('segment',x = -0.1, xend = -0.1, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.15, xend = -0.15, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.22, xend = -0.22, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.31, xend = -0.31, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.25, xend = -0.25, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -0.28, xend = -0.28, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-0.2,y=0.94,label=expression(10^-0),color="black",fontface="bold")+
   annotate('text',x=-1.1,y=0.94,label=expression(10^-1),color="black",fontface="bold")+
   annotate('segment',x = -1, xend = -1, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -1.2, xend = -1.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-2.1,y=0.94,label=expression(10^-2),color="black",fontface="bold")+
   annotate('segment',x = -2, xend = -2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -2.2, xend = -2.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-3.1,y=0.94,label=expression(10^-3),color="black",fontface="bold")+
   annotate('segment',x = -3, xend = -3, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -3.2, xend = -3.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-4.1,y=0.94,label=expression(10^-4),color="black",fontface="bold")+
   annotate('segment',x = -4, xend = -4, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -4.2, xend = -4.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-5.1,y=0.94,label=expression(10^-5),color="black",fontface="bold")+
   annotate('segment',x = -5, xend = -5, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('segment',x = -5.2, xend = -5.2, y = 0.98, yend = 1,color="black",size=0.001)+
   annotate('text',x=-5.85,y=0.94,label=expression(10^-6),color="black",fontface="bold")+
   annotate('segment',x = -6, xend = -6, y = 0.98, yend = 1,color="black",size=0.001)

终于...

展示一下作者的图和我的图~基本上吧,最后x轴的上移还是用代码实现了,要说细节肯定还有很多没有修改

11 如果使用差异基因作图呢?

再次回到知乎文章,发现作者其实还提供了另一个已经标准化过的矩阵,我想,他用于绘图的数据不会就是平均标准化后的数据得到的吧..打算试一下A2m这个基因 用于绘图的值是6.128我平均一下标准化之后的值, 得到..我猜可能是因为精度的缘故,所以有点差异.. 

另外试了几个..也是同样的情况

破案了,作者的数据来源就是先给矩阵做了标准化然后再使用倍数变化绘图.. 大概理解了文章里面的描述那y轴如果是logFC的话,这张图应该长什么样呢?另外花时间试了一下, 这是整理好的矩阵:只修改矩阵和必要参数,结果如下图调整一下代码,还原原本的图像有些P值非常小,所以整张图有点长..

而且,红色一组就很奇怪所有的基因都是上调的?

虽然一共就检测了750个基因,但还是觉得有点奇怪...

可能作者想营造一种两个实验组相较于对照组在基因表达上完全对称的感觉吧。

对比一下原图,整体的基因的变化趋势于文章作者保持一致,但仍旧有差异..

比如CCR7这个基因,在差异分析中呈现不明显的下调,但在作者的图像中呈现明显的上调.. 

两图对比图

12 Raw.data VS Normalized data

总之就是觉得奇怪, 于是最后我还是把作者标准化后的表达矩阵下载下来与原始数据进行对比了...数值一模一样,他给的Raw.data其实并不Raw... 怪不得原始数据的格式不是RCC是txt...

破案了..我整的原来本就不是Raw数据..我至今都不知道这个原数数据长啥样.

还有一个问题,

所以作者补充材料里面的P值是咋来的...??