ixxmu / mp_duty

Scrape web articles and save them to GitHub issues
https://archives.duty-machine.now.sh/

Learning plotting from Nature Communications: R ggplot2 box plots with jittered points showing genome size and TE content across multiple species #3833

Closed ixxmu closed 1 year ago

ixxmu commented 1 year ago

https://mp.weixin.qq.com/s/8kk04B3ptyhDLRhcdIIf0g

ixxmu commented 1 year ago

Learning plotting from Nature Communications: R ggplot2 box plots with jittered points showing genome size and TE content across multiple species, by 小明的数据分析笔记本

Paper

Large-scale genome sequencing of mycorrhizal fungi provides insights into the early evolution of symbiotic traits

https://www.nature.com/articles/s41467-020-18795-w


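A reader asked about this figure in a message left on the public account.

I looked up the paper, and most of its figures come with source data, so we can try to reproduce some of them, starting with the simplest. Figure 2 in the paper is a box plot overlaid with jittered points. The paper's color scheme is also attractive and worth keeping as a candidate palette for your own figures.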

[Figure: screenshot of part of the example data]

First, read the data

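library(tidyverse)

# Read the source data for the figure (comma-delimited)
dat<-read_delim("data/20230909/Source Data/Source_Data_figure_1a.csv",
           delim = ",")

# Inspect the column names
colnames(dat)

# Count the genomes in each Ecology category
dat %>% 
  pull(Ecology) %>% 
  table()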
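As a side note, the same per-category counts can be returned as a tibble with `count()` from dplyr, which is convenient for sorting or filtering afterwards. The sketch below is only an illustration: the `toy` data frame and its values are invented and stand in for `dat`, which is assumed to have an `Ecology` column as in the code above.

```r
library(dplyr)

# Toy stand-in for the source data; the values are invented for illustration
toy <- tibble(
  Ecology = c("Pathogen", "Saprotroph", "Pathogen", "Yeast", "Ectomycorrhizae"),
  Genome.size = c(40e6, 35e6, 52e6, 12e6, 60e6)
)

# Number of genomes per ecological category, most frequent first
eco_counts <- toy %>%
  count(Ecology, sort = TRUE)

eco_counts
```

With the real data, replacing `toy` with `dat` should give the same information as `table()`, one row per category.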

The left panel shows genome size. The code is as follows

# Drop the two excluded categories and fix the order of the rest on the y axis;
# the plot is stored as p1 so it can be combined with the right panel later
p1 <- ggplot(data=dat %>% 
               filter(Ecology!="Yeast"&Ecology!="Parasite") %>% 
               mutate(Ecology=factor(Ecology,levels = c("Wood decayer",
                                                        "Endophyte",
                                                        "Arbuscular mycorrhizae",
                                                        "Orchid mycorrhizae",
                                                        "Ericoid mycorrhizae",
                                                        "Pathogen",
                                                        "Saprotroph",
                                                        "Ectomycorrhizae"))),
             aes(x=Genome.size,y=Ecology))+
  geom_boxplot(color="gray")+
  geom_jitter(aes(color=Ecology),
              size=5,
              show.legend = FALSE,
              alpha=0.5)+
  # One color per category, in the order of the factor levels
  scale_color_manual(values = c("#f1a2c9","#b6b3b3","#a8e3ea",
                                "#fde05f","#f49b40",
                                "#7ac84e","#73a1cb","#e15e53"))+
  # Genome size is stored in bp; label the axis in Mbp
  scale_x_continuous(limits = c(0,150000000),
                     labels = function(x){x/1000000})+
  theme_bw()+
  theme(panel.border = element_blank(),
        axis.ticks = element_blank())+
  labs(x=NULL,y=NULL,title = "Genomes (Mbp)")

p1
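The plot code passes an unnamed color vector to `scale_color_manual()`, so colors are matched to the categories by position. If a category happens to be absent from the data, the remaining colors can shift. A minimal sketch of one way to make the mapping explicit is to name the colors by category. The names `eco_levels`, `eco_colors` and `bp_to_mbp` are hypothetical helpers introduced here, not part of the original post.

```r
# Category order used for the y axis in the plot code
eco_levels <- c("Wood decayer", "Endophyte", "Arbuscular mycorrhizae",
                "Orchid mycorrhizae", "Ericoid mycorrhizae",
                "Pathogen", "Saprotroph", "Ectomycorrhizae")

# Same palette as in the plot code, now named by category
eco_colors <- setNames(
  c("#f1a2c9", "#b6b3b3", "#a8e3ea", "#fde05f",
    "#f49b40", "#7ac84e", "#73a1cb", "#e15e53"),
  eco_levels
)

# Axis label helper: convert base pairs to megabase pairs
bp_to_mbp <- function(x) x / 1e6

eco_colors["Pathogen"]
bp_to_mbp(c(0, 50e6, 150e6))
```

In the plot these would be used as `scale_color_manual(values = eco_colors)` and `scale_x_continuous(labels = bp_to_mbp)`.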

The code for the right panel is essentially the same

# Same data preparation as the left panel; stored as p2 for the combined figure
p2 <- ggplot(data=dat %>% 
               filter(Ecology!="Yeast"&Ecology!="Parasite") %>% 
               mutate(Ecology=factor(Ecology,levels = c("Wood decayer",
                                                        "Endophyte",
                                                        "Arbuscular mycorrhizae",
                                                        "Orchid mycorrhizae",
                                                        "Ericoid mycorrhizae",
                                                        "Pathogen",
                                                        "Saprotroph",
                                                        "Ectomycorrhizae"))),
             aes(x=TE.CoverageTotal,y=Ecology))+
  geom_boxplot(color="gray")+
  geom_jitter(aes(color=Ecology),
              size=5,
              show.legend = FALSE,
              alpha=0.5)+
  scale_color_manual(values = c("#f1a2c9","#b6b3b3","#a8e3ea",
                                "#fde05f","#f49b40",
                                "#7ac84e","#73a1cb","#e15e53"))+
  scale_x_continuous(limits = c(0,100))+
  theme_bw()+
  # Hide the y axis labels because the left panel already shows them
  theme(panel.border = element_blank(),
        axis.ticks = element_blank(),
        axis.text.y = element_blank())+
  labs(x=NULL,y=NULL,title = "Repeat element coverage (%)")

p2
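Since the two panels differ only in the x variable, the title and a few theme settings, the shared part could be factored into a small function. The following is a rough sketch under that assumption, not the author's code: `eco_panel()` is a hypothetical helper, and the `toy` data frame is invented so that the example runs without the paper's source data.

```r
library(ggplot2)

# Hypothetical helper: one horizontal box plot with jittered points
eco_panel <- function(data, xvar, title) {
  ggplot(data, aes(x = .data[[xvar]], y = Ecology)) +
    geom_boxplot(color = "gray") +
    geom_jitter(aes(color = Ecology), size = 5, alpha = 0.5,
                show.legend = FALSE) +
    theme_bw() +
    theme(panel.border = element_blank(),
          axis.ticks = element_blank()) +
    labs(x = NULL, y = NULL, title = title)
}

# Toy data with the column names used in the post; values are invented
toy <- data.frame(
  Ecology = factor(c("Pathogen", "Pathogen", "Saprotroph", "Saprotroph")),
  Genome.size = c(40e6, 52e6, 35e6, 38e6),
  TE.CoverageTotal = c(12, 30, 5, 8)
)

# Panel-specific settings are added on top of the shared base
p_size <- eco_panel(toy, "Genome.size", "Genomes (Mbp)") +
  scale_x_continuous(labels = function(x) x / 1e6)
p_te <- eco_panel(toy, "TE.CoverageTotal", "Repeat element coverage (%)") +
  theme(axis.text.y = element_blank())
```

The color scale and axis limits from the original code can be added to each panel in the same way.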

Finally, combine the two panels

library(patchwork)

p1+p2
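With patchwork, `+` places two plots side by side. If the relative panel widths or panel tags need adjusting, `plot_layout()` and `plot_annotation()` can be added to the combined object. A minimal sketch with two throwaway plots built from invented data (the names `left`, `right` and `combined` are placeholders, not from the post):

```r
library(ggplot2)
library(patchwork)

# Two small placeholder plots built from invented data
toy <- data.frame(x = c(1, 2, 3, 4), g = c("a", "a", "b", "b"))
left <- ggplot(toy, aes(x = x, y = g)) +
  geom_boxplot() +
  labs(title = "Left panel")
right <- ggplot(toy, aes(x = x * 10, y = g)) +
  geom_boxplot() +
  labs(title = "Right panel")

# Side by side, equal widths, panels tagged "a" and "b"
combined <- left + right +
  plot_layout(widths = c(1, 1)) +
  plot_annotation(tag_levels = "a")

combined
```

The same additions should work on the genome size and repeat coverage panels once both are stored as plot objects.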

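The example data can be downloaded from the paper, and the code can be copied from this post. Alternatively, tip the post one yuan to get the data and code I have organized.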
