Closed ixxmu closed 2 months ago
拒绝千篇一律,对角线渐变火山图提供不一样的数据可视化形式
对角线火山图或者45°倾斜火山图:
inputfile中放入差异筛选结果(.xlsx,如下图demo.xlsx)和样本分组信息表(.csv,见下图group.csv),demo数据中包括每个样本的intensity, 代谢物名称, VIP, P value以及Fold Change信息:
设置差异筛选阈值 Fold change=1.5, p value<0.05, VIP>=1作为差异物筛选标准,设置labelNumber=10代表对上调和下调差异物中可信度最高的前10个进行展示(上调和下调各10个):
rm(list = ls())
library(tidyverse)
#setting threshold
fold_change=1.5
pvalue=0.05
vip=1
labelNumber=10# the numbers of labeled metabolites in plot
读取数据并按照上述阈值对代谢物进行差异筛选,同时计算差异物可信度得分值score:
files.csv <- fs::dir_ls("inputfile",recurse = TRUE,glob = "*.csv")
dat.group <- map_dfc(files.csv,read_csv)
#loading and reshaping data
files <- fs::dir_ls("inputfile",recurse = TRUE,glob = "*.xlsx")
files_names <- str_extract(files,"(?<=\\/).*(?=\\.)")
dat <- map_dfr(files, openxlsx::read.xlsx) %>%
mutate(
log2FC = log2(Fold_Change),
Type = case_when(
VIP >= vip & p_value <= pvalue & log2FC >= log2(fold_change) ~ "up-regulated",
VIP >= vip & p_value <= pvalue & log2FC <= log2(1 / fold_change) ~ "down-regulated",
TRUE ~ "insig"
),
score = sqrt(VIP^2 + log2FC^2 + p_value^2)
) %>%
group_by(Type) %>%
arrange(Type, desc(score)) %>%
mutate(
label = ifelse(
Type %in% c("up-regulated", "down-regulated") & row_number() <= min(
labelNumber, sum(Type %in% c("up-regulated", "down-regulated"))),
metabolites,
""
)
) %>%
ungroup() %>%
mutate(
label = case_when(
Type == "up-regulated"&label!="" ~ paste0("(up) ", label),
Type == "down-regulated"&label!="" ~ paste0("(down) ", label),
TRUE ~ label
)
)
计算每个代谢物的intensity在每个组中的均值:
dat.plot <- dat %>% select(all_of(c("metabolites",dat.group$sample))) %>%
pivot_longer(!metabolites,names_to = "sample",values_to = "intensity") %>%
left_join(dat.group,by = 'sample') %>%
group_by(metabolites,group) %>%
summarise(mean = mean(intensity)) %>%
pivot_wider(names_from = group, values_from = mean,
names_prefix = "mean_", values_fn = list(mean = mean)) %>%
merge(dat,by = 'metabolites')
差异筛选条件作为Title,并加载本地配色方案colorpalettes.RData:
#title
Title <- paste0('Difference threshold: |log2FC|>=',round( log2(fold_change),3),
" & pvalue<",pvalue," & VIP>=",vip)
load("colorpalettes.RData")#color
将筛选阈值作为title,并加载本地配色方案colorpalettes.RData:
#title
Title <- paste0('Difference threshold: |log2FC|>=',round( log2(fold_change),3),
" & pvalue<",pvalue," & VIP>=",vip)
load("colorpalettes.RData")#color
#plotting
volcano <- ggplot(data=dat.plot,aes(x=log10(mean_normal+1),y=log10(mean_disease+1))) +
geom_point(aes(color =score,size=score)) +
scale_color_gradientn(colors =colorpalettes)+
scale_radius(range=c(1,4),guide=NULL)+
labs(x = "log10(Mean of Normal sample)",
y = "log10(Mean of Disease sample)",
title = "Volcano of metabolites",
subtitle = Title) +
theme_classic(base_size = 12) +
ggrepel::geom_text_repel(aes(label = label),
max.overlaps =labelNumber*10,
color = "#3288BD",
vjust = 0.8, size = 2.8, alpha = 0.7) +
theme(plot.title = element_text(size = 15, hjust = 0.5),
plot.subtitle = element_text(size = 10, hjust = 0.5))
#saving
ggsave("outputfile/volcano.png", plot=volcano, width=6, height=5, dpi=300)
ggsave("outputfile/volcano.pdf", plot=volcano, width =6, height = 5,onefile=F)
volcano:
需要注意geom_text_repel()函数中max.overlaps参数的设置,如果太小,可能会造成标签显示不全
对角线火山图适合P值非常小的基因、蛋白或者代谢物较多的数据,但是此时不能将P或者有P参与的评价标准作为颜色或者尺寸映射;
对角线火山图配合渐变效果,也适合差异较少或者强调可信度较高的显著性差异物;
demo数据见Q群570484875
https://mp.weixin.qq.com/s/pkKxhUgGgSC5cS6kbmV-MA