Closed ixxmu closed 9 months ago
BioEnricher
包是我的好朋友【医学僧公众号】的博主花了很大功夫写的一个R包,现在无偿的分享给大家~
BioEnricher
的关键在于解决两个问题:
BioEnricher
提供了富集分析的无缝集成,涵盖了各种富集分析基因集数据库,如GO、KEGG、WikiPathways、Reactome、MsigDB、Disease Ontology、Cancer Gene Network、DisGeNET、CellMarker和CMAP(药物);BioEnricher
提供了各种高级的可视化功能(颜值都很高,发表级的可视化方案),简化了数据呈现的过程,使其更快速、更便捷。BioEnricher
的GitHub在https://github.com/Zaoqu-Liu/BioEnricher,记得去官方github给作者点点小星星,支持一波~
目前R包安装暂不支持install_github:
devtools::install_github("Zaoqu-Liu/BioEnricher")
可以自行下载BioEnricher_0.1.0.zip
文件,然后使用本地安装的方式进行安装:
install.packages("~/BioEnricher/BioEnricher-master/BioEnricher_0.1.0.zip", repos = NULL, type = "win.binary")
经常做富集分析的小伙伴都知道,富集分析一般有两种形式:
这里我们加载BioEnricher
包提供的示例数据进行差异分析,得到差异基因列表,为后续的富集分析实战做准备:
library(airway)
library(DESeq2)
library(tidyverse)
library(clusterProfiler)
library(org.Hs.eg.db)
library(BioEnricher)
data(airway)
se <- airway
se$dex <- relevel(se$dex, "untrt")
res <- DESeqDataSet(se, design = ~ cell + dex)%>%
estimateSizeFactors()%>%DESeq()%>%
results()%>%as.data.frame()%>%na.omit()
ann <- bitr(rownames(res),'ENSEMBL','SYMBOL',org.Hs.eg.db)
res <- merge(ann,res,by.x=1,by.y=0)%>%distinct(SYMBOL,.keep_all = T) # Very crude, just as an example
res对象是一个标准的DESeq2差异分析的输出结果,包含所有的基因:
head(res)
# ENSEMBL SYMBOL baseMean log2FoldChange lfcSE stat pvalue padj
# 1 ENSG00000000003 TSPAN6 708.60217 -0.38125389 0.10065443 -3.7877507 1.520173e-04 1.283638e-03
# 2 ENSG00000000419 DPM1 520.29790 0.20681272 0.11221867 1.8429438 6.533721e-02 1.965458e-01
# 3 ENSG00000000457 SCYL3 237.16304 0.03792059 0.14344472 0.2643568 7.915050e-01 9.114580e-01
# 4 ENSG00000000460 C1orf112 57.93263 -0.08816770 0.28714200 -0.3070526 7.588033e-01 8.950344e-01
# 5 ENSG00000000971 CFH 5817.35287 0.42640222 0.08831338 4.8282857 1.377134e-06 1.821495e-05
# 6 ENSG00000001036 FUCA2 1282.10639 -0.24107114 0.08872064 -2.7171933 6.583813e-03 3.277906e-02
# Define an up-regulated gene list
up.genes <- res$SYMBOL[res$log2FoldChange > 2 & res$padj < 0.05]
# Define a down-regulated gene list
down.genes <- res$SYMBOL[res$log2FoldChange < -2 & res$padj < 0.05]
根据这个cutoff进行截断获得数量不等的上调或下调基因:
length(up.genes)
# [1] 128
length(down.genes)
# [1] 107
# Obtain an order ranked geneList.
grlist <- res$log2FoldChange
names(grlist) <- res$SYMBOL
grlist <- sort(grlist,decreasing = T)
grlist是一个按照log2FoldChange排序的基因/log2FoldChange字符串:
head(grlist)
# ALOX15B ZBTB16 GUCY2D STEAP4 LOC101928858 ANGPTL7
# 9.505972 7.352628 5.885112 5.207160 5.098107 5.081572
BioEnricher
纳入了11种基因富集数据库,除了常用的GO,KEGG,Reactome以外,还有:listEnrichMethod()
# "GO", "KEGG", "MKEGG", "WikiPathways", "Reactome", "MsigDB", "DO", "CGN", "DisGeNET", "CellMarker", "CMAP"
lzq_ORA
函数lzq_ORA
将执行基因集富集分析,用户可以指定enrich.type
参数使用其中一种基因富集数据库运行,例如KEGG富集分析:
# Set enrich.type using an enrichment analysis method mentioned above.
kegg <- lzq_ORA(
genes = up.genes,
enrich.type = 'KEGG'
)
KEGG富集分析可以使用pathview
R包进行可视化
res2 <- res[res$log2FoldChange > 0 & res$padj < 0.05,c(2,4)]
res2 <- data.frame(row.names = res2$SYMBOL,R=res2$log2FoldChange)
lzq_KEGGview(gene.data = res2,pathway.id = 'hsa04218')
lzq_ORA.integrated
用户可以使用lzq_ORA.integrated
进行多种数据库的批量分析:
例如对上调的基因进行富集分析:
# Integrative enrichment analysis of the up-regulated gene list
up.enrich <- lzq_ORA.integrated(
genes = up.genes,
background.genes = NULL,
GO.ont = 'ALL',
perform.WikiPathways = T,
perform.Reactome = T,
perform.MsigDB = T,
MsigDB.category = 'ALL',
perform.Cancer.Gene.Network = T,
perform.disease.ontoloty = T,
perform.DisGeNET = T,
perform.CellMarker = T,
perform.CMAP = T,
min.Geneset.Size = 3
)
### This function will output its calculation process.
# +++ Updating gene symbols...
# Maps last updated on: Thu Oct 24 12:31:05 2019
# +++ Transforming SYMBOL to ENTREZID...
# 'select()' returned 1:1 mapping between keys and columns
# +++ Performing GO-ALL enrichment...
# +++ Symplifying GO results...
# +++ Performing KEGG enrichment...
# +++ Performing Module KEGG enrichment...
# +++ Performing WikiPathways enrichment...
# +++ Performing Reactome pathways enrichment...
# +++ Performing Disease Ontoloty enrichment...
# +++ Performing Cancer Gene Network enrichment...
# +++ Performing DisGeNET enrichment...
# +++ Performing CellMarker enrichment...
# +++ Performing MsigDB-ALL enrichment...
# +++ Performing CMAP enrichment...
# +++ 1826 significant terms were detected...
# +++ Done!
对下调的基因进行富集分析:
# Integrative enrichment analysis of the down-regulated gene list
down.enrich <- lzq_ORA.integrated(
genes = down.genes,
background.genes = NULL,
GO.ont = 'ALL',
perform.WikiPathways = T,
perform.Reactome = T,
perform.MsigDB = T,
MsigDB.category = 'ALL',
perform.Cancer.Gene.Network = T,
perform.disease.ontoloty = T,
perform.DisGeNET = T,
perform.CellMarker = T,
perform.CMAP = T,
min.Geneset.Size = 3
)
# +++ Updating gene symbols...
# Maps last updated on: Thu Oct 24 12:31:05 2019
# +++ Transforming SYMBOL to ENTREZID...
# 'select()' returned 1:1 mapping between keys and columns
# +++ Performing GO-ALL enrichment...
# +++ Symplifying GO results...
# +++ Performing KEGG enrichment...
# +++ Performing Module KEGG enrichment...
# +++ Performing WikiPathways enrichment...
# +++ Performing Reactome pathways enrichment...
# +++ Performing Disease Ontoloty enrichment...
# +++ Performing Cancer Gene Network enrichment...
# +++ Performing DisGeNET enrichment...
# +++ Performing CellMarker enrichment...
# +++ Performing MsigDB-ALL enrichment...
# +++ Performing CMAP enrichment...
# +++ 1418 significant terms were detected...
# +++ Done!
生成的富集分析结果是一个list列表:
names(up.enrich)
# [1] "GO" "simplyGO" "KEGG" "Module.KEGG" "WikiPathways"
# [6] "Reactome" "DiseaseOntology" "CancerGeneNetwork" "DisGeNET" "CellMarker.Enrichment"
# [11] "MsigDB.ALL" "CMAP"
BioEnricher
内置了很多高颜值的可视化方案:
barplot
lzq_ORA.barplot1(enrich.obj = up.enrich$simplyGO)
dotplot
lzq_ORA.dotplot1(enrich.obj = up.enrich$simplyGO)
结合上调和下调的通路进行可视化:
lzq_ORA.barplot2(
enrich.obj1 = up.enrich$simplyGO,
enrich.obj2 = down.enrich$simplyGO,
obj.types = c('Up','Down')
)
指定use.Chinese = T
支持中文模式:
Note: use.Chinese exists all the plot functions.
lzq_ORA.barplot2(
enrich.obj1 = up.enrich$simplyGO,
enrich.obj2 = down.enrich$simplyGO,
obj.types = c('Up','Down'),
use.Chinese = T
)
lzq_GSEA
lzq_GSEA
函数将执行基因集富集分析,包括GO, KEGG, WikiPathways, Reactome, MsigDB, Disease ontolty, Cancer Gene Network, DisGeNET, CellMarker和CMAP。同样的,指定enrich.type
参数可指定目标数据库:
# Set enrich.type using an enrichment analysis method mentioned above.
fit <- lzq_GSEA(grlist, enrich.type = 'KEGG')
# +++ Updating gene symbols...
# Maps last updated on: Thu Oct 24 12:31:05 2019
# +++ Transforming SYMBOL to ENTREZID...
# 'select()' returned 1:many mapping between keys and columns
# +++ Performing KEGG enrichment...
# +++ 5 significant terms were detected...
# +++ Done!
lzq_GSEA.integrated
lzq_GSEA.integrated
函数可一键式运行所有的基因集数据库:
# Integrative enrichment analysis of the up-regulated gene list
fit2 <- lzq_GSEA.integrated(
genes = grlist,
gene.type = 'SYMBOL',
GO.ont = 'ALL',
perform.WikiPathways = T,
perform.Reactome = T,
perform.MsigDB = T,
MsigDB.category = 'ALL',
perform.Cancer.Gene.Network = T,
perform.disease.ontoloty = T,
perform.DisGeNET = T,
perform.CellMarker = T,
perform.CMAP = T,
min.Geneset.Size = 3
)
# +++ Updating gene symbols...
# Maps last updated on: Thu Oct 24 12:31:05 2019
# +++ Transforming SYMBOL to ENTREZID...
# 'select()' returned 1:many mapping between keys and columns
# +++ Performing GO-ALL enrichment...
# +++ Symplifying GO results...
# +++ Performing KEGG enrichment...
# +++ Performing Module KEGG enrichment...
# +++ Performing WikiPathways enrichment...
# +++ Performing Reactome pathways enrichment...
# +++ Performing Disease Ontoloty enrichment...
# +++ Performing Cancer Gene Network enrichment...
# no term enriched under specific pvalueCutoff...
# +++ Performing DisGeNET enrichment...
# +++ Performing CellMarker enrichment...
# +++ Performing MsigDB-ALL enrichment...
# +++ Performing CMAP enrichment...
# no term enriched under specific pvalueCutoff...
# +++ 311 significant terms were detected...
# +++ Done!
fit2
对象也是一个list结构,每一个元素对应了一个基因集数据库的结果。
Visualize analyzing result of GSEA
lzq_gseaplot(
fit2$simplyGO,
Pathway.ID = 'GO:0030016',
rank = F,
statistic.position = c(0.71,0.85),
rel.heights = c(1, 0.4)
)
Enrichment barplot for positive or negative GSEA results
lzq_GSEA.barplot1(enrich.obj = fit2$simplyGO,type = 'pos')
Enrichment dotplot for positive or negative GSEA results
lzq_GSEA.dotplot1(enrich.obj = fit2$simplyGO,type = 'pos')
同样的,lzq_GSEA.barplot2
支持同时可视化激活和抑制的GSEA通路:
lzq_GSEA.barplot2(enrich.obj = fit2$simplyGO)
支持中文模式:
lzq_GSEA.barplot2(enrich.obj = fit2$simplyGO,use.Chinese = T)
总体来说 ,BioEnricher
R包内置了很多基因集数据库,一键式批量分析非常方便;另外,可视化的颜值也很高,都是发表级的,可见R包作者也是非常细心。更多内容见BioEnricher_0.1.0.pdf
官方手册,记得去官方github(https://github.com/Zaoqu-Liu/BioEnricher)给作者点点小星星,支持一波~
另外,关于富集分析,特别是单细胞富集分析,还有一些小细节和小技巧,我有时间整理一下,尽情期待!
- END -
https://mp.weixin.qq.com/s/7ymMpAd2Nnxv23IGIqhUVQ