Closed ixxmu closed 2 years ago
前几期推文介绍了Seurat官网的CCA和RPCA方法进行单细胞多样本整合,详见
实际上,除了这两种外,整合去批次的方法还有很多。例如2020年的Genome Biol《A benchmark of batch-effect correction methods for single-cell RNA sequencing data》测评了14种方法,最后的结论是:
Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
2021年Nat Biotechnol也有一篇测评文章《A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples》对目前主流的单细胞整合方法做了测评和总结,大致结论如下:
可以看到Harmony依旧是能打。Harmony发表在2019年的nature methods上《Fast, sensitive and accurate integration of single-cell data with Harmony》,相较于别的整合方法,其优点在于:
下图是同一个数据集四种整合方法所占的内存情况:
本文主要介绍Harmony和LIGER的代码实战。最后,使用LISI指数评价CCA,RPCA,Harmony和LIGER的整合效果,可以看到Harmony和LEGER的整合效果更佳。LISI结果如下:
R包需提前安装好:
rm(list=ls())
#devtools::install_github('MacoskoLab/liger')
#devtools::install_github('immunogenomics/harmony')
library(Seurat)
library(dplyr)
library(patchwork)
library(readr)
library(ggplot2)
library(RColorBrewer)
library(future)
library(clustree)
library(cowplot)
library(stringr)
library(SeuratDisk)
library(SeuratWrappers)
library(harmony)
library(rliger)
library(reshape2)
# change the current plan to access parallelization
plan("multisession", workers = 2)
plan()
#设置可用的内存
options(future.globals.maxSize = 10 * 1024^3)
# install dataset
InstallData("ifnb")
#### 1.load dataset
ifnb.data = LoadData("ifnb")
#### 2.normalize/HVG/scale三步走
ifnb.data <- ifnb.data %>% NormalizeData(verbose = F) %>%
FindVariableFeatures(selection.method = "vst", nfeatures = 2000, verbose = F) %>%
ScaleData(verbose = F) %>%
RunPCA(npcs = 30, verbose = F)
#### 3. harmony
ifnb.data <- ifnb.data %>% RunHarmony("orig.ident", plot_convergence = T)
#Check the generated embeddings:
harmony_embeddings <- Embeddings(ifnb.data, 'harmony')
harmony_embeddings[1:5, 1:5]
#### 4.降维聚类:进行 UMAP 和 clustering:
n.pcs = 20
ifnb.data <- ifnb.data %>%
RunUMAP(reduction = "harmony", dims = 1:n.pcs, verbose = F) %>%
RunTSNE(reduction = "harmony", dims = 1:n.pcs, verbose = F) %>%
FindNeighbors(reduction = "harmony", k.param = 10, dims = 1:n.pcs)
ifnb.data <- FindClusters(ifnb.data,resolution = 0.5, algorithm = 1)%>%
identity()
#### 5.umap可视化
p1 <- DimPlot(ifnb.data, reduction = "umap", group.by = "stim")
p2 <- DimPlot(ifnb.data, reduction = "umap",group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
P.total = p1 + p2
ggsave(P.total,filename = "Output/integrated_snn_res.harmony.pdf",
width = 15, height = 6)
saveRDS(ifnb.data,file = "Output/integrated.harmony.rds")
Harmony整合效果还是不错的:
LIGER能够跨个体、物种以识别共有的细胞类型,以及数据集特有的特征,提供对不同单细胞数据集的统一分析,LIGER还用于单细胞转录组和scATAC-seq等多拟态数据的整合。相较Harmony,LIGER的运行速度慢一些:
rm(list=ls())
##### 1.load dataset
ifnb.data = LoadData("ifnb")
#### 2.normalize/HVG/scale三步走
ifnb.data <- ifnb.data %>% NormalizeData(verbose = F) %>%
FindVariableFeatures(selection.method = "vst", nfeatures = 2000, verbose = F) %>%
ScaleData(verbose = F, do.center = F) %>%
RunPCA(npcs = 30, verbose = F)
#### 3.使用 orig.ident作为batch运行rliger
#k是矩阵分解的因子数,一般取值20-40
ifnb.data <- RunOptimizeALS(ifnb.data, k = 30, lambda = 5, split.by = "orig.ident")
ifnb.data <- RunQuantileNorm(ifnb.data, split.by = "orig.ident")
#### 4.降维聚类:进行 UMAP 和 clustering:
n.pcs = ncol(ifnb.data[["iNMF"]])
ifnb.data <- ifnb.data %>%
RunUMAP(reduction = "iNMF", dims = 1:n.pcs, verbose = F) %>%
RunTSNE(reduction = "iNMF", dims = 1:n.pcs, verbose = F) %>%
FindNeighbors(reduction = "iNMF", k.param = 10, dims = 1:30)
ifnb.data <- FindClusters(ifnb.data,resolution = 0.5, algorithm = 1)
#### 5.umap可视化
p1 <- DimPlot(ifnb.data, reduction = "umap", group.by = "stim")
p2 <- DimPlot(ifnb.data, reduction = "umap",group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
P.total = p1 + p2
ggsave(P.total,filename = "Output/integrated_snn_res.LIGER.pdf",
width = 15, height = 6)
saveRDS(ifnb.data,file = "Output/integrated.LIGER.rds")
居然有一簇Ctrl特有的CD14 Mono,是真实存在的差异,还是整合未充分呢?
library(lisi)
library(ggthemes)
library(readr)
library(ggplot2)
library(ggpubr)
ifnb.data.liger = read_rds("./Output/integrated.LIGER.rds")
ifnb.data.harmony = read_rds("./Output/integrated.harmony.rds")
ifnb.data.RPCA = read_rds("./Output/integrated.RPCA.rds")
ifnb.data.CCA = read_rds("./Output/integrated.CCA.rds")
# pca
pca.score = lisi::compute_lisi(X = Embeddings(ifnb.data.harmony,reduction = "pca"),
meta_data = ifnb.data.harmony@meta.data,
label_colnames = "stim") %>% dplyr::mutate(type = "pca")
pca.score$type = "raw"
# harmony
harmony.score = lisi::compute_lisi(X = Embeddings(ifnb.data.harmony,reduction = "harmony"),
meta_data = ifnb.data.harmony@meta.data,
label_colnames = "stim") %>% dplyr::mutate(type = "harmony")
#CCA
CCA.score = lisi::compute_lisi(X = Embeddings(ifnb.data.CCA,reduction = "pca"),
meta_data = ifnb.data.CCA@meta.data,
label_colnames = "stim") %>% dplyr::mutate(type = "pca")
CCA.score$type = "CCA"
#RPCA
RPCA.score = lisi::compute_lisi(X = Embeddings(ifnb.data.RPCA,reduction = "pca"),
meta_data = ifnb.data.RPCA@meta.data,
label_colnames = "stim") %>% dplyr::mutate(type = "pca")
RPCA.score$type = "RPCA"
# LIGER
LIGER.score = lisi::compute_lisi(X = Embeddings(ifnb.data.liger,reduction = "iNMF"),
meta_data = ifnb.data.liger@meta.data,
label_colnames = "stim") %>% dplyr::mutate(type = "iNMF")
LIGER.score$type = "LIGER"
#### 可视化
lisi.res <- rbind(pca.score,CCA.score,
RPCA.score,harmony.score,
LIGER.score) %>% tidyr::gather(key, val,"stim")
ggboxplot(lisi.res, x = "type", y = "val",
color = "type", palette = "jco",
# add = "jitter",
short.panel.labs = FALSE)+
labs(y=("LISI score"),x=NULL,title = "Integration method")
可以看到Harmony和LEGER的整合效果更佳。
- END -
https://mp.weixin.qq.com/s/Mkmx088NGONljeyO7b2I3g