单细胞多样本整合之Harmony，LIGER和LISI

单细胞多样本整合之Harmony，LIGER和LISI by 生信随笔

前几期推文介绍了Seurat官网的CCA和RPCA方法进行单细胞多样本整合，详见

实际上，除了这两种外，整合去批次的方法还有很多。例如2020年的Genome Biol《A benchmark of batch-effect correction methods for single-cell RNA sequencing data》测评了14种方法，最后的结论是：

Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.

2021年Nat Biotechnol也有一篇测评文章《A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples》对目前主流的单细胞整合方法做了测评和总结，大致结论如下：

可以看到Harmony依旧是能打。Harmony发表在2019年的nature methods上《Fast, sensitive and accurate integration of single-cell data with Harmony》，相较于别的整合方法，其优点在于：

运行速度比较快；
节省内存；
不同于RPCA，Harmony可用于跨平台、跨组织的多样本整合。

下图是同一个数据集四种整合方法所占的内存情况：

本文主要介绍Harmony和LIGER的代码实战。最后，使用LISI指数评价CCA，RPCA，Harmony和LIGER的整合效果，可以看到Harmony和LEGER的整合效果更佳。LISI结果如下：

一. Harmony实战

R包需提前安装好：

rm(list=ls())
#devtools::install_github('MacoskoLab/liger')
#devtools::install_github('immunogenomics/harmony')
library(Seurat)
library(dplyr)
library(patchwork)
library(readr)
library(ggplot2)
library(RColorBrewer)
library(future)
library(clustree)
library(cowplot)
library(stringr)
library(SeuratDisk)
library(SeuratWrappers)
library(harmony)
library(rliger)
library(reshape2)

# change the current plan to access parallelization
plan("multisession", workers = 2)
plan()
#设置可用的内存
options(future.globals.maxSize = 10 * 1024^3)
# install dataset
InstallData("ifnb")

#### 1.load dataset
ifnb.data = LoadData("ifnb")

#### 2.normalize/HVG/scale三步走
ifnb.data <- ifnb.data %>% NormalizeData(verbose = F) %>%
  FindVariableFeatures(selection.method = "vst", nfeatures = 2000, verbose = F) %>% 
  ScaleData(verbose = F) %>%
  RunPCA(npcs = 30, verbose = F)

#### 3. harmony
ifnb.data <- ifnb.data %>% RunHarmony("orig.ident", plot_convergence = T)
#Check the generated embeddings:
harmony_embeddings <- Embeddings(ifnb.data, 'harmony')
harmony_embeddings[1:5, 1:5]

#### 4.降维聚类：进行 UMAP 和 clustering:
n.pcs = 20
ifnb.data <- ifnb.data %>% 
  RunUMAP(reduction = "harmony", dims = 1:n.pcs, verbose = F) %>% 
  RunTSNE(reduction = "harmony", dims = 1:n.pcs, verbose = F) %>% 
  FindNeighbors(reduction = "harmony", k.param = 10, dims = 1:n.pcs)

ifnb.data <- FindClusters(ifnb.data,resolution = 0.5, algorithm = 1)%>% 
  identity()

#### 5.umap可视化
p1 <- DimPlot(ifnb.data, reduction = "umap", group.by = "stim")
p2 <- DimPlot(ifnb.data, reduction = "umap",group.by = "seurat_annotations", label = TRUE,
              repel = TRUE)
P.total = p1 + p2
ggsave(P.total,filename = "Output/integrated_snn_res.harmony.pdf",
       width = 15, height = 6)
saveRDS(ifnb.data,file = "Output/integrated.harmony.rds")

Harmony整合效果还是不错的：

二. LIGER实战

LIGER能够跨个体、物种以识别共有的细胞类型，以及数据集特有的特征，提供对不同单细胞数据集的统一分析，LIGER还用于单细胞转录组和scATAC-seq等多拟态数据的整合。相较Harmony，LIGER的运行速度慢一些：

rm(list=ls())
##### 1.load dataset
ifnb.data = LoadData("ifnb")

#### 2.normalize/HVG/scale三步走
ifnb.data <- ifnb.data %>% NormalizeData(verbose = F) %>%
  FindVariableFeatures(selection.method = "vst", nfeatures = 2000, verbose = F) %>% 
  ScaleData(verbose = F, do.center = F) %>%
  RunPCA(npcs = 30, verbose = F)

#### 3.使用 orig.ident作为batch运行rliger
#k是矩阵分解的因子数，一般取值20-40
ifnb.data <- RunOptimizeALS(ifnb.data, k = 30, lambda = 5, split.by = "orig.ident") 
ifnb.data <- RunQuantileNorm(ifnb.data, split.by = "orig.ident")

#### 4.降维聚类：进行 UMAP 和 clustering:
n.pcs = ncol(ifnb.data[["iNMF"]])
ifnb.data <- ifnb.data %>% 
  RunUMAP(reduction = "iNMF", dims = 1:n.pcs, verbose = F) %>% 
  RunTSNE(reduction = "iNMF", dims = 1:n.pcs, verbose = F) %>% 
  FindNeighbors(reduction = "iNMF", k.param = 10, dims = 1:30)

ifnb.data <- FindClusters(ifnb.data,resolution = 0.5, algorithm = 1)

#### 5.umap可视化
p1 <- DimPlot(ifnb.data, reduction = "umap", group.by = "stim")
p2 <- DimPlot(ifnb.data, reduction = "umap",group.by = "seurat_annotations", label = TRUE,
              repel = TRUE)
P.total = p1 + p2
ggsave(P.total,filename = "Output/integrated_snn_res.LIGER.pdf",
       width = 15, height = 6)
saveRDS(ifnb.data,file = "Output/integrated.LIGER.rds")

居然有一簇Ctrl特有的CD14 Mono，是真实存在的差异，还是整合未充分呢？

三. LISI进行评估

library(lisi)
library(ggthemes)
library(readr)
library(ggplot2)
library(ggpubr)
ifnb.data.liger = read_rds("./Output/integrated.LIGER.rds")
ifnb.data.harmony = read_rds("./Output/integrated.harmony.rds")
ifnb.data.RPCA = read_rds("./Output/integrated.RPCA.rds")
ifnb.data.CCA = read_rds("./Output/integrated.CCA.rds")

# pca 
pca.score =  lisi::compute_lisi(X = Embeddings(ifnb.data.harmony,reduction = "pca"),
                                meta_data = ifnb.data.harmony@meta.data, 
                                label_colnames =  "stim") %>% dplyr::mutate(type = "pca")
pca.score$type = "raw"

# harmony
harmony.score = lisi::compute_lisi(X = Embeddings(ifnb.data.harmony,reduction = "harmony"),
                     meta_data = ifnb.data.harmony@meta.data, 
                     label_colnames =  "stim") %>% dplyr::mutate(type = "harmony")

#CCA
CCA.score = lisi::compute_lisi(X = Embeddings(ifnb.data.CCA,reduction = "pca"),
                   meta_data = ifnb.data.CCA@meta.data, 
                   label_colnames =  "stim") %>% dplyr::mutate(type = "pca")
CCA.score$type = "CCA"

#RPCA
RPCA.score = lisi::compute_lisi(X = Embeddings(ifnb.data.RPCA,reduction = "pca"),
                   meta_data = ifnb.data.RPCA@meta.data, 
                   label_colnames =  "stim") %>% dplyr::mutate(type = "pca")
RPCA.score$type = "RPCA"

# LIGER
LIGER.score = lisi::compute_lisi(X = Embeddings(ifnb.data.liger,reduction = "iNMF"),
                   meta_data = ifnb.data.liger@meta.data, 
                   label_colnames =  "stim") %>% dplyr::mutate(type = "iNMF")
LIGER.score$type = "LIGER"

#### 可视化
lisi.res <- rbind(pca.score,CCA.score,
                  RPCA.score,harmony.score,
                  LIGER.score) %>% tidyr::gather(key, val,"stim")

ggboxplot(lisi.res, x = "type", y = "val",
          color = "type", palette = "jco",
          # add = "jitter", 
          short.panel.labs = FALSE)+
labs(y=("LISI score"),x=NULL,title = "Integration method")

可以看到Harmony和LEGER的整合效果更佳。

- END -

ixxmu / mp_duty