Immune cell annotation with CellTypist

by 单细胞天地

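CellTypist is an automated cell type annotation method aimed mainly at human immune cells. It runs from Python or the command line. Forcing it to run from R is also possible; for that, see Prof. Zeng's post: 使用 CellTypist 进行免疫细胞类型分类-腾讯云开发者社区-腾讯云 (tencent.com)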

Here we again run the Python code in a Jupyter notebook and compare the result with other automated annotation methods:

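Most single-cell data are still handled as Seurat objects in R, whereas in Python we need the AnnData object that scanpy works with. So we first convert the input data to fit the Python code: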

##### Convert to h5ad
library(SeuratDisk)
load('./scRNA_after_scMayoMap.Rdata')
SaveH5Seurat(sce,filename = 'scRNA_after_scMayoMap.h5Seurat')
Convert('scRNA_after_scMayoMap.h5Seurat',dest = 'h5ad')

This gives us scRNA_after_scMayoMap.h5ad, a file that scanpy can read as an AnnData object.
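
CellTypist expects `adata.X` to hold log1p-normalized expression scaled to 10,000 counts per cell, and which Seurat slot ends up in `X` after conversion depends on the object. The sketch below is one possible sanity check, not part of the original workflow: the helper name `looks_log1p_normalized` and its tolerance are assumptions made for illustration.

```python
import numpy as np


def looks_log1p_normalized(X, target_sum=1e4, rtol=0.01, chunk=2000):
    """Return True if X looks like log1p(counts scaled to target_sum per cell)."""
    for start in range(0, X.shape[0], chunk):
        block = X[start:start + chunk]
        # scipy sparse slices expose .toarray(); plain arrays do not.
        block = block.toarray() if hasattr(block, "toarray") else np.asarray(block)
        if (block < 0).any():              # negative values suggest scaled data
            return False
        per_cell = np.expm1(block).sum(axis=1)   # undo log1p, then sum per cell
        per_cell = per_cell[per_cell > 0]        # ignore empty cells
        if not np.allclose(per_cell, target_sum, rtol=rtol):
            return False
    return True
```

After `adata = sc.read_h5ad(...)`, calling `looks_log1p_normalized(adata.X)` gives a quick indication of whether the matrix is in the expected form. If it returns False, it is worth checking whether raw counts or scaled data were exported instead.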

Next, run the following code in the Jupyter notebook:

import os, sys
import scanpy as sc
import celltypist
from celltypist import models
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

PublicData = './scRNA_after_scMayoMap.h5ad'
Prefix = 'nafld_'
OutDir = './NAFLD_celltypist_py_res'

# import model
model = models.Model.load(model = 'Immune_All_Low.pkl')
# load data (CellTypist expects log1p-normalized expression,
# scaled to 10,000 counts per cell, in adata.X)
adata = sc.read_h5ad(PublicData)
# annotate
predictions = celltypist.annotate(adata, model = model, majority_voting = True)

# using built-in function to save results
os.makedirs(OutDir, exist_ok=True)  # does not fail if the folder already exists
predictions.to_table(folder = OutDir, prefix = Prefix, xlsx = True)
predictions.to_plots(folder = OutDir, format = 'pdf', prefix = Prefix)

# convert into adata
adata = predictions.to_adata()

# re-plot using scanpy
fig = plt.figure(figsize=(15,12))
gs = gridspec.GridSpec(2, 2, figure=fig)
for i in range(2):
    for j in range(2):
        fig.add_subplot(gs[i,j])

axes = fig.axes
for i in range(4):
    ax = axes[i]
    group = ['seurat_clusters','predicted_labels','over_clustering','majority_voting'][i]
    sc.pl.umap(adata, color=group, frameon=False, show=False, ax=ax, title=group, size=20)

plt.subplots_adjust(bottom=0.1, wspace=0.5, hspace=0.2)
plt.savefig(os.path.join(OutDir, Prefix + 'UMAP_by_celltype.png'), dpi=300, bbox_inches='tight')

fig, ax = plt.subplots(figsize=(12,8))
celltypist.dotplot(predictions, use_as_reference='predicted_labels', use_as_prediction='majority_voting', show=False, ax=ax)
fig.savefig(os.path.join(OutDir, Prefix + 'dotplot_between_pred_major.png'), dpi=300, bbox_inches='tight')

###########################
# Optional: train a custom model, using the predicted labels as training labels
custom_model = celltypist.train(X=adata, labels='predicted_labels',
    use_SGD=True, n_jobs=1,
    feature_selection=True,
    details='pbmc1k',
    date='2023-10-26')

# write into default path
custom_model.write(f'{models.models_path}/CustomModel.pkl')
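
The `majority_voting = True` option used above refines the per-cell predictions: cells are over-clustered, and each cluster receives its dominant predicted label. A minimal pandas sketch of that idea follows. It is an illustration under simplifying assumptions, not CellTypist's actual implementation, and the function name `majority_vote` is hypothetical.

```python
import pandas as pd


def majority_vote(predicted, clusters, min_prop=0.0):
    """Give every cell the most frequent predicted label of its cluster."""
    df = pd.DataFrame({"label": list(predicted), "cluster": list(clusters)})
    # Fraction of each label within each cluster, indexed by (cluster, label).
    props = df.groupby("cluster")["label"].value_counts(normalize=True)
    winners = {}
    for cluster, sub in props.groupby(level=0):
        top = sub.idxmax()  # index tuple (cluster, label) of the largest fraction
        # Clusters without a sufficiently dominant label are flagged instead.
        winners[cluster] = top[1] if sub.max() >= min_prop else "Heterogeneous"
    return df["cluster"].map(winners)
```

With `min_prop=0.0` every cluster simply takes its most common label, which is why the `majority_voting` column is smoother on the UMAP than `predicted_labels`.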

This finally produces the file nafld_annotation_result.xlsx.

We then assign this result back to the Seurat object sce, running the following in R:

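#### Back from Python: read the CellTypist result table
# Install the package if needed
#install.packages("openxlsx")
library(openxlsx)
library(Seurat)
# File name plus sheet index
data <- read.xlsx("./NAFLD_celltypist_py_res/nafld_annotation_result.xlsx", sheet = 1)
#View(data)
# Assign the annotations back to sce, matching rows by cell barcode
sce@meta.data$cell <- rownames(sce@meta.data)
colnames(data)[1] <- 'cell'
meta <- merge(sce@meta.data, data, by = 'cell')
# merge() drops row names and re-sorts rows, so restore both
rownames(meta) <- meta$cell
sce@meta.data <- meta[colnames(sce), ]
library(gplots)
# Compare CellTypist majority_voting with the scType labels (customclassif)
balloonplot(table(sce$majority_voting, sce$customclassif))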
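
The key point of the step above is that annotations are matched to cells by barcode and that the row order of the metadata stays the same as in the object. For readers who stay in Python, the same join might be sketched as below. The function name `attach_labels` is hypothetical; the column names are the ones CellTypist writes.

```python
import pandas as pd


def attach_labels(obs, labels, columns=("predicted_labels", "majority_voting")):
    """Add label columns to obs, matching rows by cell barcode (the index)."""
    missing = obs.index.difference(labels.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} cells have no annotation, e.g. {missing[0]}")
    out = obs.copy()
    for col in columns:
        # reindex aligns by barcode, so the row order of obs is kept
        out[col] = labels[col].reindex(obs.index).to_numpy()
    return out
```

Raising on missing barcodes is a deliberate choice here: silently dropping cells, as an inner join does, leaves the metadata out of sync with the expression matrix.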

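However, the result is still rather fine-grained: no fewer than 23 cell types. In the early exploration stage we only need broad classes, so we regroup the labels manually: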

rownames(sce@meta.data)<-sce$cell
length(unique(sce$majority_voting))
###
sce@meta.data$celltypist = "NA"
Idents(sce)<-'majority_voting'
levels(Idents(sce)) # list the cell subtypes; the vector below must follow this order
new.cluster.ids <- c("Kupffer cells", "Macrophages", "T", "T", "NK", "T","T",
                     "NK", "monocytes", "T","Macrophages","monocytes","B","T",
                     "B","Mast cells","B","Endothelial","DC","Epithelial","DC",
                     "DC","B")
names(new.cluster.ids) <- levels(sce)
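sce <- RenameIdents(sce, new.cluster.ids)
sce$celltypist <- as.character(Idents(sce)) # store the broad labels in the column initialised above
levels(sce)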
#[1] "Kupffer cells" "Macrophages"   "T"             "NK"            "monocytes"    
#[6] "B"             "Mast cells"    "Endothelial"   "DC"            "Epithelial"
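
The regrouping above relies on the position of each label in `levels(Idents(sce))`, which is easy to get wrong when the level order changes. A name-based mapping is one alternative, sketched here in Python. The keys in `BROAD` are example labels only, not the 23 labels from this dataset, and `to_broad` is a hypothetical helper.

```python
import pandas as pd

# Illustrative subset only: the keys are example fine-grained labels,
# not the actual 23 labels found in this dataset.
BROAD = {
    "Tcm/Naive helper T cells": "T",
    "Regulatory T cells": "T",
    "CD16+ NK cells": "NK",
    "Classical monocytes": "monocytes",
    "Naive B cells": "B",
    "Kupffer cells": "Kupffer cells",
}


def to_broad(fine_labels, mapping=BROAD):
    """Map fine labels to broad classes; fail loudly on unmapped labels."""
    fine = pd.Series(fine_labels, dtype="object")
    unmapped = sorted(set(fine) - set(mapping))
    if unmapped:
        raise KeyError(f"no broad class for: {unmapped}")
    return fine.map(mapping)
```

Because every label is looked up by name, adding or reordering fine-grained labels cannot shift a broad class onto the wrong cells.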

OK, we now have the broad cell classes listed above, and we compare them with the scType result:

balloonplot (table (Idents(sce),sce$customclassif))

(x axis: CellTypist; y axis: scType)

At the level of broad classes, the two annotations agree quite well.
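
The comparison above is a contingency table of two label vectors. A rough Python counterpart, assuming both label sets are available per cell, is `pandas.crosstab`. The wrapper name `compare_annotations` is hypothetical; the row-wise fractions are an addition that makes agreement easier to read than raw counts.

```python
import pandas as pd


def compare_annotations(a, b, a_name="CellTypist", b_name="scType"):
    """Return raw counts and row-wise fractions for two label vectors."""
    counts = pd.crosstab(pd.Series(list(a), name=a_name),
                         pd.Series(list(b), name=b_name))
    # Each row sums to 1: the share of one method's class per class of the other.
    fractions = counts.div(counts.sum(axis=1), axis=0)
    return counts, fractions
```

A row whose fraction is concentrated in one column indicates that the two methods agree on that class; a row spread across several columns marks a class worth inspecting.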

Comparing further with scMayoMap, however:

(x axis: CellTypist; y axis: scMayoMap)

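Because the reference datasets cover different cell types, the cells that scMayoMap identifies as Kupffer cells include many that CellTypist labels as T or NK cells. This underlines how much the choice of reference dataset matters. Overall, though, CellTypist is well suited to annotating immune cells and is worth recommending. Annotation is also acceptably fast, nowhere near as painful as scHCL.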

PS: later posts will cover a full single-cell transcriptome workflow in Python; advanced analyses in particular increasingly call for Python.
