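CellTypist is an automatic annotation method aimed mainly at human immune cells. It runs from Python or the command line; forcing it to run from R is also possible, see Zeng's post "使用 CellTypist 进行免疫细胞类型分类" (Classifying immune cell types with CellTypist) on the Tencent Cloud Developer Community (tencent.com).
Here I again run the Python code in a Jupyter notebook and compare the results with those of other automatic annotation methods.
Most single-cell data are currently still processed as Seurat objects in R, whereas Python needs a scanpy (AnnData) object, so we first convert the input data to suit the Python code: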
##### Convert to h5ad
library(SeuratDisk)
load('./scRNA_after_scMayoMap.Rdata')
SaveH5Seurat(sce,filename = 'scRNA_after_scMayoMap.h5Seurat')
Convert('scRNA_after_scMayoMap.h5Seurat',dest = 'h5ad')
This gives us scRNA_after_scMayoMap.h5ad, a file that scanpy can read.
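One thing worth checking before annotation: CellTypist expects `.X` to hold log1p-normalized expression scaled to 10,000 counts per cell, and which Seurat layer ends up in `.X` after conversion depends on the object. The sketch below is a rough sanity check on a plain NumPy array; the function name `looks_log1p_10k` and the tolerance are my own assumptions, not part of CellTypist.

```python
import numpy as np

def looks_log1p_10k(X, target=1e4, rtol=0.01):
    """Return True if every row of X, after expm1, sums to roughly `target`."""
    X = np.asarray(X, dtype=float)
    if X.min() < 0:
        # Scaled data (z-scores) contains negative values, so it cannot be log1p counts.
        return False
    row_sums = np.expm1(X).sum(axis=1)
    return bool(np.allclose(row_sums, target, rtol=rtol))

# Toy count matrix: 3 cells x 4 genes (made-up values).
counts = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.0, 5.0, 0.0, 5.0],
                   [10.0, 0.0, 0.0, 0.0]])

# Normalize each cell to 10,000 counts, then log1p.
normed = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

print(looks_log1p_10k(normed))  # normalized matrix
print(looks_log1p_10k(counts))  # raw counts
```

For the real object you would pass `adata.X` (densified with `.toarray()` if it is sparse). If the check fails and `.X` holds raw counts, `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)` is the usual way to get there.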
Next, run the following code in the Jupyter notebook:
import os, sys
import scanpy as sc
import celltypist
from celltypist import models
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
PublicData = './scRNA_after_scMayoMap.h5ad'
Prefix = 'nafld_'
OutDir = './NAFLD_celltypist_py_res'
# import model
model = models.Model.load(model = 'Immune_All_Low.pkl')
# load data
adata = sc.read_h5ad(PublicData)
# annotate
predictions = celltypist.annotate(adata, model = model, majority_voting = True)
# using built-in function to save results
os.makedirs(OutDir, exist_ok=True)  # does not fail if the folder already exists
predictions.to_table(folder = OutDir, prefix = Prefix, xlsx = True)
predictions.to_plots(folder = OutDir, format = 'pdf', prefix = Prefix)
# convert into adata
adata = predictions.to_adata()
# re-plot using scanpy
fig = plt.figure(figsize=(15,12))
gs = gridspec.GridSpec(2, 2, figure=fig)
for i in range(2):
    for j in range(2):
        fig.add_subplot(gs[i,j])
axes = fig.axes
for i in range(4):
    ax = axes[i]
    group = ['seurat_clusters','predicted_labels','over_clustering','majority_voting'][i]
    sc.pl.umap(adata, color=group, frameon=False, show=False, ax=ax, title=group, size=20)
plt.subplots_adjust(bottom=0.1, wspace=0.5, hspace=0.2)
plt.savefig(os.path.join(OutDir, Prefix + 'UMAP_by_celltype.png'), dpi=300, bbox_inches='tight')
fig, ax = plt.subplots(figsize=(12,8))
celltypist.dotplot(predictions, use_as_reference='predicted_labels', use_as_prediction='majority_voting', show=False, ax=ax)
fig.savefig(os.path.join(OutDir, Prefix + 'dotplot_between_pred_major.png'), dpi=300, bbox_inches='tight')
###########################
# train a custom model from the predicted labels
custom_model = celltypist.train(X=adata, labels='predicted_labels',
                                use_SGD=True, n_jobs=1,
                                feature_selection=True,
                                details='pbmc1k',
                                date='2023-10-26')
# write into default path
custom_model.write(f'{models.models_path}/CustomModel.pkl')
This finally gives us the file nafld_annotation_result.xlsx.
We then write these results back into the Seurat object sce, running the following in R:
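#### Back from Python: read the result table
# Install the package if needed
#install.packages("openxlsx")
library(openxlsx)
library(Seurat)
# File name plus sheet index
data <- read.xlsx("./NAFLD_celltypist_py_res/nafld_annotation_result.xlsx", sheet = 1)
#View(data)
# Write the results back into sce
sce@meta.data$cell <- rownames(sce@meta.data)
colnames(data)[1] <- 'cell'
# merge() sorts rows and drops row names, so restore both before reassigning
meta <- merge(sce@meta.data, data, by = 'cell')
rownames(meta) <- meta$cell
sce@meta.data <- meta[colnames(sce), ]
library(gplots)
balloonplot(table(sce$majority_voting, sce$customclassif))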
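The result is still rather fine-grained, however: no fewer than 23 cell types. In the early stage of exploring the cells we only need broad classes, so we regroup them by hand: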
rownames(sce@meta.data) <- sce$cell
length(unique(sce$majority_voting))
###
sce@meta.data$celltypist = "NA"
Idents(sce) <- 'majority_voting'
levels(Idents(sce)) # inspect the cell subtypes
new.cluster.ids <- c("Kupffer cells", "Macrophages", "T", "T", "NK", "T","T",
                     "NK", "monocytes", "T","Macrophages","monocytes","B","T",
                     "B","Mast cells","B","Endothelial","DC","Epithelial","DC",
                     "DC","B")
names(new.cluster.ids) <- levels(sce)
sce <- RenameIdents(sce, new.cluster.ids)
# keep the broad classes in the celltypist column created above
sce$celltypist <- as.character(Idents(sce))
levels(sce)
#[1] "Kupffer cells" "Macrophages" "T" "NK" "monocytes"
#[6] "B" "Mast cells" "Endothelial" "DC" "Epithelial"
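Renaming by position works only as long as the order of `levels(sce)` matches the order of `new.cluster.ids`. A name-based lookup is less fragile. Below is a minimal Python sketch of the same regrouping applied to CellTypist labels; the fine label names in `coarse_map` are only examples and should be replaced with the labels actually present in `majority_voting`, and `to_coarse` is a hypothetical helper, not a CellTypist function.

```python
import pandas as pd

# Example mapping from fine CellTypist labels to broad classes.
# The keys are illustrative; use the labels found in your own majority_voting column.
coarse_map = {
    "Kupffer cells": "Kupffer cells",
    "Classical monocytes": "monocytes",
    "Tcm/Naive helper T cells": "T",
    "CD16+ NK cells": "NK",
    "Naive B cells": "B",
    "Mast cells": "Mast cells",
}

def to_coarse(labels, mapping, default="Unassigned"):
    """Map fine labels to broad classes by name; unmapped labels get `default`."""
    return labels.astype(str).map(mapping).fillna(default)

# Toy input standing in for the majority_voting column (made-up cells).
fine = pd.Series(
    ["Kupffer cells", "Naive B cells", "CD16+ NK cells", "Some new label"],
    index=["cell1", "cell2", "cell3", "cell4"],
    name="majority_voting",
)

coarse = to_coarse(fine, coarse_map)
print(coarse.to_dict())
```

On the real data this would be something like `adata.obs['celltypist'] = to_coarse(adata.obs['majority_voting'], coarse_map)`; any label that ends up as `Unassigned` points to a gap in the mapping.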
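OK, we now have the broad cell classes listed above, and we compare them with the scType results:
balloonplot(table(Idents(sce), sce$customclassif))
(x-axis: CellTypist; y-axis: scType)
At the level of broad classes, the two agree quite well.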
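To put a number on how well two annotations agree, the balloon plot can be complemented with a row-normalized contingency table. A small pandas sketch on made-up labels follows; the column names mirror the ones used above, but the values are toy data, not results from this dataset.

```python
import pandas as pd

# Toy per-cell labels from two annotation methods (made-up values).
obs = pd.DataFrame({
    "celltypist": ["T", "T", "T", "NK", "NK", "B", "B", "B"],
    "customclassif": ["T", "T", "NK", "NK", "NK", "B", "B", "T"],
})

# Rows: CellTypist class; columns: scType class; values: fraction of each row.
frac = pd.crosstab(obs["celltypist"], obs["customclassif"], normalize="index")
print(frac.round(2))

# Share of cells in each CellTypist class that fall into their most common scType class.
top_match = frac.max(axis=1)
print(top_match.round(2))
```

Rows with a low `top_match` value are the classes where the two methods disagree most and are the ones worth inspecting by marker genes.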
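The comparison with scMayoMap, however, tells a different story:
(x-axis: CellTypist; y-axis: scMayoMap)
Because the reference datasets cover different cell types, the cells that scMayoMap labels as Kupffer cells include many T and NK cells, which underlines how much the reference dataset matters. Overall, though, CellTypist is well suited to annotating immune cells and I recommend it; the annotation speed is also acceptable, nowhere near as painful as scHCL.
PS: I will later post complete Python code for a full single-cell transcriptome workflow; advanced analyses in particular increasingly need Python.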
https://mp.weixin.qq.com/s/gUYm7bVbmI1SjeSUWwbCTQ