Evaluating PBMC data from the WPS single-cell technology

by 单细胞天地

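I previously read an article comparing Chinese-made single-cell technologies, but it analyzed them mainly from a market perspective, which doesn't suit the taste of us tech geeks. The platforms it covered were roughly these: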

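华大智造: DNBelab C4 portable single-cell system
新格元: Singleron Matrix single-cell sequencing library construction system
寻因生物: Seekone DD digital droplet instrument
万乘基因: 10K Genomics droplet-microfluidics single-cell sequencer
达普生物: Galaxy (星海) single-cell sequencing library preparation instrument
墨卓生物: MobiNova-100 single-cell sequencing library preparation system
百迈客: 百创 DG1000 single-cell microdroplet system
德运康瑞: Well-paired-seq single-cell omics platform (low-input single cells)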

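Let's evaluate these Chinese-made single-cell technologies from the data side instead, starting with Well-paired-seq, whose abbreviation, WPS, is rather amusing. It is a technology from 苏州德运康瑞生物科技有限公司 (hereafter "德运康瑞"), a company with several single-cell sequencing platforms (Paired-seq, Digital-seq, and Well-paired-seq). This post evaluates Well-paired-seq because it has a public dataset that can be downloaded:

The paper is "Well-Paired-Seq: A Size-Exclusion and Locally Quasi-Static Hydrodynamic Microwell Chip for Single-Cell RNA-Seq", and the data are at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE192708. All of the supplementary files, a little over 300 MB in total, can be downloaded directly for data mining: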

Supplementary file    Size
GSE192708_3T3-bulk_raw_dge.txt.gz 124.9 Kb 
GSE192708_PBMC_sample_annotation.txt.gz 93.3 Kb 
GSE192708_Well-paired-seq_3T3_single_cells_raw_dge.txt.gz 124.1 Kb 
GSE192708_Well-paired-seq_PBMC_raw_dge.txt.gz 7.6 Mb 
GSE192708_Well-paired-seq_drug_treated_cells_raw_dge.txt.gz 34.1 Mb 
GSE192708_Well-paired-seq_five_human_lung_adenocarcinoma_cell_lines_raw_dge.txt.gz 11.1 Mb 
GSE192708_Well-paired-seq_species-mixing_sample1_human_raw_dge.txt.gz 2.7 Mb 
GSE192708_Well-paired-seq_species-mixing_sample1_mouse_raw_dge.txt.gz 81.1 Kb 
GSE192708_Well-paired-seq_species-mixing_sample2_human_raw_dge.txt.gz 5.0 Mb 
GSE192708_Well-paired-seq_species-mixing_sample2_mouse_raw_dge.txt.gz 4.4 Mb 
GSE192708_Well-paired-seq_species-mixing_sample3_human_raw_dge.txt.gz 1.5 Mb 
GSE192708_Well-paired-seq_species-mixing_sample3_mouse_raw_dge.txt.gz 1.3 Mb 
GSE192708_Well-paired-seq_species-mixing_sample4_human_raw_dge.txt.gz 205.1 Kb 
GSE192708_Well-paired-seq_species-mixing_sample4_mouse_raw_dge.txt.gz 214.2 Kb 
GSE192708_Well-paired-seq_species-mixing_sample5_human_raw_dge.txt.gz 12.7 Mb 
GSE192708_Well-paired-seq_species-mixing_sample5_mouse_raw_dge.txt.gz 11.0 Mb 
GSE192708_Well-paired-seq_species-mixing_sample6_human_raw_dge.txt.gz 126.4 Mb 
GSE192708_Well-paired-seq_species-mixing_sample6_mouse_raw_dge.txt.gz 110.1 Mb 
GSE192708_Well-paired-seq_species-mixing_sample7_human_raw_dge.txt.gz 8.2 Mb 
GSE192708_Well-paired-seq_species-mixing_sample7_mouse_raw_dge.txt.gz 6.1 Mb 
GSE192708_drug_treatment_sample_annotation.txt.gz 186.3 Kb 
GSE192708_human_lung_cell_lines_sample_annotation.txt.gz 73.0 Kb

First, the PBMC data

The files needed are:

GSE192708_Well-paired-seq_PBMC_raw_dge.txt.gz 7.6 Mb
GSE192708_PBMC_sample_annotation.txt.gz 93.3 Kb
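
For readers who prefer to fetch just these two files from a script, here is a minimal sketch. It assumes GEO's usual supplementary-file URL layout (`.../series/<stub>nnn/<accession>/suppl/<file>`); the helper names `geo_suppl_url` and `fetch_missing` are made up for illustration and are not part of any package.

```r
# Build the HTTPS URL of a GEO supplementary file.
# Assumption: GEO's usual layout, where the "stub" directory replaces the
# last three digits of the accession with "nnn" (GSE192708 -> GSE192nnn).
geo_suppl_url <- function(accession, file) {
  stub <- sub("[0-9]{3}$", "nnn", accession)
  paste0("https://ftp.ncbi.nlm.nih.gov/geo/series/",
         stub, "/", accession, "/suppl/", file)
}

pbmc_files <- c(
  "GSE192708_Well-paired-seq_PBMC_raw_dge.txt.gz",
  "GSE192708_PBMC_sample_annotation.txt.gz"
)
pbmc_urls <- geo_suppl_url("GSE192708", pbmc_files)
print(pbmc_urls)

# Hypothetical helper: download only the files that are not already present.
# Needs network access, so the call below is left commented out.
fetch_missing <- function(urls, dest_dir = ".") {
  for (u in urls) {
    dest <- file.path(dest_dir, basename(u))
    if (!file.exists(dest)) download.file(u, dest, mode = "wb")
  }
  invisible(file.path(dest_dir, basename(urls)))
}
# fetch_missing(pbmc_urls)
```

If the URL layout assumption does not hold, downloading the two files by hand from the GEO page works just as well.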

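In an earlier tutorial on the 《生信技能树》 WeChat account, "这也能画？", I mentioned a rather boring R package called scRNAstat. It runs dimensionality reduction, clustering, and cell grouping for single-cell transcriptomes in four lines of code. There is nothing technically sophisticated about it: it simply wraps some steps of the Seurat workflow into four functions:

basic_qc (inspect data quality)
basic_filter (apply some filtering)
basic_workflow (dimensionality reduction, clustering, and grouping)
basic_markers (check the marker genes of each cluster)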
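
To make "wrapping some steps of the Seurat workflow" concrete, here is a rough sketch of a plain Seurat pipeline of the kind such a wrapper might bundle. It is not scRNAstat's source code: the function name `run_seurat_sketch` is hypothetical, and the parameter values (`dims`, `resolution`, `nfeatures`) are illustrative assumptions rather than what the package uses.

```r
# Hypothetical helper, NOT scRNAstat's implementation: a plain Seurat
# pipeline written out step by step. Requires the Seurat package at call time.
run_seurat_sketch <- function(obj, dims = 1:15, resolution = 0.8) {
  # QC metric: percentage of counts from mitochondrial genes (human "MT-" prefix)
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = "^MT-")
  # Normalization, feature selection, and scaling
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj, nfeatures = 2000)
  obj <- Seurat::ScaleData(obj)
  # Dimensionality reduction and graph-based clustering
  obj <- Seurat::RunPCA(obj)
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj <- Seurat::RunUMAP(obj, dims = dims)
  obj
}
```

Calling `run_seurat_sketch(sce)` on a Seurat object would return it with clusters in `seurat_clusters` and a `umap` reduction, which is the kind of result the later plotting steps rely on.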

We will use it to process GSE192708_Well-paired-seq_PBMC_raw_dge.txt.gz.

First, read in the expression matrix and build the object:

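library(scRNAstat)
library(Seurat)
library(ggplot2)
library(clustree)
library(cowplot)
library(dplyr)

# Read the gzipped text expression matrix (it is a plain-text table, not HDF5).
# Note: fread() needs the R.utils package installed to read .gz files directly.
ct <- data.table::fread('GSE192708_Well-paired-seq_PBMC_raw_dge.txt.gz', data.table = FALSE)
ct[1:4, 1:4]
rownames(ct) <- ct[, 1]  # move the first column into the row names
ct <- ct[, -1]
sce <- CreateSeuratObject(ct,
                          project = "WPS_PBMC",
                          min.cells = 5, min.features = 200)  # from here on, the standard single-cell workflow applies
sce
table(sce$orig.ident)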

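Surprisingly, there are two samples. The matrix holds roughly 20,000 single cells, but with min.features = 200 only a little over 8,000 cells remain, and with min.features = 300 only a little over 4,000.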
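
The effect of the min.features cutoff can be checked before building the object by counting detected genes per cell. Below is a minimal sketch on a toy matrix; the helper `cells_kept` is hypothetical, and the assumption is that a cell passes when its number of genes with nonzero counts is at least the cutoff.

```r
# Toy genes-by-cells count matrix standing in for the real `ct` table.
# matrix() fills column by column, so each row of numbers below is one cell.
toy <- matrix(c(3, 0, 0, 0, 0,   # cell_a: 1 gene detected
                1, 2, 0, 0, 0,   # cell_b: 2 genes detected
                1, 1, 4, 0, 0,   # cell_c: 3 genes detected
                2, 1, 1, 5, 1),  # cell_d: 5 genes detected
              nrow = 5,
              dimnames = list(paste0("gene", 1:5),
                              paste0("cell_", letters[1:4])))

# Number of genes with a nonzero count in each cell
genes_per_cell <- colSums(toy > 0)

# Hypothetical helper: how many cells survive each candidate cutoff
cells_kept <- function(n_genes, cutoffs) {
  kept <- vapply(cutoffs, function(k) sum(n_genes >= k), integer(1))
  setNames(kept, paste0("min.features=", cutoffs))
}

print(cells_kept(genes_per_cell, c(1, 2, 3, 5)))
```

On the real data, `cells_kept(colSums(ct > 0), c(200, 300))` would give the corresponding counts for the two cutoffs discussed above.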

Next comes dimensionality reduction, clustering, and grouping, with the code below:

x='WPS_PBMC'
dir.create( x )

sce = basic_qc(sce=sce,org='human',
               dir = x) ;sce
# sce = basic_filter(sce)  ;sce
sce = basic_workflow(sce,dir = x)   ;sce
markers_figures <- basic_markers(sce,
                                 org='human',
                                 group='seurat_clusters',
                                 dir = x)
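# UMAP colored by cluster, with the cluster labels drawn on the plot
p2 <- DimPlot(sce, reduction = "umap", label = TRUE, repel = TRUE, pt.size = 0.5) + NoLegend()
p2
ggsave("./umap.pdf", plot = p2, width = 5, height = 5)
# Show the marker figure side by side with the UMAP
markers_figures$all_markers + p2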

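The clusters are mainly T cells and B cells, plus the myeloid lineage:

[Figure: T cells, B cells, and the myeloid lineage]

The T cells should split into two functionally distinct groups, naive and effector. Cluster 6 is clearly B cells; clusters 5, 7, and 8 are probably different monocyte populations, which usually separate into CD14 and CD16 subsets; and cluster 9 is dendritic cells.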
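
This kind of eyeball annotation can be sketched as a simple scoring step: average a few canonical marker genes per cluster and pick the best-scoring cell type. The marker sets below are commonly used PBMC markers, not the ones from the paper; the `avg` matrix is invented toy data, and `label_clusters` is a hypothetical helper.

```r
# Commonly used PBMC marker genes (an assumption, not the paper's gene list)
markers <- list(
  "T cell"         = c("CD3D", "CD3E"),
  "B cell"         = c("MS4A1", "CD79A"),
  "CD14+ monocyte" = c("CD14", "LYZ"),
  "CD16+ monocyte" = c("FCGR3A", "MS4A7"),
  "Dendritic cell" = c("FCER1A", "CST3")
)

# Toy average expression per cluster (genes x clusters); values are made up
avg <- rbind(
  CD3D   = c(5, 0, 0, 0, 0),
  CD3E   = c(4, 0, 0, 0, 0),
  MS4A1  = c(0, 6, 0, 0, 0),
  CD79A  = c(0, 5, 0, 0, 0),
  CD14   = c(0, 0, 5, 1, 0),
  LYZ    = c(0, 0, 6, 3, 3),
  FCGR3A = c(0, 0, 0, 5, 0),
  MS4A7  = c(0, 0, 1, 4, 0),
  FCER1A = c(0, 0, 0, 0, 5),
  CST3   = c(0, 0, 3, 3, 6)
)
colnames(avg) <- paste0("cluster_", 1:5)

# Hypothetical helper: label each cluster by its highest mean marker score
label_clusters <- function(avg_expr, marker_sets) {
  # scores: rows = clusters, columns = cell types
  scores <- sapply(marker_sets, function(genes) {
    present <- intersect(genes, rownames(avg_expr))
    colMeans(avg_expr[present, , drop = FALSE])
  })
  best <- max.col(scores, ties.method = "first")
  setNames(colnames(scores)[best], rownames(scores))
}

labels <- label_clusters(avg, markers)
print(labels)
```

With a Seurat object, a per-cluster average matrix could come from `AverageExpression()`; a scoring step like this is only a first pass, and the labels still need checking against the marker plots.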

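Because this is PBMC, everything is an immune cell. To subdivide the immune cells further, the second-level split is into two major groups: the lymphoid lineage (T, B, and NK cells) and the myeloid lineage (monocytes, dendritic cells, macrophages, and granulocytes).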
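
The two-level grouping described here amounts to a lookup table from cell type to lineage. A small sketch follows, with made-up cluster names (`c1` to `c5`) and labels purely for illustration:

```r
# Lookup table: first-level cell type -> lineage used for the second-level split
lineage_of <- c(
  "T cell" = "lymphoid", "B cell" = "lymphoid", "NK cell" = "lymphoid",
  "Monocyte" = "myeloid", "Dendritic cell" = "myeloid",
  "Macrophage" = "myeloid", "Granulocyte" = "myeloid"
)

# Hypothetical cluster annotations (illustrative only)
cell_types <- c(c1 = "T cell", c2 = "B cell", c3 = "Monocyte",
                c4 = "Dendritic cell", c5 = "T cell")

# Group the cluster names by lineage
lineage <- unname(lineage_of[cell_types])
by_lineage <- split(names(cell_types), lineage)
print(by_lineage)
```

Each group of clusters could then be pulled out of the Seurat object with `subset()` and re-clustered on its own.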

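Comparison with the paper

The paper's dimensionality reduction and clustering of this PBMC data is more fine-grained, because it uses more cell-type-specific genes:

[Figure: UMAP and dot plot of the PBMC data, from the paper]

The figure legend reads: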

d) UMAP of single-cell profiles (dots) from PBMCs (8027 cells) colored by cell type.
e) Dot plot visualization of each cell type in PBMCs single cells transcriptome data.

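Overall, each subpopulation's specific genes can still tell the groups apart, although these genes are not expressed at very high levels, which counts as a minor flaw.

The GSE192708 dataset contains other expression matrices as well; we will evaluate those in later posts.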