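When learning or testing a new method, you usually need some test data to get started. In single-cell sequencing, a classic choice is the PBMC data that 10X Genomics publishes openly on its website. In our earlier post, "Single-cell sequencing | ScType, an automated cell annotation tool", we downloaded the pbmc3k test data directly from that site: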
https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/pbmc3k
In R, the TENxPBMCData package offers a more convenient way to get the same data:
https://bioconductor.org/packages/release/data/experiment/html/TENxPBMCData.html
The TENxPBMCData package provides a R / Bioconductor resource for representing and manipulating nine different single-cell RNA-seq (scRNA-seq) and CITE-seq data sets on peripheral blood mononuclear cells (PBMC) generated by 10X Genomics.
Installation:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TENxPBMCData")
Getting the data:
library(TENxPBMCData)
pbmc3k = TENxPBMCData(dataset = "pbmc3k")
pbmc3k
class: SingleCellExperiment
dim: 32738 2700
metadata(0):
assays(1): counts
rownames(32738): ENSG00000243485 ENSG00000237613 ...
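What the package does is download the data from ExperimentHub and convert it into a SingleCellExperiment object. Ten datasets are currently available, each one a valid value for the dataset argument of the function shown above. Running args(TENxPBMCData) lists them: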
pbmc68k, frozen_pbmc_donor_a, frozen_pbmc_donor_b, frozen_pbmc_donor_c, pbmc33k, pbmc3k, pbmc6k, pbmc4k, pbmc8k, pbmc5k-CITEseq
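If you want the dataset names as a character vector rather than reading them off the printed signature, one option is to evaluate the default of the dataset argument. This is a minimal sketch that assumes the default is written as a c(...) vector of choices, which is what the args() output suggests; the variable name datasets is only illustrative.

```r
library(TENxPBMCData)

# Assumption: the `dataset` default is a c(...) call listing every valid choice,
# so evaluating that default yields the names as a character vector.
datasets <- eval(formals(TENxPBMCData)$dataset)
print(datasets)
```

Any element of that vector can then be passed as TENxPBMCData(dataset = ...).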
To feed the pbmc3k object obtained above into the standard Seurat workflow, only a small conversion is needed:
# Prepare the count table
pbmc.data = as.data.frame(counts(pbmc3k))
colnames(pbmc.data) <- paste0("Cell", seq_len(ncol(pbmc3k)))
rownames(pbmc.data) = scater::uniquifyFeatureNames(rowData(pbmc3k)$ENSEMBL_ID, rowData(pbmc3k)$Symbol_TENx)
# Create the Seurat object
pbmc = Seurat::CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features, 0 variable features)
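The uniquifyFeatureNames() call in the preparation step is what turns Ensembl IDs plus gene symbols into row names that are both readable and unique. The base-R sketch below imitates that rule as its documentation describes it (a missing symbol falls back to the ID, and a repeated symbol gets its ID appended). It is an illustration under that assumption, not the package's code; make_unique_names and the toy IDs and symbols are hypothetical.

```r
# Base-R imitation of the renaming rule; not the scater implementation.
make_unique_names <- function(ids, symbols) {
  out <- symbols
  # A missing symbol falls back to the ID.
  missing <- is.na(out)
  out[missing] <- ids[missing]
  # Every symbol that occurs more than once gets its ID appended.
  dup <- out %in% out[duplicated(out)]
  out[dup] <- paste0(out[dup], "_", ids[dup])
  out
}

# Made-up toy values, not real Ensembl IDs or gene symbols.
ids     <- c("ID1", "ID2", "ID3", "ID4")
symbols <- c("GeneA", "GeneB", "GeneB", NA)
unique_names <- make_unique_names(ids, symbols)
print(unique_names)
```

With these toy inputs, the two GeneB entries become GeneB_ID2 and GeneB_ID3, and the entry without a symbol is named ID4.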