ixxmu / mp_duty

抓取网络文章到github issues保存
https://archives.duty-machine.now.sh/
108 stars 30 forks source link

空间转录组分析工具篇1:Seurat #2386

Closed ixxmu closed 2 years ago

ixxmu commented 2 years ago

https://mp.weixin.qq.com/s/A7-hfLyYV6fZIB42ovujpg

github-actions[bot] commented 2 years ago

空间转录组分析工具篇1:Seurat by 生信会客厅

      10x visium空间转录组2019年底开始在国内开展商业服务,迄今已有近3年的时间了。据我所知,第一批吃螃蟹的人大多没有收获很多新技术的红利,因为早期可以借鉴的文章寥寥无几,分析工具也乏善可陈。大部分人是把空转数据作为单细胞分析结果的补充与辅助证据,也有少数人尝试把空转数据作为主体开展分析,但是很难把文章发到顶级期刊。去年以来,空转的文章和生信工具逐渐丰富起来,FFPE空转、高分辨率空转、大检测面积空转等新产品已经或即将面世,空转价格也下降了不少。空间转录组数据挖掘的黄金时期已经拉开帷幕,我将陆续写一些空转分析和文献解读的推文,不足之处请大家批评指正,欢迎大家扫文末二维码交流探讨。

Seurat空转分析流程简介
      准备分析空间转录组数据的朋友,大部分应该用seurat分析过单细胞数据,因此用它做一些空转数据的基础分析无疑是学习成本最低的一种方式。Seurat的空转分析流程包含了质控、降维聚类、单细胞+空转整合去卷积、空间变异基因分析和可视化,基本涵盖了空转分析的主要内容。Seurat的不足之处有以下几点:
  1. Louvain聚类算法没有考虑空间位置信息,聚类结果的精确性略显不足;

  2. 基于锚点的单细胞+空转数据整合去卷积分析,经常会忽略一些细胞亚型的存在;

  3. Markvariogram和moransi找到的空间变异基因(SVGs),没有提供工具把它们按空间区域或共表达模式分组,需要人工逐个检查SVGs的特异性表达区域。


数据预处理

数据来源:10x genomics官方的两个样本示例数据,下载链接:
https://www.10xgenomics.com/resources/datasets/mouse-brain-serial-section-1-sagittal-anterior-1-standard-1-1-0
https://www.10xgenomics.com/resources/datasets/mouse-brain-serial-section-1-sagittal-posterior-1-standard-1-1-0
每个样本新建一个文件夹,下载以下数据


创建空转分析对象

library(Seurat)library(tidyverse)library(patchwork)rm(list = ls())setwd("~/project2022/2022weixin/空转分析工具篇1_Seurat")dir.create("QC")dir.create("Data")dir.create("Cluster")dir.create("SVG")dir.create("Deconvolution")
### 读取数据brain <- Load10X_Spatial(data.dir = "InputData/mouse_brain_sagittal_a1/", # 存放cellranger结果的目录 filename = "filtered_feature_bc_matrix.h5", # h5格式的表达矩阵名称 assay = "Spatial", # seurat对象中存放表达矩阵的assay名称 slice = "slice1" # seurat对象中设置的空转样本ID,多样本合并分析时有用                         )

空转数据的seurat对象相比单细胞数据有两点不同,首先是保存初始表达数据的slot默认名称从“RNA”变成了“Spatial”,其次是images slot下有了保存空转图片和spot坐标的数据对象。

数据质控
       组织切片的边缘的SPOTs容易出现数据异常的情况,可以参照单细胞的质控方法过滤线粒体过高或基因数过低的SPOTs。
### 数据质控# 计算SPOT的线粒体基因比例brain[["percent.mt"]] <- PercentageFeatureSet(brain, pattern = "^mt-")# 计算SPOT的核糖体基因比例brain[["percent.rb"]] <- PercentageFeatureSet(brain, pattern = "^Rp[ls]")
## 质控前指标展示# 分子数p1 <- VlnPlot(brain, features = "nCount_Spatial") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")p <- p1|p2ggsave("QC/nCount_Spatial_before.pdf", p, width = 10, height = 6)# 基因数p1 <- VlnPlot(brain, features = "nFeature_Spatial") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "nFeature_Spatial") + theme(legend.position = "right")p <- p1|p2ggsave("QC/nFeature_Spatial_before.pdf", p, width = 10, height = 6)# 线粒体p1 <- VlnPlot(brain, features = "percent.mt") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "percent.mt") + theme(legend.position = "right")p <- p1|p2ggsave("QC/percent_mt_before.pdf", p, width = 10, height = 6)# 核糖体p1 <- VlnPlot(brain, features = "percent.rb") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "percent.rb") + theme(legend.position = "right")p <- p1|p2ggsave("QC/percent_rb_before.pdf", p, width = 10, height = 6)
## 质控minCount = 1500minFeature = 500maxMT = 25brain <- subset(brain, nCount_Spatial>minCount&nFeature_Spatial>minFeature&percent.mt<maxMT)
## 质控后指标展示# 分子数p1 <- VlnPlot(brain, features = "nCount_Spatial") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")p <- p1|p2ggsave("QC/nCount_Spatial_after.pdf", p, width = 10, height = 6)# 基因数p1 <- VlnPlot(brain, features = "nFeature_Spatial") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "nFeature_Spatial") + theme(legend.position = "right")p <- p1|p2ggsave("QC/nFeature_Spatial_after.pdf", p, width = 10, height = 6)# 线粒体p1 <- VlnPlot(brain, features = "percent.mt") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "percent.mt") + theme(legend.position = "right")p <- p1|p2ggsave("QC/percent_mt_after.pdf", p, width = 10, height = 6)# 核糖体p1 <- VlnPlot(brain, features = "percent.rb") + NoLegend() + theme(axis.text.x = element_blank())p2 <- SpatialFeaturePlot(brain, features = "percent.rb") + theme(legend.position = "right")p <- p1|p2ggsave("QC/percent_rb_after.pdf", p, width = 10, height = 6) 

空转数据的基因中位数明显高于单细胞数据


降维聚类
       空间转录组的基本单位是SPOT,一个SPOT通常包含了5-10个细胞。SPOTs聚类的结果无法与细胞类型对应起来,它通常与组织的不同结构区域或病理区域匹配度更高。Seurat对空间转录组数据的聚类依然沿用单细胞分析流程的louvain算法,有多篇评测文章证明此方法的聚类结果与病理学家的人工注释结果有些出入。
brain <- SCTransform(brain, assay = "Spatial") # 注意assay要指定为Spatialbrain <- RunPCA(brain, assay = "SCT", verbose = FALSE)brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)brain <- FindClusters(brain, verbose = FALSE)brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
p1 <- DimPlot(brain, reduction = "umap", label = TRUE)p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3)p <- p1 + p2ggsave("Cluster/cluster_louvain.pdf", p, width = 15, height = 7)
p <- SpatialDimPlot(brain, facet.highlight = TRUE, ncol = 5, cells.highlight = CellsByIdentities(brain))ggsave("Cluster/cluster_louvain_facet.pdf", p, width = 15, height = 9)

louvain算法聚类结果

分面图展示cluster对应的区域

基因原位表达
       空间转录组的一个重要用途是查看基因的原位表达情况,下面我们用几个经典的细胞类型marker基因演示如何应用。astrocytes (Gja1+), oligodendrocyte (Aspa+), newlyformed oligodendrocytes (Bmp4+), oligodendrocyte precursors(Pdgfra+), microglia (C1qa+), endothelial cells (Flt1+) , excitatory neurons(Slc17a7+) and inhibitory neurons(Gad2+).
markers <- c('Gja1', 'Flt1', 'Slc17a7', 'Gad2', 'C1qa', 'Bmp4', 'Aspa', 'Pdgfra')p <- SpatialFeaturePlot(brain, features = markers, ncol = 4)ggsave("Cluster/cluster_celltype_marker.pdf", p, width = 15, height = 9)  # 调整图形中spot尺寸和透明度p1 <- SpatialFeaturePlot(brain, features = 'Slc17a7', pt.size.factor = 1)p2 <- SpatialFeaturePlot(brain, features = 'Slc17a7', alpha = c(0.1, 1))p <- p1 + p2ggsave("Cluster/gene_expression_demo.pdf", p, width = 10, height = 6)

从细胞类型marker基因可以看出,兴奋神经元与抑制神经元各自存在不同的区域,星形胶质细胞、内皮细胞、小胶质细胞和OPC则在整个组织内基本均匀分布。

识别空间变异基因

       Seurat提供了两种思路的空间变异基因识别方法,一种是用注释好的组织区域(手工注释或聚类)做差异分析;另一种方法是基于基因表达的空间自相关性分析,这种方法不需要提供组织注释或聚类信息。

差异分析法

## 差异分析法de_markers <- FindMarkers(brain, ident.1 = 3, ident.2 = 0)de_markers <- de_markers[order(de_markers$avg_log2FC),]top_markers <- c(rownames(head(de_markers,3)), rownames(tail(de_markers,3)))p <- SpatialFeaturePlot(object = brain, features =top_markers, alpha = c(0.1, 1), ncol = 3)ggsave("SVG/cluster_top_marker.pdf", p, width = 12, height = 8)

空间自相关性算法
       这类分析方法有很多算法来实现,如:SpatialDE, SPARK, Trendsceek, Markvariogram, Moran's I等,seurat内置了后两种算法。SpatialDE和SPARK的实现方法我将在后面的推文中介绍, Trendsceek因计算量太大而无法适用10x visium空转数据的分析。
## 标记变异函数法(运行时间较长,请耐心等待)brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT",                                        features = VariableFeatures(brain),                                       selection.method = "markvariogram")
## 莫兰指数法brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain), selection.method = "moransi")  ## SVGs展示top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6)p <- SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))ggsave("SVG/markvariogram_top_marker.pdf", p, width = 12, height = 8)
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "moransi"), 6)p <- SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))ggsave("SVG/moransi_top_marker.pdf", p, width = 12, height = 8)                        


SPOTs去卷积分析

       10x visium空间转录组芯片的每个SPOT直径为55微米,一般包含多个细胞的转录组数据,无法直接查看每个细胞类型的空间分布。10x visium这类非单细胞分辨率的空间转录组数据,一个重要分析内容是解析每个SPOT的细胞组成和占比情况,并以此非基础分析细胞的空间分布和细胞之间互作。Seurat基于锚点建立空转数据的SPOTs与单细胞数据中Cells之间的联系,并推测SPOTs中各种细胞亚型的权重。这种分析需要与空转数据配套的单细胞转录组数据,最好是一个组织既做单细胞又做空转。本文所用的单细胞数据下载链接:https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds
# 单细胞数据allen_reference <- readRDS("InputData/allen_cortex.rds")allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>%  RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:30)p <- DimPlot(allen_reference, group.by = "subclass", label = TRUE)ggsave("Deconvolution/allen_reference_celltype.pdf", p, width = 8, height = 6)  
 

# 锚点转换anchors <- FindTransferAnchors(reference = allen_reference, query = brain,                                normalization.method = "SCT")predictions.assay <- TransferData(anchorset = anchors,                                   refdata = allen_reference$subclass,                                   prediction.assay = TRUE,                                  weight.reduction = brain[["pca"]],                                   dims = 1:30)brain[["predictions"]] <- predictions.assay
# 结果展示DefaultAssay(brain) <- "predictions"cts <- c("L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6b", "Oligo", "Astro", "Macrophage")p <- SpatialFeaturePlot(brain, features = cts, pt.size.factor = 1.6, ncol = 3, crop = TRUE)ggsave("Deconvolution/celltype_prediction.pdf", p, width = 12, height = 12)

预测的细胞亚型分布


交流合作
请加微信时简要说明自己的身份、研究方向、生信基础。:复旦博后,研究肝癌、3年生经验,分析单细胞半年。没有简介者无法通过好友申请。我们公司(晶能生物)2016年开始单细胞测序服务,2019年开展空间转录组测序业务,服务质量经过了10X Genomics公司的权威认证。欢迎大家联系Kinesin洽谈单细胞、空转测序及数据分析业务!