Closed ixxmu closed 11 months ago
“ 空间转录组测序主要包括5个步骤,我们着重下游分析部分:空转数据分析和可视化。本篇主分享如何使用python和R读取空转数据,主要使用python中的scanpy stlearn 和R中的seurat包”
在正式开始之前,我们先看看cellranger流程跑完之后,空间转录组结果的数据组成,主要是两部分:
1.表达矩阵文件夹
2.空转图片文件夹
1.表达矩阵文件夹
2.表达矩阵文件夹
另外:如果你对单细胞数据读取比较感兴趣,可以看我以前的贴子
python中读取空间转录组的方式,scanpy较常见
R中读取空间转录组的方式,seurat最常见
01
—
1.1使用scanpy读取空转数据
在linux系统中的文件格式如下
import scanpy as sc
#方法一
ns7=sc.read_visium(path="./ns7/",count_file='./2.3.h5_files/filtered_feature_bc_matrix.h5',library_id="NS_7"
,load_images=True,source_image_path="./ns7/spatial/")
#方法二
adata=sc.read_visium(path="./ns56/",count_file='filtered_feature_bc_matrix.h5',library_id="NS_56",
load_images=True,source_image_path="./ns56/spatial/")
#读取之后,通常走一套组合拳,qc 可视化
sc.pp.calculate_qc_metrics(adata, inplace=True)
adata.var['mt'] = [gene.startswith('MT-') for gene in adata.var_names]
adata.obs['mt_frac'] = adata[:, adata.var['mt']].X.sum(1).A.squeeze()/adata.obs['total_counts']
fig, axs = plt.subplots(1,4, figsize=(15,4))
fig.suptitle('Covariates for filtering')
sb.distplot(adata.obs['total_counts'], kde=False, ax = axs[0])
sb.distplot(adata.obs['total_counts'][adata.obs['total_counts']<10000], kde=False, bins=40, ax = axs[1])
sb.distplot(adata.obs['n_genes_by_counts'], kde=False, bins=60, ax = axs[2])
sb.distplot(adata.obs['n_genes_by_counts'][adata.obs['n_genes_by_counts']<4000], kde=False, bins=60, ax = axs[3])
sc.pp.filter_cells(adata, min_counts = 5000)
print(f'Number of cells after min count filter: {adata.n_obs}')
sc.pp.filter_cells(adata, max_counts = 35000)
print(f'Number of cells after max count filter: {adata.n_obs}')
adata = adata[adata.obs['mt_frac'] < 0.2]
print(f'Number of cells after MT filter: {adata.n_obs}')
sc.pp.filter_cells(adata, min_genes = 3000)
print(f'Number of cells after gene filter: {adata.n_obs}')
sc.pp.filter_genes(adata, min_cells=10)
print(f'Number of genes after cell filter: {adata.n_vars}')
#这里最重要!!!!!!!!!
sc.pp.normalize_total(adata, inplace = True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor='seurat', n_top_genes=2000, inplace=True)
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.louvain(adata, key_added='clusters')
#可视化
sc.pl.umap(adata, color=['total_counts', 'n_genes_by_counts'])
sc.pl.umap(adata, color='clusters', palette=sc.pl.palettes.default_20)
sc.pl.spatial(adata, img_key = "hires",color=['total_counts', 'n_genes_by_counts'])
sc.pl.spatial(adata, img_key = "hires", color="clusters", size=1.5)
#详细内容可以参考https://scanpy-tutorials.readthedocs.io/en/multiomics/analysis-visualization-spatial.html
1.2 stlearn读取空转数据
https://stlearn.readthedocs.io/en/latest/tutorials/Read_any_data.html
需要三个文件:
count matrix
spatial location
image path (optional)
count
空间坐标信息
图片信息为可选
The image path is optional. If you don’t provide any image, the background will be 'black' or 'white'
import stlearn as st
adata = st.create_stlearn(count=count_matrix,spatial=spatial,library_id="Sample_test", scale=1,background_color="white")
质控qc
st.pl.QC_plot(adata)
如果你还想添加metadata信息,可以添加seurat的聚类信息或者celltype
adata.obs["seurat"] = pd.Categorical(seurat_results)
02
—
2.1 如果你有h5文件+ spatial 文件夹
################################NS_7
#首先读取表达矩阵数据
#这里的blank是指使用空的spot,需要去掉。如果你没有空的spot,就不需要blank
NS_7_exp <- Read10X_h5(filename = "/data/biomath/Spatial_Transcriptome/silicosis/Spatial_transcriptome/A72-NS-7d/2.3.h5_files/filtered_feature_bc_matrix.h5" ) #1638
NS_7_blank <- as.matrix(read.csv("/data/biomath/Spatial_Transcriptome/silicosis/Spatial_transcriptome/A72-NS-7d/A72.csv",sep=",",header=T)) #283
NS_7_valid=setdiff(colnames(NS_7_exp),NS_7_blank[,1])
NS_7_exp_valid=NS_7_exp[,NS_7_valid] #1355
NS_7=CreateSeuratObject(counts = NS_7_exp_valid, project = 'NS_7', assay = 'Spatial',min.cells=3) #17433*1355
#读取image信息
NS_7_img <- Read10X_Image(image.dir="/data/biomath/Spatial_Transcriptome/silicosis/Spatial_transcriptome/A72-NS-7d/spatial/")
DefaultAssay(object = NS_7_img) <- 'Spatial'
NS_7_img <- NS_7_img[colnames(NS_7)]
NS_7[['image']] <- NS_7_img
NS_7$stim <- "NS_7"
save(NS_7,file="NS_7.rds")
#标准化 找高变基因 scale一条龙
#通常空转数据使用sct效果比seurat标准流程更好
NS_7_sct=SCTransform(NS_7, verbose = FALSE,,assay ="Spatial")
save(NS_7_sct,file="NS_7_sct.rds")
2.2 如果你有raw/filtered_feature_bc_matrix 的三个文件 + spatial 文件夹
#首先读取表达矩阵数据
NS2 = Read10X("./outs/filtered_feature_bc_matrix/")
#读取image信息
image2 <- Read10X_Image(image.dir = file.path("./outs/",
"spatial"), filter.matrix = TRUE)
#这里如果你想要高清图片,可以指定的!
image2 = Read10X_Image("./outs/spatial/",
image.name = "tissue_hires_image.png")
#创建seurat对象
NS2 <- CreateSeuratObject(counts = NS2, assay = "Spatial")
DefaultAssay(NS2 = image2) <- "Spatial"
#标准化 找高变基因 scale一条龙
#通常空转数据使用sct效果比seurat标准流程更好
NS2_sct=SCTransform(NS2, verbose = FALSE,,assay ="Spatial")
save(NS_7_sct,file="NS2_sct.rds")
#如果多样本分析的时候记得改名字。
image2 <- image2[Cells(x = NS2)]
NS2[["slice1"]] <- image2
#没有报错,无需转化.通常不会报错的
for (i in colnames((NS2@images$slice1@coordinates))) {
NS2@images$slice1@coordinates[[i]] <- as.integer(NS2@images$slice1@coordinates[[i]])
}
2.3 一句话读取空转数据,这个通常是自己的空转数据可以这么读取。这是10x官网的数据
test_data = Load10X_Spatial(data.dir = "./input/",
filename = "Visium_FFPE_Human_Normal_Prostate_filtered_feature_bc_matrix.h5",
assay = "Spatial",
slice = "test")
2.4 有的优秀作者会提供rds文件,这里主要注意版本问题,有时候用readRDS函数读取,有时候用load函数加载
pbmc=readRDS("./pbmc.rds")
load("./pbmc.rds")
2.5 非常规读取空转数据:有些奇葩作者不上传HE图片的,只给坐标信息
# 把空间数据当成单细胞数据读入
#1 读取表达矩阵
test_data2 = Read10X("./input/filtered_feature_bc_matrix/")
test_data2 <- CreateSeuratObject(counts = test_data2,
min.features = 0,
project = "test")
# 2读入单细胞的位置信息
position = read.csv("./No_IHC/position_information.csv",header = T,row.names = 1)
head(position)
position = select(position,imagecol,imagerow)
#将位置信息合并入单细胞seurat对象
colnames(position) = paste0("Spatial_",1:ncol(position))
test_data2[["spatial"]] <- CreateDimReducObject(embeddings = as.matrix(position),
key = "Spatial",assay = "RNA")
test_data2 = Read10X("./input/filtered_feature_bc_matrix/")
test_data2 <- CreateSeuratObject(counts = test_data2,
min.features = 0,
project = "test",
assay="Spatial")
test_data2
# An object of class Seurat
# 17943 features across 2543 samples within 1 assay
# Active assay: RNA (17943 features, 0 variable features)
coord.df = read.csv("./position_information.csv",header = T,row.names = 1)
test_data2@images$image = new(
Class = 'SlideSeq',
assay = "Spatial",
key = "image_",
coordinates = coord.df
)
本文技术不难,更多的是一种思想方法,帮助大家去开拓自己的思路。只要你理解了空间数据的数据存储形式,便可一招鲜吃遍天~
看完记得顺手点个“在看”哦!
https://mp.weixin.qq.com/s/cCPc0TtEgOe-urFnwyffzg