ixxmu / mp_duty

抓取网络文章到github issues保存
https://archives.duty-machine.now.sh/
106 stars 30 forks source link

DoRothEA--单细胞转录因子评分 #4830

Closed ixxmu closed 5 months ago

ixxmu commented 5 months ago

https://mp.weixin.qq.com/s/XjVqpSfR-NBN3NkI_cEOGw

ixxmu commented 5 months ago

DoRothEA--单细胞转录因子评分 by 生信小博士

写在开头

jimmy老师在他的生信游民系列提出了很多学徒任务,仔细去看每一篇文章的动机,都是科研过程中经常会经历的。

今天我们分享的任务是来自:生物学功能注释三板斧。 本次任务我计划分为四个部分来写,今天写第一部分。

  • 1.DoRothEA--单细胞转录因子评分

  • 2.PROGENy--单细胞通路活性评分

  • 3.DoRothEA--转录组转录因子评分

  • 4.PROGENy--转录组通路活性评分




正文部分

通常我们都是利用基因集合去做单细胞评分,那么是否可以通过有权重的基因集来计算细胞的转录活性?这就是DoRothEA开发出来的意义。


DoRothEA数据集简介

DoRothEA是一种包含转录因子(TF)与其靶标相互作用的基因集合。这些相互作用形成了被定义为调节子(regulons)的TF及其对应靶点的集合。DoRothEA regulons 整合了来自文献、ChIP-seq峰值、TF结合位点基序以及基因表达推断等不同类型的互作证据。TF和靶标之间的互作可信度根据支持的证据数量划分为A-E五个等级,A表示最可信,E表示可信度较低。DoRothEA适用于bulk RNAseq和scRNAseq的数据。该工具可以与多种统计方法结合使用,从而生成一种功能分析工具,以推断基因表达数据中的TF活性。通过考虑TF直接转录靶标的mRNA水平,而不是TF本身的基因表达水平,来计算活性。

CollecTRI数据集简介

CollecTRI 来源的调控子总结了从 12 种不同 转录因子(TF)-目标基因相互作用的数据库。该集合扩大了转录因子的覆盖范围,并与其他已知的 GRN 进行了比较。结果表明,在使用 knockTF 数据集根据基因表达数据识别受干扰的 TF 方面,该集合具有卓越的性能。

简单来说,CollecTRI 是第二代DoRothEA,对DoRothEA做了更新。

加载Regulons数据集、单细胞数据



load("~/gzh/pbmc3k_final_v4.rds")

#1 加载数据集合----## We read Dorothea Regulons for Human:dorothea_regulon_human <- get(data("dorothea_hs", package = "dorothea"))##如果是小鼠,就用##dorothea_regulon_mouse <- get(data("dorothea_mm", package = "dorothea"))

#1 过滤数据集的置信度----## We obtain the regulons based on interactions with confidence level A, B and Cregulon <- dorothea_regulon_human %>% dplyr::filter(confidence %in% c("A","B","C"))
#类似结果,作者现在更推荐collectri的结果!net <- decoupleR::get_collectri(split_complexes = T,organism='human' )head(net)regulon


作者现在更推荐使用collectri的数据

计算转录因子活性

 

#2计算细胞的TF活性----## We compute Viper Scores pbmc <- run_viper(pbmc, regulon,                  options = list(method = "scale", minsize = 4,                                  eset.filter = FALSE, cores = 4,                                  verbose = FALSE))
pbmc@assays$dorothea


计算出得分之后,后续就是考验个人的可视化功底了!

可视化一

可视化方案一

#3 对dorothea矩阵进行降维聚类分群-----## We compute the Nearest Neighbours to perform clusterDefaultAssay(object = pbmc) <- "dorothea"pbmc <- ScaleData(pbmc)pbmc <- RunPCA(pbmc, features = rownames(pbmc), verbose = FALSE)pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE)pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:10, umap.method = "uwot", metric = "cosine")
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, verbose = FALSE)
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(10, avg_log2FC);top10pbmc@assays$dorothea@data[1:4,1:4]top10 =top10[top10$avg_log2FC > 0.3,] DoHeatmap(pbmc ,top10$gene,size=3,slot = 'scale.data')


从热图可以看出,其实还是有一大部分TF在不同的细胞类型中特异性表达的。


如果想把热图画好看一点,可以参考:优雅地展示20w单细胞热图|非Doheatmap


可视化方案二



d.all=pbmc head(d.all@meta.data)

Idents(d.all)=d.all$cell.type  a=AverageExpression(d.all,return.seurat = TRUE)  #可以用 a=AggregateExpression(d.all,return.seurat = TRUE)试试看哦   a$orig.ident=rownames(a@meta.data) head(a@meta.data);a
markers=pbmc.markers ; head(markers)
markers$cluster=factor(markers$cluster,levels = unique(markers$cluster) )
DoHeatmap(a,draw.lines = FALSE, slot = 'scale.data',assay = 'dorothea', features = markers %>%group_by(cluster) %>%dplyr::slice_max(avg_log2FC,n = 5) %>% .$gene )

可视化方案三

上面进行可视化的TF都是通过findmarkers找到的,我们还可以根据std值找TF

制备画图所有数据

#4获取Viper得分矩阵,评价细胞群的TF活性----## We transform Viper scores, scaled by seurat, into a data frame to better ## handling the resultsviper_scores_df <- GetAssayData(pbmc, slot = "scale.data",                                 assay = "dorothea") %>%  data.frame(check.names = F) %>%  t();viper_scores_df[1:4,1:4]

## We create a data frame containing the cells and their clustersCellsClusters <- data.frame(cell = names(Idents(pbmc)), cell_type = as.character(Idents(pbmc)), check.names = F) #也可以使用其他的分类信息CellsClusters[1:4, ]
## We create a data frame with the Viper score per cell and its clustersviper_scores_clusters <- viper_scores_df %>% data.frame() %>% rownames_to_column("cell") %>% gather(tf, activity, -cell) %>% inner_join(CellsClusters);viper_scores_clusters[1:4,]

## We summarize the Viper scores by cellpopulationsummarized_viper_scores <- viper_scores_clusters %>% group_by(tf, cell_type) %>% summarise(avg = mean(activity), std = sd(activity));summarized_viper_scores



var(1,3 )
5#5细胞群间变化最大的20个TFs进行可视化----
## We select the 20 most variable TFs. (20*9 populations = 180) 9个细胞亚群highly_variable_tfs <- summarized_viper_scores %>% group_by(tf) %>% mutate(var = var(avg)) %>% # 计算变异度 calculates the variance of the "avg" column. So, for each transcription factor, it computes the variance of its average score. ungroup() %>% top_n(180, var) %>% distinct(tf);highly_variable_tfs

## We prepare the data for the plotsummarized_viper_scores_df <- summarized_viper_scores %>% semi_join(highly_variable_tfs, by = "tf") %>% dplyr::select(-std) %>% spread(tf, avg) %>% data.frame(row.names = 1, check.names = FALSE) ;summarized_viper_scores_df



palette_length = 100my_color = colorRampPalette(c("Darkblue", "white","red"))(palette_length)
my_breaks <- c(seq(min(summarized_viper_scores_df), 0, length.out=ceiling(palette_length/2) + 1), seq(max(summarized_viper_scores_df)/palette_length, max(summarized_viper_scores_df), length.out=floor(palette_length/2)))
viper_hmap <- pheatmap(t(summarized_viper_scores_df),fontsize=14, fontsize_row = 10, color=my_color, breaks = my_breaks, main = "DoRothEA (ABC)", angle_col = 45, treeheight_col = 0, border_color = NA)


可视化方案四

遗憾的是在这种可视化方案中,效果似乎不太好。

这也可以理解,毕竟这是转录因子活性分析,看的是转录因子下游基因的表达情况,而不是转录因子本身。所以对转录因子本身进行可视化似乎不是一个好的选择。

Idents(pbmc)='RNA'pbmc@meta.data$GATA1= pbmc@assays$dorothea@scale.data['GATA1',] 
FeaturePlot(pbmc,features = "GATA1",reduction = "umap" )| DimPlot(pbmc,label=T,group.by = 'cell.type')
FeaturePlot(pbmc,features = "PBX2",reduction = "umap" )| DimPlot(pbmc,label=T,group.by = 'cell.type')

参考:
https://www.jianshu.com/p/49617173a8adhttps://bioconductor.org/packages/release/data/experiment/vignettes/dorothea/inst/doc/dorothea.htmlhttps://github.com/saezlab/CollecTRIhttps://github.com/saezlab/CollecTRI/blob/main/scripts/casestudy/02.2_pbmc_TFactivity.R