
https://mp.weixin.qq.com/s/Iw6Q2k5eFkFKT5PB0RBufA


[Literature Reading] Data Mining: Integrated Analysis of Bulk and Single-Cell Transcriptomes in Liver Cancer, by 生信星球


Preface

Title: Integrating bulk and single‐cell RNA sequencing reveals cellular heterogeneity and immune infiltration in hepatocellular carcinoma

Date: 2022-06

Journal: Molecular Oncology (IF: 7.4)

Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9168757/

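Abstract

Hepatocellular carcinoma (HCC) is the most common primary liver cancer, with a 5-year survival rate of 12-18%. HCC is characterized by inter- and intratumoral heterogeneity, so uncovering the molecular mechanisms behind this heterogeneity is essential for developing targeted therapies. Several drugs have been approved for HCC treatment to date, but their efficacy is unsatisfactory. Because of tumor heterogeneity, drug resistance during HCC treatment is inevitable. The systemic chemotherapy that more than 50% of HCC patients currently receive has been shown to have almost no effect and to be toxic to the normal liver.

This paper integrates scRNA-seq and multi-omics analyses to reveal tumor heterogeneity and immunosuppressive mechanisms in HCC.

  • First, non-negative matrix factorization (NMF) was used to divide the TCGA HCC tumor samples into three subtypes (S1, S2 and S3). S1 showed "hot tumor" features, with high T cell gene expression and high immune scores; S2 was considered a "cold tumor" with the highest tumor purity; S3 had the worst prognosis and high expression of immunosuppressive genes (TIGIT and PDCD1), and was considered an "immunosuppressed tumor".

  • Next, WGCNA and SCENIC analyses of the S3-like subtype (CS3) in the single-cell dataset identified a transcription factor, BATF, which is hypothesized to upregulate the expression of immunosuppressive genes.

  • Finally, a cell-cell interaction network was identified in which a bone marrow-derived macrophage subpopulation may promote the formation of immunosuppressive T cells.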

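Methods

Data used

After removing samples without complete survival information or clinical data, 353 TCGA and 232 ICGC samples remained.

Non-negative matrix factorization of the HCC samples (that is, sample clustering)
  1. Genes with a mean absolute deviation (MAD) > 1 were chosen as the top genes for NMF clustering

  2. Unsupervised clustering with the NMF package: the value of k at which the cophenetic correlation coefficient began to fall was chosen as the optimal number of clusters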
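The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the paper used the R NMF package, whereas this sketch swaps in a plain multiplicative-update NMF written with NumPy, and the toy matrix, the number of runs and all function names are hypothetical. The cophenetic coefficient is computed from the consensus matrix of repeated runs, which is the usual way this criterion is defined for NMF rank selection.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform


def select_variable_genes(expr, threshold=1.0):
    """Keep genes (rows) whose mean absolute deviation exceeds the threshold."""
    deviation = np.abs(expr - expr.mean(axis=1, keepdims=True)).mean(axis=1)
    return expr[deviation > threshold]


def nmf(v, k, rng, n_iter=200, eps=1e-9):
    """Factorise v (genes x samples) as w @ h using multiplicative updates."""
    w = rng.random((v.shape[0], k)) + eps
    h = rng.random((k, v.shape[1])) + eps
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h


def cophenetic_coefficient(v, k, n_runs=20, seed=0):
    """Cophenetic correlation of the consensus matrix over repeated NMF runs."""
    rng = np.random.default_rng(seed)
    n_samples = v.shape[1]
    consensus = np.zeros((n_samples, n_samples))
    for _ in range(n_runs):
        _, h = nmf(v, k, rng)
        labels = h.argmax(axis=0)  # each sample joins its dominant factor
        consensus += labels[:, None] == labels[None, :]
    consensus /= n_runs
    distance = squareform(1.0 - consensus, checks=False)
    coefficient, _ = cophenet(linkage(distance, method="average"), distance)
    return coefficient


# Toy data (made up): 3 sample groups of 10, each with 20 group-specific genes,
# plus 20 flat genes that the variability filter should discard.
rng = np.random.default_rng(1)
expr = rng.random((80, 30)) * 0.1
for group in range(3):
    expr[group * 20:(group + 1) * 20, group * 10:(group + 1) * 10] += 10.0

filtered = select_variable_genes(expr)
coph = {k: cophenetic_coefficient(filtered, k) for k in (2, 3, 4)}
```

Scanning `coph` over increasing k and picking the last k before the coefficient starts to drop mirrors the selection rule quoted above.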

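Acquisition and processing of other data
  • Mutation data: the "Masked Somatic Mutation" somatic mutation data for TCGA-HCC were downloaded and then processed with maftools

  • Single-sample enrichment analysis used GSVA, with hallmark gene sets downloaded from MSigDB

Computing immune infiltration and tumor purity
  • CIBERSORT was used to compute immune infiltration: only patients with a P-value < 0.05 were included in the immune infiltration analysis

  • The ESTIMATE package was used to compute immune, stromal and tumor purity scores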
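The P-value filter on the deconvolution output can be sketched as below. The record layout, sample IDs and fraction values are made-up toy data, and the helper names are hypothetical; real CIBERSORT output is a table with one row per sample, which would be read from file instead.

```python
def filter_significant(samples, alpha=0.05):
    """Keep only samples whose deconvolution P-value is below alpha."""
    return [s for s in samples if s["p_value"] < alpha]


def mean_fraction(samples, subtype, cell_type):
    """Average estimated fraction of one cell type within one subtype."""
    values = [s["fractions"][cell_type] for s in samples if s["subtype"] == subtype]
    return sum(values) / len(values) if values else None


# Toy records (made up), one per patient.
toy = [
    {"id": "P1", "subtype": "S1", "p_value": 0.01, "fractions": {"T cells CD8": 0.30}},
    {"id": "P2", "subtype": "S1", "p_value": 0.20, "fractions": {"T cells CD8": 0.05}},
    {"id": "P3", "subtype": "S1", "p_value": 0.03, "fractions": {"T cells CD8": 0.20}},
    {"id": "P4", "subtype": "S2", "p_value": 0.04, "fractions": {"T cells CD8": 0.10}},
]

kept = filter_significant(toy)
s1_cd8 = mean_fraction(kept, "S1", "T cells CD8")
```

Filtering before averaging means that samples with an unreliable deconvolution (here P2) do not influence the per-subtype comparison.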

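Differential analysis

limma was used for the analysis. Given that S1 and S3 show high immune infiltration whereas S2 shows high tumor purity, the genes from S1 and S3 were intersected with the immune genes from ImmPort (https://immport.niaid.nih.gov/home), the immune genes found in S2 were excluded, and the remaining genes were taken as the final candidate genes.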
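One possible reading of this selection rule, expressed as set operations. The exact order of operations is an assumption based on the description above, and the gene lists are toy stand-ins: the real inputs would be the limma results per subtype and the downloaded ImmPort gene list.

```python
def candidate_genes(de_s1, de_s3, de_s2, immune_genes):
    """Immune genes found in S1 or S3, minus immune genes found in S2."""
    immune = set(immune_genes)
    immune_s1_s3 = (set(de_s1) | set(de_s3)) & immune
    immune_s2 = set(de_s2) & immune
    return sorted(immune_s1_s3 - immune_s2)


# Toy inputs (made up for illustration only).
de_s1 = ["CD3E", "PRF1", "GZMK", "ALB"]
de_s3 = ["TIGIT", "CTLA4", "CD68", "MMP9"]
de_s2 = ["CTNNB1", "CD68"]
immune_list = ["CD3E", "PRF1", "GZMK", "TIGIT", "CTLA4", "CD68"]  # stand-in for ImmPort

final_candidates = candidate_genes(de_s1, de_s3, de_s2, immune_list)
```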

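Single-cell transcriptome data processing
  • The expression matrix was imported with Seurat; cells with UMI counts < 200 were removed

  • CCA was used to correct batch effects

  • Library-size normalization

  • Highly variable genes were identified with the scran package

  • PCA to reveal biologically meaningful variation

  • FindClusters for clustering, followed by UMAP visualization

  • Differential analysis between clusters

  • Cell type annotation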
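The first and third steps of the list above (cell filtering and library-size normalization) can be sketched with NumPy. This is not the Seurat implementation; the scale factor of 10,000 and the log1p transform are assumptions (the notes only say "library-size normalization"), and the count matrix is a toy example.

```python
import numpy as np


def filter_cells(counts, min_umi=200):
    """Drop cells (columns) whose total UMI count is below min_umi."""
    keep = counts.sum(axis=0) >= min_umi
    return counts[:, keep], keep


def normalize_library_size(counts, scale=1e4):
    """Scale each cell to the same total count, then apply log1p."""
    totals = counts.sum(axis=0, keepdims=True)
    return np.log1p(counts / totals * scale)


# Toy genes x cells matrix (made up); the second cell has only 35 UMIs.
counts = np.array([[150, 10, 300],
                   [100, 20, 500],
                   [50, 5, 200]], dtype=float)

filtered_counts, keep = filter_cells(counts)
normalized = normalize_library_size(filtered_counts)
```

After normalization every remaining cell has the same total on the linear scale, so differences between cells reflect relative expression rather than sequencing depth.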

WGCNA analysis
  • A co-expression network was constructed using the blockwiseModules function with default parameters.

  • Module connectivity was defined as the absolute value of Pearson’s correlation between genes

  • The clinical trait relationship was defined as the absolute value of Pearson’s correlation between each gene and cell type
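The two correlation-based definitions above can be written out directly. This is a minimal NumPy sketch, not the WGCNA package itself: the cell type is encoded as a 0/1 indicator (an assumption about how the "trait" was represented), and the expression values and labels are toy data.

```python
import numpy as np


def gene_gene_connectivity(expr):
    """Absolute Pearson correlation between every pair of genes (rows)."""
    return np.abs(np.corrcoef(expr))


def gene_trait_relationship(expr, cell_types, target):
    """Absolute Pearson correlation of each gene with a 0/1 cell-type indicator."""
    indicator = (np.asarray(cell_types) == target).astype(float)
    centered = expr - expr.mean(axis=1, keepdims=True)
    indicator_centered = indicator - indicator.mean()
    r = centered @ indicator_centered / (
        np.linalg.norm(centered, axis=1) * np.linalg.norm(indicator_centered)
    )
    return np.abs(r)


# Toy genes x cells matrix (made up): gene 0 is high in tT, gene 1 is low in tT.
cell_types = ["tT", "tT", "mT", "mT", "tT", "mT"]
expr = np.array([[5, 5, 1, 1, 5, 1],
                 [1, 1, 4, 4, 1, 4],
                 [1, 2, 3, 4, 5, 6]], dtype=float)

connectivity = gene_gene_connectivity(expr)
relationship = gene_trait_relationship(expr, cell_types, "tT")
```

Because absolute values are used, a gene that is specifically depleted in a cell type scores as high as one that is specifically enriched.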

Transcription factor analysis

The input was the raw UMI counts from Seurat, filtered with the following criteria: sum of expression > 3 × 0.005 × number of cells, and detected in at least 0.5% of the cells.
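The gene filter described above amounts to two thresholds on each row of the count matrix. The sketch below assumes a genes x cells NumPy array and treats "detected" as a count greater than zero; the function name and the toy matrix are hypothetical.

```python
import numpy as np


def scenic_gene_filter(counts, count_per_cell=3, min_fraction=0.005):
    """Boolean mask of genes passing both the total-count and detection filters."""
    n_cells = counts.shape[1]
    enough_counts = counts.sum(axis=1) > count_per_cell * min_fraction * n_cells
    enough_cells = (counts > 0).sum(axis=1) >= min_fraction * n_cells
    return enough_counts & enough_cells


# Toy matrix (made up): 3 genes x 1000 cells, so the thresholds are
# total counts > 15 and detection in at least 5 cells.
counts = np.zeros((3, 1000))
counts[0, :10] = 2   # total 20, detected in 10 cells: kept
counts[1, :2] = 50   # total 100, detected in only 2 cells: dropped
counts[2, :6] = 1    # detected in 6 cells, but total is only 6: dropped

mask = scenic_gene_filter(counts)
```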

The standard SCENIC workflow was then followed, using the GENIE3 method (for a single sample) and GRNBoost (for the combined sample) to identify potential transcription factor (TF) targets.

Calculation of the immunosuppressed, liver and activated T cell scores
  • The liver score was calculated as the average expression of 24 liver marker genes from Kim et al.

  • The immunosuppressed score and activated T cell score were defined based on 35 known repressed markers and 28 activated GZMK-CD8 genes from Guo et al.
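A signature score defined as the average expression of a marker set can be sketched as follows. The gene names here are hypothetical placeholders, not the actual marker lists from Kim et al. or Guo et al., and whether the immunosuppressed and activated T cell scores were also plain averages is an assumption.

```python
import numpy as np


def signature_score(expr, gene_names, signature):
    """Mean expression of the signature genes for each sample or cell (column)."""
    wanted = set(signature)
    rows = [i for i, gene in enumerate(gene_names) if gene in wanted]
    return expr[rows].mean(axis=0)


# Toy genes x samples matrix with placeholder gene names (made up).
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
expr = np.array([[2, 0, 4],
                 [1, 1, 1],
                 [4, 2, 0],
                 [9, 9, 9]], dtype=float)

score = signature_score(expr, gene_names, ["GENE_A", "GENE_C"])
```

Signature genes that are missing from the matrix are silently skipped here; a real analysis would probably want to report them.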

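Result 1: NMF of HCC yields three subtypes

NMF analysis of the TCGA data yielded 120 cases in cluster S1, 144 cases in cluster S2 and 89 cases in cluster S3.

The subtypes were then validated with the SubMap tool on the ICGC data.

Survival analysis showed that S3 had the worst prognosis, and the clinical data for S3 also showed advanced tumor stage (AJCC T3/T4 and neoplasm disease stage III and IV).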
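The notes do not say which survival method was used; a Kaplan-Meier estimate per subtype is the common choice for this kind of comparison, so the sketch below assumes it. The follow-up times and event flags are made-up toy values.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times: follow-up time per patient; events: 1 for death, 0 for censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Toy cohort (made up): months of follow-up and event indicators.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Running this once per subtype and comparing the curves (for example with a log-rank test) is how a statement such as "S3 has the worst prognosis" is typically supported.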

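Result 2: Degree of tumor heterogeneity in HCC

That is, an analysis of tumor purity and immune infiltration.

Finding: the immune and stromal scores of S2 were far lower than those of S1 and S3, but its tumor purity was the highest.

Because S3 had been found to have the worst prognosis, CIBERSORT was then used to examine the distribution of 22 immune cell types across the three subtypes:

  • S1 had a higher abundance of activated NK cells (aNK), CD8+ T cells, M1 cells and CD4 memory resting T cells, and a lower abundance of Treg and M0 cells

  • S2 had a lower abundance of M2 cells

T cell marker genes were expressed at higher levels in S3.

In addition, the immunosuppressed score and several other marker genes (VEGFA, CTLA4, HAVCR2 and TIGIT) were also highly expressed in S3, which suggests that a high proportion of immunosuppressive T cells may be associated with poor prognosis. The macrophage marker gene (CD68) and the EMT marker genes (MMP2 and MMP9) were also high in S3, which suggests that these cells in the tumor microenvironment may be associated with tumor progression.

By contrast, S1 had the best prognosis and highly expressed activated T cell markers (CD3E, PRF1 and GZMK) and exhausted T cell marker genes (PDCD1 and HAVCR2).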

In addition, GSVA of the GO_BP and HALLMARK pathways across the three subtypes showed:

  • immune-related pathways, such as IL6-STAT3 and thymic T cell selection, were enriched in S1 and S3

  • CD4 activation, inflammatory response and T cell differentiation pathways showed a higher enrichment score in S1

  • higher enrichment scores of B cell apoptotic and WNT pathways were found in S3

In summary:

  • S2 demonstrated features of ‘cold tumors’ due to a lower immune infiltration ratio

  • S1 showed features of ‘hot tumors’

  • S3 showed features of ‘immunosuppressed tumors’ due to the highly expressed immune-repressive genes.


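Previous reports indicate that tumor genomic mutations are associated with anti-tumor immunity,
so the authors next explored differences in gene mutation frequency and copy number variation types among the subtypes.

The gene mutation frequency data showed:

  • Highly mutated in S2: cadherin-associated protein beta 1 (CTNNB1) and ARID1A

  • Highly mutated in S3: TP53 and BAP1 (BAP1 regulates cell death and mitochondrial metabolism, which is consistent with the high expression of exhausted marker genes in S3)

Notably, the CTNNB1 non-mutated group had higher CD3D and CTLA4 expression and higher immune scores. Previous studies have shown that HCC patients with CTNNB1 mutations have lower immune infiltration.

The copy number variation data showed:

  • S1 had the most amplified variant samples (mainly amplified in regions such as 1q, 5p, 8q, 6p and 11q)

  • S2 had the most deleted ones (1q, 11q, 1q and 2q)

  • S3 was mainly amplified in 8q and 13q

YEATS4 and VIMP were highly expressed in S3, amplified in S3 and deleted in S1; CYFIP2 and ABLIM3 were highly expressed in S1, amplified in S1 and deleted in S3.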

  • YEATS4 promotes HCC cell proliferation and colony formation

  • VIMP inhibits cytokine production in human CD4+ effector T cells

  • CYFIP2 is highly abundant in CD4+ cells from multiple sclerosis patients and is involved in T cell adhesion

  • ABLIM3 is a component of adherent junctions with actin-binding activity

In summary:

  • the highest mutation of BAP1, amplification of YEATS4 and VIMP, and deletion of CYFIP2 and ABLIM3, might induce an immune-repressed environment in S3

  • high mutation of CTNNB1 might inhibit immune infiltration in S2

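Result 3: Building a gene classifier by integrating bulk and single-cell transcriptome data

First, obtain the single-cell data

The single-cell dataset used by the authors was GSE149614 (> 70,000 single-cell transcriptomes from 10 HCC patients).

The immunosuppressed, activated T cell (aT) and liver scores were calculated in the same way, showing:

  • HCC02T, HCC03T, HCC04T and HCC05T possessed the highest liver and the lowest immunosuppressed scores: defined as "cold tumor"

  • HCC08T, HCC09T and HCC10T had the highest immunosuppressed scores: defined as "immunosuppressed tumor"

  • HCC01T, HCC06T and HCC07T had high activated T cell scores: defined as "hot tumor"

Then build the classifier

That is, find the genes that can classify the 10 samples above. After obtaining the differential genes of the positive samples, the authors found that with the top 108 genes, all false‐positive rates (FPRs) were 0 and all true‐positive rates (TPRs) were 1; adding more genes did not change the FPR and TPR values any further.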
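The stopping criterion above relies on per-class true-positive and false-positive rates. The sketch below shows how those rates are computed for the 10 samples using the labels listed earlier; the classifier itself is not described in these notes, so the predictions here are simply hypothetical inputs.

```python
def tpr_fpr(true_labels, predicted_labels, positive):
    """One-vs-rest true-positive and false-positive rates for one class."""
    tp = fp = tn = fn = 0
    for sample, truth in true_labels.items():
        predicted = predicted_labels[sample]
        if truth == positive:
            tp += predicted == positive
            fn += predicted != positive
        else:
            fp += predicted == positive
            tn += predicted != positive
    return tp / (tp + fn), fp / (fp + tn)


# Sample labels as assigned from the three scores in the text above.
true_labels = {
    "HCC01T": "hot", "HCC06T": "hot", "HCC07T": "hot",
    "HCC02T": "cold", "HCC03T": "cold", "HCC04T": "cold", "HCC05T": "cold",
    "HCC08T": "immunosuppressed", "HCC09T": "immunosuppressed",
    "HCC10T": "immunosuppressed",
}

# Hypothetical predictions: a perfect classifier, and one with a single error.
perfect = dict(true_labels)
one_error = dict(true_labels, HCC01T="cold")

perfect_rates = {c: tpr_fpr(true_labels, perfect, c) for c in set(true_labels.values())}
hot_rates = tpr_fpr(true_labels, one_error, "hot")
cold_rates = tpr_fpr(true_labels, one_error, "cold")
```

With only 10 samples a single misassignment moves the rates substantially, which is why validation on the bulk cohorts matters.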

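Finally, validate the classifier with bulk transcriptome data

The 108-gene classifier was validated on the TCGA and ICGC data. Samples containing these genes were selected first and then likewise divided into three groups. Survival analysis showed that S3 in TCGA still had the worst prognosis; S3 in ICGC had high expression of immunosuppressive genes, high immune scores and a low tumor purity score, and likewise the worst prognosis. The SubMap analysis in panel D showed that the ICGC and TCGA results were also consistent.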

Result 4: Tumor microenvironment differences among the three single-cell subtypes

The three single-cell subtypes are labeled CS1, CS2 and CS3:

  • CS3 samples had the highest immune and immunosuppressed scores

  • CS2 the highest tumor purity and liver scores

  • CS1 was similar to S1, CS2 to S2, and CS3 to S3

Dimensionality reduction and clustering then yielded 16 types, including 6007 T cells, 1845 B cells, 14 552 epithelial cells, 350 NK cells, 1850 endothelial cells and 1548 fibroblasts.

Further sub-clustering yielded 4 T cell, 2 B cell and 3 macrophage subtypes.

Findings:

  • NK and aT cells were enriched in CS1

  • mT, mB, tT, mMφ and myCAF cells were enriched in CS3

  • epithelial cells (H1, H2, H3, H4) were enriched in CS2

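This indirectly supports the correctness of the classifier built earlier.

The expression profiles of the various immune cell types were then analyzed.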

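Result 5: Exploring immunosuppressive mechanisms in the single-cell data

Because CS3 in the single-cell data is similar to S3 in TCGA, and S3 has immunosuppressive properties, this part focuses on CS3.

First, WGCNA was used to obtain the gene modules of the CS3‐specific subtypes (mT, tT, myCAF and mMφ); SCENIC was then used to obtain the immunosuppression‐promoting TF regulons.

WGCNA results:

There were 12 gene modules in total, of which: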

  • green module was correlated with mT

  • magenta module with tT

  • yellow module with myCAF

  • purple module with mMφ

The hub genes of each module were then extracted, showing:

  • mT hub genes were enriched in T cell selection and T cell differential pathways

  • tT hub genes negatively regulated T cell activation and interleukin 10 secretion

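SCENIC results:

The core transcription factor BATF was identified: BATF could regulate TIGIT and CTLA4 and the co‐stimulatory gene ICOS.

The expression correlations were then also validated in TCGA and ICGC.

Survival analysis showed that high expression levels of BATF were correlated with poor prognosis, and data from other studies indirectly supported this: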

  • GSE149197 of BATF‐knockout Treg cells showed significantly lower BATF, CTLA4, TIGIT and FOXP3 expression

  • BATF was barely expressed in the healthy liver single‐cell dataset (GSE115469)

Because the TCGA S3 subtype has high stromal infiltration, the authors next analyzed how cells in the tumor microenvironment affect the immunosuppressive microenvironment, and found:

  • tT could interact with CS3‐specific mMφ through the chemokines CXCL12_CXCR4, CCL4_CCR5 and CCL3_CCR1; tT could further suppress the immune response of T cells and ultimately promote the production of an immunosuppressive environment in HCC

  • mMφ was characterized by overexpression of the immune‐repressive gene IL10 and could interact with tT via NECTIN2_TIGIT; mMφ frequently interacted with myCAF and endothelial cells in the endothelial (End) type through chemokines such as CXCL12_CXCR4 and the growth factor VEGFA_FLT1

  • endothelial cells in End could also interact with tT through TIGIT_PVR, which may also promote the formation of immune‐repressive cells

Final conclusion: mMφ could directly or indirectly promote the immunosuppressive status of the S3‐like HCC subtype.

