🍥 Video tutorials:
The tutorial video is free to watch at: https://www.bilibili.com/video/BV177411U7oj
Companion mind map for the course: https://mubu.com/doc/1cwlFgcXMg
🍥 Reference code: https://www.bioconductor.org/packages/release/bioc/vignettes/ChAMP/inst/doc/ChAMP.html (the ChAMP package)
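Last time we covered the three quality-control steps for methylation data:
Quality control (QC): champ.QC()
Normalization: champ.norm()
SVD plot (checks for batch effects): champ.SVD()
If a batch effect is present, the champ.runCombat function, which is based on the ComBat algorithm, is recommended for removing it.
Contents:
1. First, explore batch effects
2. Then, look at the literature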
rm(list = ls())
options(stringsAsFactors = FALSE)
library("ChAMP")
library("minfi")
# Raw data: myLoad, including the sample sheet pd
load(file = 'GSE111942_champ_load.Rdata')
# Normalized data: myNorm
load(file = 'step2-champ_myNorm.Rdata')
The clinical data pd looks like this:
# ----- SVD plot and batch effects -----
champ.SVD(beta=myNorm,pd=myLoad$pd)
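The y-axis labels are partly cut off; from top to bottom they are 'Array', 'Slide', 'Sample_Well', and 'Sample_Group'.
All four correspond to columns in the pd file above; the first two are sometimes named 'Sentrix_ID' and 'Sentrix_Position' instead.
Note: Array: position on the MethylationEPIC BeadChip; Slide: the MethylationEPIC BeadChip itself. For example, the 450k chip carries just over 480,000 probes, covering roughly 450,000 CpG sites. One chip (slide) holds 12 arrays, and each array analyzes one sample. The instrument can process 8 chips at once, so 96 samples can be analyzed in a single run with no batch differences.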
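The layout in the note can be worked through in a few lines of base R. This is only a sketch: the `pd_demo` data frame and its Sentrix values are made up for illustration, and only the column names come from the text above.

```r
# Capacity of one run, using the numbers quoted in the note
arrays_per_slide <- 12
slides_per_run <- 8
samples_per_run <- arrays_per_slide * slides_per_run

# Hypothetical sample sheet that uses the Sentrix-style column names
pd_demo <- data.frame(
  Sample_Name = c("s1", "s2", "s3"),
  Sentrix_ID = c("200001", "200001", "200002"),
  Sentrix_Position = c("R01C01", "R02C01", "R01C01"),
  stringsAsFactors = FALSE
)

# Rename to the Slide / Array labels that appear in the SVD plot
names(pd_demo)[names(pd_demo) == "Sentrix_ID"] <- "Slide"
names(pd_demo)[names(pd_demo) == "Sentrix_Position"] <- "Array"

# How many samples sit on each slide (each slide is a potential batch)
slide_counts <- table(pd_demo$Slide)
print(samples_per_run)
print(slide_counts)
```

Tabulating samples per slide like this is a quick way to see how many batches a dataset actually spans before reading the SVD plot.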
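The plot shows that the variation is associated mainly with the group variable, which is exactly the grouping we are interested in, so no further processing is needed.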
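To make the idea behind the SVD plot concrete, here is a simplified base-R sketch on simulated data. It is an illustration of the general approach (decompose the data, then test each leading component against each sample-sheet variable), not ChAMP's exact implementation; the matrix `x`, the factors `group` and `slide`, and the choice of a Kruskal-Wallis test are all assumptions of this example.

```r
set.seed(1)
n_probes <- 500
n_samples <- 20

# Simulated covariates: the biological group and a technical factor
group <- factor(rep(c("ctrl", "case"), each = 10))
slide <- factor(rep(c("s1", "s2"), times = 10))

# Simulated data matrix (probes x samples) with a strong group effect
x <- matrix(rnorm(n_probes * n_samples), n_probes, n_samples)
x[1:100, group == "case"] <- x[1:100, group == "case"] + 3

# Center each probe, then decompose; columns of v hold per-sample scores
x_centered <- x - rowMeans(x)
sv <- svd(x_centered)
pcs <- sv$v[, 1:3]

# Test every leading component against every covariate
covariates <- list(group = group, slide = slide)
pvals <- sapply(covariates, function(f) {
  apply(pcs, 2, function(pc) kruskal.test(pc, f)$p.value)
})
rownames(pvals) <- paste0("PC", 1:3)
print(signif(pvals, 3))
```

A small p-value in the `group` column for the first component corresponds to the situation described above, where the dominant variation tracks the variable of interest rather than a technical factor.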
To explore batch effects, we artificially add a sex variable and see what happens.
# Add a sex column
dim(myLoad$pd) # 43 8
pD=myLoad$pd
pD$sex <- c(rep('F',21),rep('M',22))
champ.SVD(beta=myNorm,pd=pD)
# Remove the batch effect
#?champ.runCombat
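myCombat <- champ.runCombat(beta=myNorm,
pd=pD,
variablename="Sample_Group", # the variable of interest
batchname=c("sex"), # the factor(s) to correct for
logitTrans=TRUE) # whether to logit-transform before correction: TRUE (the default) when correcting raw beta values, FALSE when the input is M-values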
champ.SVD(beta=myCombat,pd=pD)
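The logitTrans comment above distinguishes beta values from M-values. As a minimal sketch, assuming the usual definition M = log2(beta / (1 - beta)), the two scales convert as follows; the helper names `beta_to_m` and `m_to_beta` are hypothetical and not part of ChAMP.

```r
# Beta values live in (0, 1); the logit maps them onto the whole real line
beta_to_m <- function(beta) log2(beta / (1 - beta))

# Inverse transform: back from M-values to beta values
m_to_beta <- function(m) 2^m / (2^m + 1)

beta <- c(0.1, 0.5, 0.9)
m <- beta_to_m(beta)
beta_back <- m_to_beta(m)

print(m)
print(beta_back)
```

Working on the unbounded scale and transforming back keeps the corrected values inside the valid beta range, which is why the option matters when the input is raw beta values.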
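The effect of sex was not removed completely either.
Another option is to include basic covariates such as sex and age in the downstream differential analysis to adjust the data,
but I worry about overcorrecting.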
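Adjusting for covariates in the differential analysis amounts to adding them to the linear model. The sketch below shows this for a single simulated CpG using base R only; the data frame `pheno`, the effect sizes, and the variable names are assumptions made for illustration. Packages such as limma fit the same kind of model per probe from a design matrix like `design`.

```r
set.seed(42)
n <- 40

# Simulated sample sheet with the variable of interest and two covariates
pheno <- data.frame(
  Sample_Group = factor(rep(c("control", "case"), each = n / 2),
                        levels = c("control", "case")),
  sex = factor(rep(c("F", "M"), times = n / 2)),
  age = round(runif(n, 30, 70))
)

# Simulated M-values for one CpG: group effect of 1, sex effect of 0.5
m_value <- 1 * (pheno$Sample_Group == "case") +
  0.5 * (pheno$sex == "M") +
  rnorm(n, sd = 0.3)

# Design matrix with the group term plus the covariates
design <- model.matrix(~ Sample_Group + sex + age, data = pheno)

# Fit the adjusted model and pull out the group coefficient
fit <- lm(m_value ~ Sample_Group + sex + age, data = pheno)
group_effect <- coef(fit)[["Sample_Groupcase"]]

print(colnames(design))
print(round(coef(summary(fit)), 3))
```

Because the covariates are modeled rather than subtracted from the data beforehand, the group estimate is adjusted for sex and age while the input values themselves stay untouched.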
After reading a good number of papers, I found that many mention this issue only in passing. For example:
🍥 Gene-associated methylation status of ST14 as a predictor of survival and hormone receptor positivity in breast Cancer, BMC Cancer, IF=4.4
The distribution of type II probes was normalized using the BMIQ function [32]. Singular value decomposition (SVD) analysis was then used to correlate the principal components with biological and technical factors. If the result of SVD analysis showed substantial technical variation, the ComBat function was used to remove the source of this variation [33].
Other papers give more detail:
🍥 Tumor DNA methylation profiles correlate with response to anti-PD-1 immune checkpoint inhibitor monotherapy in sarcoma patients, Journal for ImmunoTherapy of Cancer, IF=13.751
we performed Illumina methylation EPIC microarray analyses. Overall, 704,003 probes remained for further statistical evaluation after quality and sex chromosome filtering.
SVD analysis indicated that a significant variation of DNA methylation within the whole study cohort was associated with CD3+/CD8+ immune cell content, center of sample origin and sex (online supplemental figure S3A). In addition, although data were normalized, a strong variation within our dataset was seen in beta value density (online supplemental figure S3B). This variation was reduced after adjusting methylation data for the factors mentioned above (online supplemental figure S3C)
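The authors argue that the variation in the beta-value distributions decreased after adjustment, and use this to validate the batch correction.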
🍥 Distinctive epigenomic alterations in NF1-deficient cutaneous and plexiform neurofibromas drive differential MKK/p38 signaling, Epigenetics & Chromatin, IF=4.954
Sources of technical variation were found to significantly (p < 0.05) contribute to the variation explained in the first couple principle components in addition to tissue effects based on SVD analysis as implemented in ChAMP (v2.18.3) and corrected using the sva (v3.30.1) package in R for visualization purposes only
Technical variation was modeled as either fixed or random effects on the uncorrected data in the differential methylation analysis described below.
We applied a hierarchical generalized linear mixed effects model to identify differentially methylated loci between CNFs and PNFs, controlling for age and sex differences with a nested random effect to control for partially repeated measures.
🍥 Epigenetic modulation of AREL1 and increased HLA expression in brains of multiple system atrophy patients: this paper adjusted for sex and age.
🍥 A profile of differential DNA methylation in sporadic human prion disease blood: precedent, implications and clinical promise, University College London
I used an established statistical tool, ComBAT, to regress for positional effects. Reassuringly, remaining metadata associations with principle component 1 were (in decreasing order of significance) phenotype (Sample_Group), Codon129 Genotype, MRC Scale Score and Gender, as shown in Figure 22. This shows that disease status and severity do indeed affect the methylome and suggests that sex as a covariate should be included in a final regression model.
I investigated specific sites which exhibited differential methylation by constructing a linear regression model of β ~ Sample_Group + Age + Sex using limma, an established software package for analysis of microarray data (Ritchie et al., 2015)
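After ComBat, the main remaining differences were still in age and sex, so these were additionally adjusted for in the model.
So:
Does anyone have experience to share on this?
Will reviewers press me on this issue?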
References:
[1] Genome-scale methylation assessment did not identify prognostic biomarkers in oral tongue carcinomas - PMC (nih.gov)
[2] A hands-on guide to methylation bioinformatics analysis: using the minfi package for methylation data (part 1)
https://mp.weixin.qq.com/s/k9Ujs3EDsEzB0-7Dz0rxqQ
YuYuFiSH
Email: chenyu_202000@163.com
https://mp.weixin.qq.com/s/GOhX8XiWGmsmPRTqJ4JWmw