timpeters82 / DMRcate-devel

devel git for DMRcate
Other
7 stars 8 forks source link

DMRcate design #4

Closed bioinfonext closed 5 months ago

bioinfonext commented 1 year ago

Hi, I am trying to use DMRcate first time for epic array data.

I have done the limma association analysis using below model design but not sure how to design model metrix for DMRcate for same type of BMI and CpG sites association analysis; Could you help me if below DMRcate design correct or wrong?

#phenotype file
targets<-read.table("Limma.phenotypes.txt", header=TRUE, stringsAsFactor=FALSE,row.names=NULL, sep="\t")
head(targets)

#convert eGFR into  log 2

targets$logeGFR=log2(targets$eGFR)

#set first column header as blanck

names(targets)[1] <- ""

# first column as row names
targets2<-targets[,-1]
rownames(targets2)<-targets[,1]

head(targets2)
dim(targets2)

#read mehtylation data

###################

#Start by inputting your original beta value file, skipping any of the first few columns we don't need for this analysis (easily done in preview import)

datMe <- read_csv("betas_1.csv", col_types = cols(Chromosome = col_skip(), Start = col_skip(), End = col_skip(), Strand = col_skip()))

head(betaval, 10) [,1:5]

#set first column header as blank
names(betaval)[1] <- ""

#convert into dataframe
betaval <- data.frame(betaval)
# first column as row names

betaval2<-betaval[,-1]
rownames(betaval2)<-betaval[,1]

#convert beta value into m value
mval<-log2(betaval2/(1-betaval2))

dim(mval)

head(mval, 10) [,1:5]

mval <- data.matrix(mval)
library(DMRcate)

myMs.noSNPs <- rmSNPandCH(mval, dist=2, mafcut=0.05)

design: <- model.matrix(~logeGFR + as.factor(Gender) + age +CD8T +CD4T +NK + Bcell +Mono+ PC1 + PC2 + PC3 + meanM_from_TRUEinvar,data=targets2)

myannotation <- cpg.annotate(myMs.noSNPs, analysis.type="differential",design=design, coef=2, contrats=FALSE)
dmrcoutput <- dmrcate(myannotation, lambda=1000, C=2)
results.ranges <- extractRanges(dmrcoutput, genome = "hg19")
write.table(results.ranges, file = "DMR.output.txt")

Many thanks

timpeters82 commented 1 year ago

Hello,

Like the manual says, the formulation of the design matrix is identical to that of limma. So yes, you can use the same design matrix (though it will fit using eBayes() with trend=FALSE and robust=FALSE).

You've already specified coef=2, so you don't need a contrast matrix. contrasts=FALSE by default and doesn't need to be specified in this instance.

You should specify analysis.type="differential" whenever you specify a value for coef, even when the response variable is continuous (like with BMI).

Best, Tim

bioinfonext commented 1 year ago

Dear Tim, thank you so much for you response. I have also updated my DMRcate code above. Could you please suggest how to filter significant DMR and then how to plot these.

timpeters82 commented 1 year ago

Hi there,

All the DMRs in results.ranges should be "significant" since they are indexed at the rate you specified with fdr to cpg.annotate(). As an extra level of filtering, the DMRs are ranked by Fisher's statistic (under the Fisher column in results.ranges) so you can take the top ones from there.

DMR.plot() is there for you to plot your DMRs, please read the manual for details.

Cheers, Tim

bioinfonext commented 1 year ago

Thanks Tim, I want to plot DMR 1 for those individual who has eGFR above 60 and who has eGFR below 60 but I don't have any column in my phenotype file so can I still able to plot these or should I first make a column in phenotype file and could you also help according to above code what should I mention here:CpGs=getBeta( )

DMR.plot(ranges=results.ranges1, dmr=1, CpGs=getBeta(GRSet.norm.na.good.noSnp.noXreact), what="Beta", arraytype = "EPIC", phen.col=cols, genome="hg19")

timpeters82 commented 1 year ago

Hello,

A number of things here.

1) Wouldn't your logeGFR vector be the phenotype column you're looking for?

2) Your argument to CpGs should ideally be the same argument you gave to cpg.annotate(), so in this case, use CpGs=myMs.noSNPs, what="M"

3) It sounds like you want to binarise your response variable for plotting, try this:

eGFR_binarised <- ifelse(logeGFR > 60, "eGFRhi", "eGFRlo")
groups <- c(eGFRhi="magenta", eGFRlo="forestgreen")
cols <- groups[as.character(eGFR_binarised)]
DMR.plot(ranges=results.ranges1, dmr=1, CpGs=myMs.noSNPs, what="M", arraytype = "EPIC", phen.col=cols, genome="hg19")

Cheers, Tim