smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe
66 stars 27 forks source link

Using radmeth to call DMRs #207

Closed vagan21 closed 2 years ago

vagan21 commented 2 years ago

Hello,

I have a question in regards to calling DMRs if you have replicates.

I have WGBS data from two beta-cell populations. Let's call them B1 and B2. I have 2 replicates for each population. These replicates, while they consist of the same cell type, were derived from different mice: one from a male mouse, and another from a female. So I have a B1 population derived from a female mouse, and a B1 population derived from male. The same goes for the B2 population. Considering this, would you recommend running radmeth to account for variation between replicates within each beta cell population? My goal is to ultimately identify genotype-driven DMRs. If there are any influences on genome-wide methylation levels due to sex-effects, I want to remove those.

Initially, I ran methdiff and dmr, calling DMRs between the B1 and B2 populations for each sex separately. But after reading some papers and re-reading the methpipe program, I am starting to think I should have used radmeth. This way, my --factor is genotype, testing for differential methylation between B1 and B2 due to genotype, and my variation between replicates can be accounted for.

I hope this makes sense. I would appreciate your thoughts.

Thanks.

guilhermesena1 commented 2 years ago

Hey,

so your design matrix should look something like this

    is_male is_b1
B1_R1   0   1
B1_R2   1   1
B2_R1   0   0
B2_R2   1   0

and your radmeth call should be

radmeth regression -f is_B1 regression-matrix.txt

where regression-matrix.txt is generated from merge-methcounts with the -t option, using your 4 samples as input.

this should remove DM CpGs that are sex-dependent.

vagan21 commented 2 years ago

Hello,

Thank you for your feedback. I wanted to follow up about this as we decided to prep more B1 and B2 from males and females. Our initial PCA was showing B1 and B2 replicates clustering by sex, regardless of genotype. In efforts to tease out any methylation differences between these beta-cell populations due to genotype, and not sex or both, we decided to sequence more reps and determine if what we are seeing is true. In reviewing the radmeth function, I had a question regarding your design matrix. Our goal is to identify differentially methylated CpG sites due to genotype between B1 and B2. One is positive for a TF and the other is not.

Looking at the design matrix directly below, I would run radmeth regression -f is_B1 regression-matrix.txt in order to call differentially methylated CpG sites that are genotype-dependent. Ultimately, we want to call DMRs. However, as I mentioned, there is another factor at bay here: sex. Genotype isn't the only factor that differentiates B1 from B2 samples. Some of these genotype-dependent differentially methylated CpG sites could also be sex-driven, correct? Since we have B1 replicates from males and females and likewise for B2, it's important to adjust for any methylation differences between B1 replicates from males and females, and likewise for B2 replicates, because of sex.

         is_B1  base
B1_F_R1       1       1
B1_F_R2       1       1
B1_M_R1       1       1 
B1_M_R2       1       1 
B2_F_R1       0       1
B2_F_R2       0       1
B2_M_R1       0       1
B2_M_R2       0       1

Output: DMRs that are not all JUST genotype-dependent...could also be affected by sex alone or sex and genotype combined

I would run the analysis you indicated above:

    is_male is_b1
B1_R1   0   1
B1_R2   1   1
B2_R1   0   0
B2_R2   1   0

identifying DM CpGs that are sex-dependent and identify DMRs from those CpGs. Then of those DMRs called as 'genotype-dependent', I can exclude those that are actually sex-dependent from the analysis you suggested. Would this be the direction you go in to call DMRs between B1 and B2 due to genotype alone?


Another way I thought about doing this was the following:

         is_male    is_b1
B1_F_R1       0       1
B1_F_R2       0       1
B1_M_R1       1       1 
B1_M_R2       1       1 

radmeth regression -f is_male regression-matrix_table1.txt

         is_male    is_b1 
B2_F_R1       0       0
B2_F_R2       0       0
B2_M_R1       1       0
B2_M_R2       1       0

radmeth regression -f is_male regression-matrix_table2.txt

From both of the analyses above, I can identify DM CpGs, comparing samples within each B1 and B2 group, and the DMRs they represent that are sex-dependent. Then from this output:

         is_B1  base
B1_F_R1       1       1
B1_F_R2       1       1
B1_M_R1       1       1 
B1_M_R2       1       1 
B2_F_R1       0       1
B2_F_R2       0       1
B2_M_R1       0       1
B2_M_R2       0       1

be able to differentiate the DMRs that are sex-dependent from genotype-dependent. This approach might be the same as the one you suggested for identifying DM CpGs that are sex-dependent. It was easier for me to consider one factor at a time. Like I said, I want to identify DMRs between B1 and B2 due to genotype and account for any variation between replicates so the DMRs that are called are genotype-dependent. I would really appreciate it if this approach makes sense.

Sorry for the long post.

Thanks, Verda

guilhermesena1 commented 2 years ago

Hey,

So imagining the Venn diagram of sex and genotype. Let

To get A (genotype, not sex)

$ radmeth regression -f is_b1 design-matrix.txt proportion-table.meth
$ cat design.txt
         is_B1  base  is_male
B1_F_R1       1       1   0
B1_F_R2       1       1   0
B1_M_R1       1       1   1
B1_M_R2       1       1   1
B2_F_R1       0       1   0
B2_F_R2       0       1   0
B2_M_R1       0       1   1
B2_M_R2       0       1   1

To get B (sex, not genotype):

$ radmeth regression -f is_male design-matrix.txt proportion-table.meth
$ cat design.txt

To get C: First get A ∪ C by ignoring the sex information

$ radmeth regression -f is_b1 design-matrix.txt proportion-table.meth
$ cat design.txt
         is_B1  base
B1_F_R1       1       1
B1_F_R2       1       1
B1_M_R1       1       1
B1_M_R2       1       1
B2_F_R1       0       1
B2_F_R2       0       1
B2_M_R1       0       1
B2_M_R2       0       1

Then subtract the results of getting A from the results of the aforementioned run (e.g. using bedtools subtract)

This way you can get DMs on all combinations of effects.

I think your approach of testing DMRs just within B1 and B2 based on sex would work too, but more replicates always helps with resolution, and if you have information on B1/B2, you are able to regress that out using the full dataset. Might be worth trying it out though (you don't need the is_b1 column for that though)

Apologies if I didn't understand the goals fully, but I hope maybe some of this may be helpful.

andrewdavidsmith commented 2 years ago

Closing due to inactivity.