smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe
67 stars 27 forks source link

Assigning of reads into um: unmethylated-methylated and mu: methylated-unmethylated to its alleles by 'allelicmeth' #170

Closed kashiff007 closed 3 years ago

kashiff007 commented 3 years ago

Hi I have started working with ASM recently and found methpipe is the easiest and most convenient way to find it. Although the 'allelicmeth' is quite fast and I also understand the output but I am unable to find how ASM probability score is calculated? I am also wondering how allelicmeth assign the reads into um: unmethylated-methylated and mu: methylated-unmethylated to their respective alleles?

I apologize if I missed any document but could you provide any source where its explained? I also tried to read allelicmeth program but it doesn't help me much.

andrewdavidsmith commented 3 years ago

We can look at the documentation again and add more content. I suspect you are right about us missing something. One key point: no individual read is given a “call” of methylated or unmethylated. In fact, the two alleles need not be a “methylated” vs “unmethylated” allele. They could simply be two distinct methylation patterns. Of course in reality it’s always high vs low methylation. Regarding the reads, each is assigned a posterior probability of being associated with one of the two models. But it’s extremely unlikely we would bring that information to the surface for output. Again, we will look at the documentation again and thanks for the constructive comment.

andrewdavidsmith commented 3 years ago

@kashiff007 can you let me know if your reason for wanting "calls" on reads to "U" or "M" is for (a) visualization or (b) downstream analysis? Depending on the answer, we might be able to suggest an approach, and document it, even if we don't include it directly within an existing tool.

kashiff007 commented 3 years ago

thanks for the quick response @andrewdavidsmith . I wanted to look for allele specific methylation for my samples from BS-seq reads. First, along with methpipe, I used several other software to find genome-wide methylation and based on their location correlate them with corresponding gene expression. But now I suspect (which has a higher chance to be true in many location) that several methylation are allele specific, so I used 'allelicmeth' to find how reads are distributed and assigned to each CpG location. allelicmeth output tells me for each CpG site what is the coverage of reads and how many are methylated from allelic point of view. I further want to used this output to analyze the allele specific gene expression. For instance, a tuple from my allelicmeth output looks like: chr1 65649 65650 + CpG 0.153846 13 11 1 0 1 I understand 0.153846 is the ASM probability but unable to understand how its calculated by coverage (13), MM (11), MU (1), UM (0) and UU (1) ? Also what I understand from the documents that higher probability means higher chance of ASM, but many tuple in allelicmeth output show ASM probability of 1 (which is 100%) with no reads under MU and UM categories. Just to be clear I don't want to know the "U" or "M" read numbers separately.

Long story short I need to understand this process for downstream analysis. Thanks in advance.

andrewdavidsmith commented 3 years ago

Ok. We will make sure the documentation is clear on what those numbers mean. Shouldn't take us too long.

guilhermesena1 commented 3 years ago

Not sure if this is sufficient to close the issue but I have added a more descriptive explanation of the score in the manual. The score is a Fisher's exact test. Consider a 2 x 2 matrix with C/T in rows and columns and each element is the count of CCs, CTs, TCs and TTs. We test the hypothesis that the counts of CTs and TCs arise by chance through a Fisher's exact test. The rejection of this hypothesis (which is defined by the user based on the score we report) would imply allele-specific methylation.

I expanded on this on 63eb161 . I'm closing the issue for now but I'm happy to expand on the explanation if it is still unclear.