sartorlab / methylSig

R package for DNA methylation analysis

Sample contributions to mean methylation #17

Closed: bmreilly closed this issue 7 years ago

bmreilly commented 7 years ago

I was wondering if you have ever considered limiting each sample's contribution to the mean methylation estimate? From the initial MethylSig paper, it looks like the mean methylation level is calculated as the coverage-weighted average over the samples covering an individual CpG (please correct me if I am wrong). With this method, a site covered at, say, ~5x by 20 samples and at 500x by one sample could have its mean methylation level skewed toward the value of the one high-coverage sample. Setting a limit on the contribution of any single sample to the coverage-weighted average might prevent this kind of skew. I think I've seen this approach applied in previous studies, and I'm curious to hear what you think of it.
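To make the concern concrete, here is a minimal Python sketch of the scenario described above. It is not taken from methylSig: the read counts, the cap of 20x, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical data: 20 samples covered at 5x and one sample covered at 500x.
coverage = [5] * 20 + [500]
# Made-up methylated read counts: 1 of 5 reads (20%) in each low-coverage
# sample, 400 of 500 reads (80%) in the single high-coverage sample.
methylated = [1] * 20 + [400]


def coverage_weighted_mean(meth, cov):
    """Pooled estimate: total methylated reads over total reads."""
    return sum(meth) / sum(cov)


def unweighted_mean(meth, cov):
    """Average of per-sample methylation proportions, ignoring coverage."""
    props = [m / c for m, c in zip(meth, cov)]
    return sum(props) / len(props)


def capped_weighted_mean(meth, cov, cap):
    """Coverage-weighted mean where no sample's weight may exceed `cap`."""
    weights = [min(c, cap) for c in cov]
    props = [m / c for m, c in zip(meth, cov)]
    return sum(w * p for w, p in zip(weights, props)) / sum(weights)


print(f"coverage-weighted: {coverage_weighted_mean(methylated, coverage):.3f}")  # 0.700
print(f"unweighted:        {unweighted_mean(methylated, coverage):.3f}")         # 0.229
print(f"capped at 20x:     {capped_weighted_mean(methylated, coverage, 20):.3f}")  # 0.300
```

With these made-up numbers, the single 500x sample pulls the coverage-weighted mean to 0.70 even though 20 of the 21 samples sit at 0.20, while capping each sample's weight at 20x gives 0.30.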

Thanks, Brian

rcavalcante commented 7 years ago

Hi Brian,

I spoke with my PI (Maureen) and the original developer (Yongseok) and here is what they had to say.

Best, Raymond

Dear Brian,

The contribution of each sample to the mean methylation estimate depends on both its coverage and the dispersion parameter. The dispersion parameter is estimated from the data and differs between CpG sites; when it is large, coverage has little influence. For example, if the dispersion parameter is 1 (its maximum value), all samples contribute equally to the mean estimate regardless of coverage, while if it is 0 (its minimum value), the mean methylation level is weighted by coverage and is therefore more affected by high-coverage samples. However, in that zero-dispersion case, the percent methylation is estimated to be the same across samples in a group anyway, so an imbalance in coverage shouldn’t matter.
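The two limiting cases described above can be sketched in Python. This is an assumption-laden illustration, not methylSig's actual estimator: it uses a textbook weighting for overdispersed (beta-binomial) counts, w_i = n_i / (1 + (n_i - 1) * rho), where n_i is the coverage of sample i and rho is a dispersion parameter in [0, 1]. The function names and the data are hypothetical.

```python
def sample_weights(cov, rho):
    """Assumed beta-binomial style weights: n / (1 + (n - 1) * rho).

    rho = 0 gives weights equal to coverage; rho = 1 gives equal weights.
    """
    return [n / (1 + (n - 1) * rho) for n in cov]


def dispersion_weighted_mean(meth, cov, rho):
    """Weighted average of per-sample methylation proportions."""
    weights = sample_weights(cov, rho)
    props = [m / n for m, n in zip(meth, cov)]
    return sum(w * p for w, p in zip(weights, props)) / sum(weights)


# Hypothetical data: 20 samples at 5x (20% methylated), one at 500x (80%).
coverage = [5] * 20 + [500]
methylated = [1] * 20 + [400]

for rho in (0.0, 0.1, 1.0):
    est = dispersion_weighted_mean(methylated, coverage, rho)
    print(f"rho = {rho:.1f}: mean methylation = {est:.3f}")
```

Under this assumed weighting, rho = 0 reproduces the coverage-weighted mean, rho = 1 reproduces the plain average of the per-sample proportions, and intermediate values of rho shrink the influence of the high-coverage sample.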

Does this explanation sound satisfactory to you? Or do you think our method would be further improved by also limiting each sample's contribution to the mean estimate? We appreciate any additional insight into how we can improve our package.

Thank you,

Yongseok & Maureen

bmreilly commented 7 years ago

Hi Raymond,

That explanation makes me much more comfortable about the issue. I apparently misread the statistics and thought the mean methylation levels were calculated as a coverage-weighted mean, but this explanation clarified that for me. Thanks for answering my question.

Best, Brian