perishky / meffil

Efficient algorithms for analyzing DNA methylation data.
Artistic License 2.0

ewas / champ output comparison #42

Closed: iskandari closed this issue 1 year ago

iskandari commented 2 years ago

I have used both the meffil package and a combination of minfi and ChAMP to identify differentially methylated positions in schizophrenic vs. normal brain tissue samples. IDAT files were obtained from GSE61107, along with the following sample sheet.

I am getting completely different results from the two packages: with meffil, I obtained 20 significant CpG sites, whereas with minfi + ChAMP, I obtained 250,000 significant sites. Of the 20 sites I get from meffil, only 17 appear in the other output. I've double-checked the code to make sure it aligns with the meffil tutorial, and followed publicly available tutorials (link) for the other packages. Here, you can find the code I used to make this comparison for your reference.

What could be causing such large discrepancies on exactly the same dataset? The dorsolateral prefrontal cortex (DLPFC) cell reference is used in both meffil and minfi + ChAMP, and functional normalization (in minfi: preprocessFunnorm()) was used in both as well. In minfi + ChAMP, SVA was used to estimate surrogate variables, which were then regressed out. One difference is that p-value adjustment in ChAMP defaults to the Benjamini-Hochberg method, whereas meffil uses the Bonferroni correction; however, this alone cannot account for such a vast difference in output. Any guidance about the differences between meffil.ewas() and champ.DMP() would be greatly appreciated. Thank you!
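One way to check how much of the gap comes from the correction method alone is to apply both corrections to the same vector of raw p-values (in R, base `p.adjust(p, method = "BH")` and `p.adjust(p, method = "bonferroni")` do this directly). The sketch below is a generic, stdlib-only Python illustration of the two adjustments, not the code used inside meffil or ChAMP; the helper names and the toy p-values are hypothetical and not taken from GSE61107.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """BH step-up adjusted p-values (same definition as R's p.adjust(method = "BH"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(adjusted, alpha=0.05):
    """Number of sites whose adjusted p-value falls below alpha."""
    return sum(1 for q in adjusted if q < alpha)


# Toy data: a block of moderately small p-values plus clear nulls
raw = [0.001] * 40 + [0.5] * 60
print(count_significant(bonferroni(raw)))          # 0
print(count_significant(benjamini_hochberg(raw)))  # 40
```

With widespread moderate signal, BH can call many sites that Bonferroni rejects, so running both corrections on one pipeline's raw p-values separates the effect of the correction from the effect of normalization and SVA.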

perishky commented 2 years ago

Thank you so much for providing code and outputs for this case. I note that meffil uses only 9 surrogate variables, whereas the minfi/ChAMP analysis uses 15. Also, SVA in meffil.ewas() is applied to the top 50K CpG sites, whereas your application of SVA for ChAMP appears to be applied to all CpG sites. These could account for some EWAS differences, but I doubt they explain the large differences you identify. In functional normalization, the default number of control-probe principal components in minfi is 2, whereas in meffil you are using 5. This seems more likely to explain the large difference you see. You could verify this by setting nPCs=5 when running preprocessFunnorm(). If you still observe large EWAS differences, the next step would be to compare the surrogate variables generated by each approach.
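As a rough first pass at that comparison, one could correlate each surrogate variable from one pipeline with each surrogate variable from the other across samples. The sketch below is a hypothetical stdlib-only Python helper, assuming both sets of SVs have been exported as lists of per-sample values in the same sample order. SV signs are arbitrary, so absolute correlation is used; because SVs span a subspace, a weak one-to-one match does not by itself prove the subspaces differ, and canonical correlation (e.g. R's `stats::cancor()`) is a more thorough check.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_matches(svs_a, svs_b):
    """For each SV in svs_a, return (index of best-matching SV in svs_b, |r|)."""
    matches = []
    for sv in svs_a:
        scores = [abs(pearson(sv, other)) for other in svs_b]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((j, scores[j]))
    return matches


# Hypothetical toy SVs over 5 samples (not real meffil or SVA output)
svs_meffil = [[1, 2, 3, 4, 5], [2, 1, 2, 1, 2]]
svs_champ = [[-2, -4, -6, -8, -10], [5, 3, 4, 1, 2], [0, 1, 0, 1, 0]]
for k, (j, r) in enumerate(best_matches(svs_meffil, svs_champ)):
    print(f"meffil SV{k + 1} -> ChAMP SV{j + 1}, |r| = {r:.2f}")
```

SVs with no strong counterpart in the other set would point to the SVA step (number of SVs, or the CpG subset it was run on) as a source of the EWAS differences.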