nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

p values for DMR scores? #122

Closed EpiAllele closed 5 months ago

EpiAllele commented 6 months ago

Hello ModKit Devs. Does the score produced by the "modkit dmr" command come with an associated p-value? If not, do you have instructions on how to obtain one? Or could you point me to a third-party tool that can do differential methylation analysis with significance testing using modkit bedMethyl output? Thanks

ArtRand commented 6 months ago

Hello @EpiAllele,

The short answer is "no", the "score" from modkit dmr does not have any associated significance value like a p-value. Given a comparison, a higher score indicates the samples are more different at a locus/region. There is some polars code on issue 83 that will parse the 5hmC/5mC output - and you could adapt it to other modification sets. There is also more elaborate discussion of the scoring and options for how to use the output in issue 93. I'm afraid I cannot make a strong recommendation for which method to use without having more details about your experiment. Happy to advise.

EpiAllele commented 6 months ago

Hi @ArtRand. Thanks for the quick response. The experiment is a simple comparison of a treatment (3 biological replicates) against a control (3 biological replicates), looking at the effect of the treatment on methylation. I have a very limited computational background, but I will take a look at the polars code from issue 83 and also read the discussion in issue 93.

ArtRand commented 6 months ago

Hello @EpiAllele,

If you suspect that the treatment is going to have an effect at certain regions, I would use those regions as input to modkit dmr multi on all 6 of the samples. You may want to refer to the documentation on how to use multiple samples. Then I would take the high scoring regions and look at the reads in a viewer such as IGV. Honestly, I find this is the simplest first step to get an idea of what's happening in an experiment. If you don't have a set of regions you want to look at, you can run modkit dmr without the regions and it will score all of the individual sites. Then you can look and see if the consistently high scoring sites are in interesting areas of the genome. The algorithm with individual sites is a little slower than I'd like right now - but I'm going to have a faster version in the next release. Good luck!
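
One way to pick out "consistently high scoring" regions once the scores are in hand is to rank each region by its lowest score across the comparisons, so a region only ranks well if every comparison scored it highly. The sketch below is a minimal illustration of that idea, not part of modkit: the comparison names, region names, and scores are made up, the nested-dictionary layout is an assumption rather than the modkit dmr output format, and loading the real output files is left out.

```python
# Hypothetical scores: {comparison name: {region name: score}}.
# These names and numbers are invented for illustration only.
scores_by_comparison = {
    "treat1_vs_ctrl1": {"regionA": 12.5, "regionB": 0.8, "regionC": 7.1},
    "treat2_vs_ctrl2": {"regionA": 10.2, "regionB": 9.4, "regionC": 6.3},
    "treat3_vs_ctrl3": {"regionA": 11.7, "regionB": 1.1, "regionC": 8.0},
}


def rank_consistent_regions(scores, top_n=3):
    """Rank regions by their minimum score across all comparisons.

    Only regions present in every comparison are considered. Ties are
    broken alphabetically by region name so the order is deterministic.
    """
    per_comparison = list(scores.values())
    shared = set.intersection(*(set(s) for s in per_comparison))
    worst_case = {
        region: min(s[region] for s in per_comparison) for region in shared
    }
    ranked = sorted(worst_case.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    for region, min_score in rank_consistent_regions(scores_by_comparison):
        print(f"{region}\tminimum score across comparisons: {min_score}")
```

Using the minimum rather than the mean keeps a region like the hypothetical `regionB`, which scores highly in only one comparison, from ranking near the top. The top regions from a ranking like this are the ones worth opening in IGV first.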

EpiAllele commented 6 months ago

Thanks @ArtRand! This is really helpful. For viewing in IGV, is it as simple as loading the indexed modBAM and checking the modified bases option on the BAM track?

ArtRand commented 6 months ago

@EpiAllele It's just that simple! I like to also add a bedgraph track from modkit.

ArtRand commented 5 months ago

Hello @EpiAllele

The latest version of modkit (v0.2.5) has a MAP-based p-value, so feel free to give it a try. It should run a lot faster too. Please re-open this issue if you encounter any problems or have any questions.