Results interpretation - Githubissues

dwill023 commented 3 years ago

Hi, I just wanted to know how the enrichR_score is calculated. This score is written to the BED file when I use the exportR function.

I get scores like these: enrichR_score:936, enrichR_score:139, enrichR_score:729, enrichR_score:515

How should I interpret them? Does a larger score mean stronger peak enrichment?

Thanks, Desiree

your-highness commented 3 years ago

Dear @dwill023 ,

If an enrichR object is exported using normr::exportR() with type = "bed", the score represents the confidence of that region's enrichment call: https://github.com/your-highness/normR/blob/55786fbfdc3e8c4808533b168923c4b10168d7b0/R/NormRFit.R#L232-L238

If you set a false discovery threshold (a.k.a. a threshold for the multiple-testing-corrected P-value) via fdr = 0.01, (i) only regions with a q-value smaller than the threshold are reported and (ii) the score represents the multiple-testing-corrected statistical confidence that the region is not background, scaled up by 1,000 (analogous to the MACS2 bed score calculation). This is the recommended export method because it is statistically sound.
If no false discovery threshold is set, the score represents the posterior probability of not being a background region (scaled up by 1,000). This approach can be useful if there is only shallow enrichment (e.g. broad histone modifications, or a ChIP enrichment that was not very efficient). You could also try regimeR() to fit mixtures of broad and peak enrichment.
For exports with type = "bedgraph" or type = "bigWig", no filtering by fdr is done and the score represents the estimated quantitative enrichment over control. This is convenient for displaying normalized tracks (e.g. ChIP-seq - Input-Seq) in genome browsers like IGV.
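
As a rough illustration of the scaling described above (not taken from the normR source), the score labels quoted earlier in this thread can be mapped back to the 0-1 range by dividing by 1,000. The sketch is in Python, assumes labels of the form `enrichR_score:<integer>`, and uses a hypothetical helper name, `score_to_confidence`:

```python
import re

# Assumption: labels look like the "enrichR_score:936" strings quoted in this thread.
SCORE_PATTERN = re.compile(r"enrichR_score:(\d+)")


def score_to_confidence(label: str) -> float:
    """Rescale a BED score label to the 0-1 range by dividing by 1,000."""
    match = SCORE_PATTERN.search(label)
    if match is None:
        raise ValueError(f"no enrichR_score found in {label!r}")
    score = int(match.group(1))
    if not 0 <= score <= 1000:
        raise ValueError(f"score {score} is outside the expected 0-1000 range")
    return score / 1000


labels = [
    "enrichR_score:936",
    "enrichR_score:139",
    "enrichR_score:729",
    "enrichR_score:515",
]

# Rank regions from most to least confident enrichment call.
ranked = sorted(
    ((label, score_to_confidence(label)) for label in labels),
    key=lambda pair: pair[1],
    reverse=True,
)

for label, confidence in ranked:
    print(f"{label}\t{confidence:.3f}")
```

With fdr set, a rescaled value such as 0.936 would be read as the multiple-testing-corrected confidence; without it, as the posterior probability of not being background. Either way, a larger score means a more confident enrichment call, not a larger fold enrichment.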

I usually export both ways to visualize the results. For downstream analysis, the control-corrected enrichment is very convenient and was shown to correspond well to the in silico calibrated Histone Modification Density (ICeChIP HMD%) [1] in the sample's cell population [2].

May I ask what kind of data you are analysing?

[1] Grzybowski et al.; Molecular Cell 58.5 (2015): 886-899.

[2] Helmuth; "Robust Normalization of Next Generation Sequencing Data"; Chapter 4.3.2

Best, Johannes

dwill023 commented 3 years ago

Thanks for getting back to me. I'm using a human transcription factor (FoxO1) ChIP-seq data set, with fdr = 0.01 as my cutoff.

Thanks for clarifying, Desiree

your-highness commented 3 years ago

Did this explanation solve your issue? Please mark as solved if so ;)

dwill023 commented 3 years ago

Yes it did, thanks!

your-highness / normR

Results interpretation #17