jernst98 / ChromHMM

GNU General Public License v3.0

Calculation of Overlap/Neighborhood/Enrichment #14

Closed snaketron closed 6 years ago

snaketron commented 6 years ago

Dear developers,

I am having trouble understanding how the different enrichment/overlap analyses are performed. Could you please provide the explicit equations, or a description of how this is done, in your wiki or here?

If this is already explained in a publication, could you please name it here?

Best Regards

jernst98 commented 6 years ago

Hi,

The enrichments are fold enrichments. By default, the calculation is as follows. Let:

- A be the number of bases in the state
- B be the number of bases in the external annotation
- C be the number of bases in both the state and the external annotation
- D be the number of bases in the genome

The fold enrichment is then defined as (C/A)/(B/D).
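As a quick illustration of the formula above, here is a minimal Python sketch. It is not ChromHMM code: the function name `fold_enrichment` and the example counts are hypothetical and only show how the four quantities A, B, C, and D combine.

```python
def fold_enrichment(state_bases, annotation_bases, overlap_bases, genome_bases):
    """Return (C/A)/(B/D) for the given base counts.

    state_bases      -- A, bases assigned to the state
    annotation_bases -- B, bases covered by the external annotation
    overlap_bases    -- C, bases in both the state and the annotation
    genome_bases     -- D, bases in the genome
    """
    if state_bases <= 0 or annotation_bases <= 0 or genome_bases <= 0:
        raise ValueError("A, B and D must all be positive")
    fraction_of_state_in_annotation = overlap_bases / state_bases         # C/A
    fraction_of_genome_in_annotation = annotation_bases / genome_bases    # B/D
    return fraction_of_state_in_annotation / fraction_of_genome_in_annotation


# Made-up counts, chosen only for illustration.
example = fold_enrichment(
    state_bases=1_000_000,
    annotation_bases=50_000_000,
    overlap_bases=200_000,
    genome_bases=3_000_000_000,
)
print(example)  # roughly 12: the state overlaps the annotation ~12x more than expected
```

A value above 1 means the state overlaps the annotation more than expected if the state were spread uniformly over the genome; a value below 1 means it overlaps less.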

The methods section of Ernst and Kellis, Nature Biotech 2010 gives this equation, except that there the enrichments were defined at bin resolution as opposed to base resolution, and were based on the posterior as opposed to the max-posterior state assignment. ChromHMM can still compute the fold enrichments that way if you add the flags -binres and -posterior, respectively, as described in the user manual, but these are non-default options.
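To make the difference between the two variants concrete, here is a rough Python sketch of a bin-resolution, posterior-weighted version of the same ratio. This is one reading of the description above, not ChromHMM's implementation: the function name `soft_fold_enrichment` is hypothetical, each bin is assumed to be simply inside or outside the annotation, and how ChromHMM treats bins that only partially overlap an annotation is not covered here.

```python
def soft_fold_enrichment(posteriors, in_annotation):
    """Posterior-weighted (C/A)/(B/D) computed over bins instead of bases.

    posteriors    -- per-bin posterior probability of the state of interest
    in_annotation -- per-bin booleans, True if the bin is in the annotation
                     (an assumption made for this sketch)
    """
    if len(posteriors) != len(in_annotation):
        raise ValueError("inputs must have one entry per bin")
    a = sum(posteriors)                                         # soft count of bins in the state
    b = sum(1 for flag in in_annotation if flag)                # bins in the annotation
    c = sum(p for p, flag in zip(posteriors, in_annotation) if flag)  # soft overlap
    d = len(posteriors)                                         # total number of bins
    if a <= 0 or b == 0:
        raise ValueError("state mass and annotation size must be positive")
    return (c / a) / (b / d)


# Four made-up bins: the state is likely in the first two, which are annotated.
print(soft_fold_enrichment([0.9, 0.8, 0.1, 0.2], [True, True, False, False]))
```

If every posterior is exactly 0 or 1, the soft counts become plain bin counts, so the sketch reduces to the same ratio computed from hard (max-posterior style) assignments at bin resolution.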

I can add more details on this calculation in the user manual for the next version of ChromHMM.

Best, -Jason