root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.64k stars 1.26k forks source link

[RF] RooAbsPdf::createNLL gives wrong NLL for binned pdf with bin size different to one #8005

Closed marcescalier closed 1 year ago

marcescalier commented 3 years ago

Hello. Please find below a problem with createNLL.

Describe the bug

A flat histogram is constructed for continuum while signal is constructed as a small peak in the center of the m_yy distribution.

The pseudo-data is constructed from them, while the total pdf is constructed from them as well. (so pseudo-data is exactly the same as pdf)

On purpose to illustrate the problem, I take a bin size of 5 GeV (else the problem will not appear, which appears to be normal due to the technical explanation given in the attachement).

*The nll is computed by roofit, with two methods that give the same result (probably they use the same internal function).

The nll is computed "by hand" : -I could reproduce exactly the value of nll of roofit if I "forget" to multiply the pdf by the bin size only in the sum part* of the likelihood formula.

-If I compute it in what I think is "correct", I could not find the value obtain by roofit.

Hypothesis of explanation : createNLL does not anticipate the possibility of binned pdf.

More details : pdf : 3 slides Minimum_Problem_NLL.pdf

A minimum reproducible example is given here : /afs/cern.ch/work/e/escalier/public/ForOthers/ForRootExperts/ProblemNLL

(github prevents to attach a .C file : it would be useful to authorize it if we wish to keep persistent minimum examples for the future)

Remark : the anomaly would not have consequence for the result of a fit. But if the bin size would be variable, maybe there would be (I don't know). Anyway, there is incompatibility of treatment of the bin size for the total of the pdf and the value of pdf in a given bin.

Expected behavior

the expected value is given in the example : roofit does not find the mathematical correct value because it forgets to multiply by the bin size for the "sum part" (while it does not forget to multiply by the bin size for the "total prediction" part.

To Reproduce

Steps to reproduce the behavior:

  1. Your code that triggers the issue: at least a part; ideally something we can run ourselves.

a) put : do_mistake_computation_find_official_roofit_value=1 for reproducing the result of roofit

b) put do_mistake_computation_find_official_roofit_value=0 for reproducing what I think is the correct value (different to the one of roofit)

  1. How to run your code and / or build it, e.g. root myMacro.C, ... root -b Minimum.C+

Setup

  1. ROOT version : 6.18.00
  2. Operating system : windows (session unix from mobaxterm)
  3. How you obtained ROOT, such as dnf install / binary download / you built it yourself. installed by experts on the french cluster of computers (cc in2p3, France)

Additional context

guitargeek commented 1 year ago

This issue got fixed in ROOT 6.24, where if fixed the integrals of these binned pdf and function classes.