Open yipengsun opened 1 year ago
Some further information from my own observations:
Somehow despite the fact that the model yield is >> the data yield, the reported FCN keeps getting smaller (see the MINUIT output below). This is true even when upper limits are extended to outrageous values.
I have further tried eliminating the gaussian constraints on Lumi
and alpha_BFD1
as the source of the problem by reconstructing the likelihood using only the RooRealSumPdf
which should not contain the constraints themselves, only the functions which depend on the observables or which normalize those same functions. The behavior is the same.
More MINUIT output (print level 0):
********** ** 1 **SET PRINT 1 ********** ********** ** 2 **SET NOGRAD ********** PARAMETER DEFINITIONS: NO. NAME VALUE STEP SIZE LIMITS 1 Lumi 1.00000e+00 3.00000e-01 0.00000e+00 3.00000e+00 2 Nmu 5.00000e+04 2.49995e+04 1.00000e+00 1.00000e+06 3 RawRDst 4.03350e-02 1.99999e-02 1.00000e-06 2.00000e-01 4 alpha_BFD1 0.00000e+00 6.00000e-01 -3.00000e+00 3.00000e+00 ********** ** 3 **SET ERR 0.5 ********** ********** ** 4 **SET PRINT 1 ********** ********** ** 5 **SET STR 2 ********** NOW USING STRATEGY 2: MAKE SURE MINIMUM TRUE, ERRORS CORRECT ********** ** 6 **MIGRAD 2000 1 ********** FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4. START MIGRAD MINIMIZATION. STRATEGY 2. CONVERGENCE WHEN EDM .LT. 1.00e-03 MINUIT WARNING IN HESSE ============== Negative diagonal element 3 in Error Matrix MINUIT WARNING IN HESSE ============== 2456.94 added to diagonal of error matrix EIGENVALUES OF SECOND-DERIVATIVE MATRIX: -7.1929e+00 9.6978e-01 1.0000e+00 9.2231e+00 MINUIT WARNING IN HESSE ============== MATRIX FORCED POS-DEF BY ADDING 7.202092 TO DIAGONAL. FCN=-0.464708 FROM HESSE STATUS=NOT POSDEF 31 CALLS 32 TOTAL EDM=6.13113e+08 STRATEGY= 2 ERR MATRIX NOT POS-DEF EXT PARAMETER APPROXIMATE STEP FIRST NO. NAME VALUE ERROR SIZE DERIVATIVE 1 Lumi 1.00000e+00 1.33665e-03 2.14402e-03 -1.26169e+05 2 Nmu 5.00000e+04 1.26146e+03 1.18624e-03 -3.61664e+05 3 RawRDst 4.03350e-02 1.19551e-01 2.56936e-03 -4.53011e+03 4 alpha_BFD1 0.00000e+00 2.39988e-01 2.01358e-03 -5.82940e+03 ERR DEF= 0.5 MIGRAD MINIMIZATION HAS CONVERGED. MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX. COVARIANCE MATRIX CALCULATED SUCCESSFULLY FCN=-373434 FROM MIGRAD STATUS=CONVERGED 227 CALLS 228 TOTAL EDM=6.97829e-06 STRATEGY= 2 ERROR MATRIX ACCURATE EXT PARAMETER STEP FIRST NO. NAME VALUE ERROR SIZE DERIVATIVE 1 Lumi 3.00000e+00 1.68695e-05 2.00130e-03** at limit ** 2 Nmu 1.00000e+06 5.62344e+00 2.00135e-03** at limit ** 3 RawRDst 2.00000e-01 1.08871e-05 9.78010e-02** at limit ** 4 alpha_BFD1 3.00000e+00 2.35826e-04 5.29109e-03** at limit **
ERR DEF= 0.5 EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 4 ERR DEF=0.5 2.982e-16 5.637e-20 1.636e-27 1.292e-23 5.637e-20 9.972e-05 4.648e-18 1.119e-17
1.636e-27 4.648e-18 2.073e-17 -4.928e-20
1.292e-23 1.119e-17 -4.928e-20 1.309e-12
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3 4
1 0.00000 1.000 0.000 0.000 0.000
2 0.00000 0.000 1.000 0.000 0.000
3 0.00001 0.000 0.000 1.000 -0.000
4 0.00001 0.000 0.000 -0.000 1.000
**********
** 7 **SET ERR 0.5
**********
**********
** 8 **SET PRINT 1
**********
**********
** 9 **HESSE 2000
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-373434 FROM HESSE STATUS=OK 23 CALLS 251 TOTAL
EDM=6.97828e-06 STRATEGY= 2 ERROR MATRIX ACCURATE
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 Lumi 3.00000e+00 1.68695e-05 4.00261e-04 1.57079e+00
WARNING - - ABOVE PARAMETER IS AT LIMIT.
2 Nmu 1.00000e+06 5.62344e+00 4.00271e-04 1.58650e+02
WARNING - - ABOVE PARAMETER IS AT LIMIT.
3 RawRDst 2.00000e-01 1.08845e-05 4.89005e-02 2.05104e+05
WARNING - - ABOVE PARAMETER IS AT LIMIT.
4 alpha_BFD1 3.00000e+00 2.35825e-04 1.05822e-03 -2.22268e+03
WARNING - - ABOVE PARAMETER IS AT LIMIT.
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 4 ERR DEF=0.5
2.982e-16 -2.818e-18 6.221e-31 -1.615e-22
-2.818e-18 9.972e-05 4.730e-19 -2.801e-16
6.221e-31 4.730e-19 2.072e-17 -5.152e-21
-1.615e-22 -2.801e-16 -5.152e-21 1.309e-12
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3 4
1 0.00000 1.000 -0.000 0.000 -0.000
2 0.00000 -0.000 1.000 0.000 -0.000
3 0.00000 0.000 0.000 1.000 -0.000
4 0.00000 -0.000 -0.000 -0.000 1.000
Hi, thanks for the reports! Okay, the problem with the RooRealSumPdf
hinted me that this new optimization was part of the problem (look for "binned likelihood fit optimization").
https://root.cern/doc/v628/release-notes.html
This optimization was used inside ATLAS for a long time to great success (speedups), so I was enabling it by default. However, it seems to be problematic here, maybe it doesn't work with the RooBarlowBeestonLL.
You should for now disable it when the demo is compiled with ROOT 6.28. So the line with MakeModelAndMeasurementFast
in histfact_demo.cpp
would become:
#if ROOT_VERSION_CODE < ROOT_VERSION(6,28,00)
auto ws = RooStats::HistFactory::MakeModelAndMeasurementFast(meas);
#else
// Disable the binned fit optimization that was enabled by default in ROOT 6.28.
// This optimization skips the normalization of the RooRealSumPdf, because
// the unnormalized bin contents already represent the yields that can be
// used by the RooNLLVar to sum the Poisson terms. However, this optimization
// doesn't work for this demo, maybe because it's not compatible with the
// RooBarlowBeestonLL. See also https://root.cern/doc/v628/release-notes.html.
HistoToWorkspaceFactoryFast::Configuration cfg;
cfg.binnedFitOptimization = false;
auto ws = RooStats::HistFactory::MakeModelAndMeasurementFast(meas, cfg);
#endif
With this, the results I get with 6.28 are already more comparable to 6.24. Here the results of fit-noshapes
:
FVAL = -1581.9159109118475
Edm = 0.000441898404860980383
Nfcn = 187
Lumi = 0.937281 +/- 0.041571 (limited)
Nmu = 68079 +/- 3149.09 (limited)
RawRDst = 0.0394995 +/- 0.00466377 (limited)
alpha_BFD1 = -1.56966 +/- 0.268056 (limited)
Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 2000
Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate
0.000441626
Fit ran with status 0
Stat error on R(D*) is 0.004650
EDM at end was 0.000442
RooArgList:: = (Lumi,Nmu,RawRDst,alpha_BFD1)
CURRENT NUISANCE PARAMETERS:
0: Lumi = 0.937281 +/- 0.0414063
1: Nmu = 68079 +/- 3136.33
3: alpha_BFD1 = -1.56966 +/- 0.267557
FVAL = -1581.92046482683691
Edm = 1.80973200685341769e-07
Nfcn = 258
Lumi = 0.936911 +/- 0.041654 (limited)
Nmu = 63811.4 +/- 402.673 (limited)
RawRDst = 0.0394923 +/- 0.00466481 (limited)
alpha_BFD1 = -1.56652 +/- 0.266527 (limited)
Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 2000
Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate
1.8221e-07
Fit ran with status 0
Stat error on R(D*) is 0.004658
EDM at end was 0.000000
RooArgList:: = (Lumi,Nmu,RawRDst,alpha_BFD1)
CURRENT NUISANCE PARAMETERS:
0: Lumi = 0.936911 +/- 0.0416284
1: Nmu = 63811.4 +/- 401.335
3: alpha_BFD1 = -1.56652 +/- 0.264945
There are still some problems. The produced plots don't look correct, and the Nmu
parameter has an error that is way off. This is a problem I already see with 6.26, and it is probably related to the reorganization of the HistFactory model RooFit representation:
https://github.com/root-project/root/pull/8167
Indeed, it could be related to the new RooBinWidthFunction
. I can only continue the investigation next week on Wednesday, if you learn something new in the meantime @yipengsun and @CoffeeIntoScience please let me know!
Awesome progress!
Something that's making me scratch my head a bit is that, if the binned fit optimization is just taking the fact that this is self-normalized as a given, I would think you could make it work by getting rid of the bin width terms with RooBinWidthFunction::disableClass()
. Then the value of the RooRealSum would just be (schematically) Histogram[bin i]*normalization constants
which I would think would be exactly what the optimization expects... I'm missing something or else there is something happening under the hood that I don't mean to do...
In any case, just FYI the issue is the same with the BBNLL or the vanilla NLL, so I don't think it can be a peculiar interaction with the RooBBNLL.
One thought of what might be different here is if in ATLAS's use case they are not using ExportOnly, then the setup and steering of the minimization might be different? I didn't see any magic code when I glanced at MakeModelAndMeasurementFast
though.
The only other intelligent thought I have right now is that, because of the recently-fixed issue, they definitely weren't working with 3D, and maybe something goes funny in more dimensions? It's not well fleshed out. But I thought I'd drop the thought since I'll be out of action for at least a week starting wednesday
Hi @guitargeek can we close the issue ?
The fit converges.
To Reproduce
We have our demo fitter publicly available at: https://github.com/umd-lhcb/histfactory-fitter-demo/tree/dev
The
dev
branch contains our latest development. The baseline fit was performed withmake fit-noshapes
so all shape variations are turned off.Setup
gcc12
dnf install
/ binary download / you built it yourself. We build our self with the help ofnix
.Additional information
We noticed that newer ROOT started to use
RooBinWidthFunction
to support histogram templates with arbitrary binning scheme:https://github.com/root-project/root/commit/dc65afe00ec72d18ebc5d31e7b2d6b50c6a20423#diff-42ea09980f1ed2bacde4381f212c35d78be6ddedccf746953dc99d6455525d8aL1256-R1221
Maybe there's a bug when introducing
RooBinWidthFunction
?FYI @guitargeek