root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.65k stars 1.26k forks source link

[RF][HistFactory] 3D HistFactory fit broken in 6.28 #12729

Open yipengsun opened 1 year ago

yipengsun commented 1 year ago
CoffeeIntoScience commented 1 year ago

Some further information from my own observations:

Somehow despite the fact that the model yield is >> the data yield, the reported FCN keeps getting smaller (see the MINUIT output below). This is true even when upper limits are extended to outrageous values.

I have further tried eliminating the gaussian constraints on Lumi and alpha_BFD1 as the source of the problem by reconstructing the likelihood using only the RooRealSumPdf which should not contain the constraints themselves, only the functions which depend on the observables or which normalize those same functions. The behavior is the same.

More MINUIT output (print level 0):

**********                                                                                                                                                                                                                                   **    1 **SET PRINT           1                                                                                                                                                                                                              **********                                                                                                                                                                                                                                   **********                                                                                                                                                                                                                                   **    2 **SET NOGRAD                                                                                                                                                                                                                         **********                                                                                                                                                                                                                                   PARAMETER DEFINITIONS:                                                                                                                                                                                                                          NO.   NAME         VALUE      STEP SIZE      LIMITS                                                                                                                                                                                           1 Lumi         1.00000e+00  3.00000e-01    0.00000e+00  3.00000e+00                                                                                                                                                                          2 Nmu          5.00000e+04  2.49995e+04    1.00000e+00  1.00000e+06                                                                                                                                                                          3 RawRDst      4.03350e-02  1.99999e-02    1.00000e-06  2.00000e-01                                                                                                                                                                          4 alpha_BFD1   0.00000e+00  6.00000e-01   -3.00000e+00  3.00000e+00                                                                                                                                                                      **********                                                                                                                                                                                                                                   **    3 **SET ERR         0.5                                                                                                                                                                                                                **********                                                                                                                                                                                                                                   **********                                                                                                                                                                                                                                   **    4 **SET PRINT           1                                                                                                                                                                                                              **********                                                                                                                                                                                                                                   **********                                                                                                                                                                                                                                   **    5 **SET STR           2                                                                                                                                                                                                                **********                                                                                                                                                                                                                                   NOW USING STRATEGY  2: MAKE SURE MINIMUM TRUE, ERRORS CORRECT                                                                                                                                                                                **********                                                                                                                                                                                                                                   **    6 **MIGRAD        2000           1                                                                                                                                                                                                     **********                                                                                                                                                                                                                                   FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.                                                                                                                                                                                START MIGRAD MINIMIZATION.  STRATEGY  2.  CONVERGENCE WHEN EDM .LT. 1.00e-03                                                                                                                                                                 MINUIT WARNING IN HESSE                                                                                                                                                                                                                      ============== Negative diagonal element 3 in Error Matrix                                                                                                                                                                                   MINUIT WARNING IN HESSE                                                                                                                                                                                                                      ============== 2456.94 added to diagonal of error matrix                                                                                                                                                                                     EIGENVALUES OF SECOND-DERIVATIVE MATRIX:                                                                                                                                                                                                            -7.1929e+00  9.6978e-01  1.0000e+00  9.2231e+00                                                                                                                                                                                       MINUIT WARNING IN HESSE                                                                                                                                                                                                                      ============== MATRIX FORCED POS-DEF BY ADDING 7.202092 TO DIAGONAL.                                                                                                                                                                         FCN=-0.464708 FROM HESSE     STATUS=NOT POSDEF     31 CALLS          32 TOTAL                                                                                                                                                                                    EDM=6.13113e+08    STRATEGY= 2      ERR MATRIX NOT POS-DEF                                                                                                                                                                EXT PARAMETER                APPROXIMATE        STEP         FIRST                                                                                                                                                                           NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE                                                                                                                                                                          1  Lumi         1.00000e+00   1.33665e-03   2.14402e-03  -1.26169e+05                                                                                                                                                                        2  Nmu          5.00000e+04   1.26146e+03   1.18624e-03  -3.61664e+05                                                                                                                                                                        3  RawRDst      4.03350e-02   1.19551e-01   2.56936e-03  -4.53011e+03                                                                                                                                                                        4  alpha_BFD1   0.00000e+00   2.39988e-01   2.01358e-03  -5.82940e+03                                                                                                                                                                                                    ERR DEF= 0.5                                                                                                                                                                                                   MIGRAD MINIMIZATION HAS CONVERGED.                                                                                                                                                                                                           MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.                                                                                                                                                                                             COVARIANCE MATRIX CALCULATED SUCCESSFULLY                                                                                                                                                                                                    FCN=-373434 FROM MIGRAD    STATUS=CONVERGED     227 CALLS         228 TOTAL                                                                                                                                                                                      EDM=6.97829e-06    STRATEGY= 2      ERROR MATRIX ACCURATE                                                                                                                                                                 EXT PARAMETER                                   STEP         FIRST                                                                                                                                                                           NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE                                                                                                                                                                          1  Lumi         3.00000e+00   1.68695e-05   2.00130e-03** at limit **                                                                                                                                                                        2  Nmu          1.00000e+06   5.62344e+00   2.00135e-03** at limit **                                                                                                                                                                        3  RawRDst      2.00000e-01   1.08871e-05   9.78010e-02** at limit **                                                                                                                                                                        4  alpha_BFD1   3.00000e+00   2.35826e-04   5.29109e-03** at limit **         
                               ERR DEF= 0.5                                                                                                                                                                                                   EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  4    ERR DEF=0.5                                                                                                                                                                                2.982e-16  5.637e-20  1.636e-27  1.292e-23                                                                                                                                                                                                   5.637e-20  9.972e-05  4.648e-18  1.119e-17
  1.636e-27  4.648e-18  2.073e-17 -4.928e-20
  1.292e-23  1.119e-17 -4.928e-20  1.309e-12
 PARAMETER  CORRELATION COEFFICIENTS
       NO.  GLOBAL      1      2      3      4
        1  0.00000   1.000  0.000  0.000  0.000
        2  0.00000   0.000  1.000  0.000  0.000
        3  0.00001   0.000  0.000  1.000 -0.000
        4  0.00001   0.000  0.000 -0.000  1.000
 **********
 **    7 **SET ERR         0.5
 **********
 **********
 **    8 **SET PRINT           1
 **********
 **********
 **    9 **HESSE        2000
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-373434 FROM HESSE     STATUS=OK             23 CALLS         251 TOTAL
                     EDM=6.97828e-06    STRATEGY= 2      ERROR MATRIX ACCURATE
  EXT PARAMETER                                INTERNAL      INTERNAL
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE
   1  Lumi         3.00000e+00   1.68695e-05   4.00261e-04   1.57079e+00
                                 WARNING -   - ABOVE PARAMETER IS AT LIMIT.
   2  Nmu          1.00000e+06   5.62344e+00   4.00271e-04   1.58650e+02
                                 WARNING -   - ABOVE PARAMETER IS AT LIMIT.
   3  RawRDst      2.00000e-01   1.08845e-05   4.89005e-02   2.05104e+05
                                 WARNING -   - ABOVE PARAMETER IS AT LIMIT.
   4  alpha_BFD1   3.00000e+00   2.35825e-04   1.05822e-03  -2.22268e+03
                                 WARNING -   - ABOVE PARAMETER IS AT LIMIT.
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  4    ERR DEF=0.5
  2.982e-16 -2.818e-18  6.221e-31 -1.615e-22
 -2.818e-18  9.972e-05  4.730e-19 -2.801e-16
  6.221e-31  4.730e-19  2.072e-17 -5.152e-21
 -1.615e-22 -2.801e-16 -5.152e-21  1.309e-12
 PARAMETER  CORRELATION COEFFICIENTS
       NO.  GLOBAL      1      2      3      4
        1  0.00000   1.000 -0.000  0.000 -0.000
        2  0.00000  -0.000  1.000  0.000 -0.000
        3  0.00000   0.000  0.000  1.000 -0.000
        4  0.00000  -0.000 -0.000 -0.000  1.000
guitargeek commented 1 year ago

Hi, thanks for the reports! Okay, the problem with the RooRealSumPdf hinted me that this new optimization was part of the problem (look for "binned likelihood fit optimization"). https://root.cern/doc/v628/release-notes.html

This optimization was used inside ATLAS for a long time to great success (speedups), so I was enabling it by default. However, it seems to be problematic here, maybe it doesn't work with the RooBarlowBeestonLL.

You should for now disable it when the demo is compiled with ROOT 6.28. So the line with MakeModelAndMeasurementFast in histfact_demo.cpp would become:

#if ROOT_VERSION_CODE <  ROOT_VERSION(6,28,00)
  auto ws = RooStats::HistFactory::MakeModelAndMeasurementFast(meas);
#else
  // Disable the binned fit optimization that was enabled by default in ROOT 6.28.
  // This optimization skips the normalization of the RooRealSumPdf, because
  // the unnormalized bin contents already represent the yields that can be
  // used by the RooNLLVar to sum the Poisson terms. However, this optimization
  // doesn't work for this demo, maybe because it's not compatible with the
  // RooBarlowBeestonLL. See also https://root.cern/doc/v628/release-notes.html.
  HistoToWorkspaceFactoryFast::Configuration cfg;
  cfg.binnedFitOptimization = false;
  auto ws = RooStats::HistFactory::MakeModelAndMeasurementFast(meas, cfg);
#endif

With this, the results I get with 6.28 are already more comparable to 6.24. Here the results of fit-noshapes:

ROOT 6.28

FVAL  = -1581.9159109118475
Edm   = 0.000441898404860980383
Nfcn  = 187
Lumi      = 0.937281     +/-  0.041571  (limited)
Nmu   = 68079    +/-  3149.09   (limited)
RawRDst   = 0.0394995    +/-  0.00466377    (limited)
alpha_BFD1    = -1.56966     +/-  0.268056  (limited)
Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 2000
Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate
0.000441626
Fit ran with status 0
Stat error on R(D*) is 0.004650
EDM at end was 0.000442
RooArgList:: = (Lumi,Nmu,RawRDst,alpha_BFD1)
CURRENT NUISANCE PARAMETERS:
0: Lumi          = 0.937281 +/- 0.0414063
1: Nmu           = 68079 +/- 3136.33
3: alpha_BFD1            = -1.56966 +/- 0.267557

ROOT 6.24

FVAL  = -1581.92046482683691
Edm   = 1.80973200685341769e-07
Nfcn  = 258
Lumi      = 0.936911     +/-  0.041654  (limited)
Nmu   = 63811.4  +/-  402.673   (limited)
RawRDst   = 0.0394923    +/-  0.00466481    (limited)
alpha_BFD1    = -1.56652     +/-  0.266527  (limited)
Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 2000
Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate
1.8221e-07
Fit ran with status 0
Stat error on R(D*) is 0.004658
EDM at end was 0.000000
RooArgList:: = (Lumi,Nmu,RawRDst,alpha_BFD1)
CURRENT NUISANCE PARAMETERS:
0: Lumi          = 0.936911 +/- 0.0416284
1: Nmu           = 63811.4 +/- 401.335
3: alpha_BFD1            = -1.56652 +/- 0.264945

There are still some problems. The produced plots don't look correct, and the Nmu parameter has an error that is way off. This is a problem I already see with 6.26, and it is probably related to the reorganization of the HistFactory model RooFit representation: https://github.com/root-project/root/pull/8167

Indeed, it could be related to the new RooBinWidthFunction. I can only continue the investigation next week on Wednesday, if you learn something new in the meantime @yipengsun and @CoffeeIntoScience please let me know!

CoffeeIntoScience commented 1 year ago

Awesome progress!

Something that's making me scratch my head a bit is that, if the binned fit optimization is just taking the fact that this is self-normalized as a given, I would think you could make it work by getting rid of the bin width terms with RooBinWidthFunction::disableClass(). Then the value of the RooRealSum would just be (schematically) Histogram[bin i]*normalization constants which I would think would be exactly what the optimization expects... I'm missing something or else there is something happening under the hood that I don't mean to do...

In any case, just FYI the issue is the same with the BBNLL or the vanilla NLL, so I don't think it can be a peculiar interaction with the RooBBNLL.

One thought of what might be different here is if in ATLAS's use case they are not using ExportOnly, then the setup and steering of the minimization might be different? I didn't see any magic code when I glanced at MakeModelAndMeasurementFast though.

The only other intelligent thought I have right now is that, because of the recently-fixed issue, they definitely weren't working with 3D, and maybe something goes funny in more dimensions? It's not well fleshed out. But I thought I'd drop the thought since I'll be out of action for at least a week starting wednesday

dpiparo commented 7 months ago

Hi @guitargeek can we close the issue ?