Test RooGradMinimizer - Githubissues

egpbos commented 6 years ago

After #10 is done, we should write a test comparing optimization of a RooGaussian pdf via a regular RooMinimizer and maybe also RooGaussMinimizer to doing the same with RooGradMinimizer. We must get exactly the same results given the same dataset and initial parameter values.

Run with a few different tests (1D, multi-dim; small, big workspaces) to be sure it works reliably.

egpbos commented 6 years ago

To summarize where we left off in #10, some first runs of the comparison of Minuit2 (via RooMinimizer) and RooGradMinimizer in GradMinimizer.cpp showed:

Likelihood values are typically equal in the first 14 digits
Parameter of interest mu is usually equal in its first 4 digits
Limits on mu are always a factor $\sqrt{2}$ higher in the RooGradMinimizer case
EDM is off by sometimes orders of magnitude, but always very small in both cases
Number of reported function calls is too low in RooGradMinimizer (in reality it calls the function one time less than Minuit2 does).

One additional observation from a few more test runs is that the RooGradMinimizer sometimes does not converge in the same number of function calls! Sometimes it takes a lot more with some warnings like:

Info in <Minuit2>: DavidonErrorUpdator: delgam < 0 : first derivatives increasing along search line
Info in <Minuit2>: VariableMetricBuilder: matrix not pos.def, gdel > 0
Info in <Minuit2>: gdel = 1.37581e+07
Info in <Minuit2>: gdel = -9.45864e+08

Info in <Minuit2>: DavidonErrorUpdator: delgam < 0 : first derivatives increasing along search line
Info in <Minuit2>: VariableMetricBuilder: matrix not pos.def, gdel > 0
Info in <Minuit2>: gdel = 2052
Info in <Minuit2>: gdel = -153427

or even

Info in <Minuit2>: DavidonErrorUpdator: delgam < 0 : first derivatives increasing along search line
Info in <Minuit2>: VariableMetricBuilder: matrix not pos.def, gdel > 0
Info in <Minuit2>: gdel = 2.59465e+07
Info in <Minuit2>: gdel = -1.96343e+09
Info in <Minuit2>: VariableMetricBuilder: no improvement in line search
Info in <Minuit2>: VariableMetricBuilder: iterations finish without convergence.
Info in <Minuit2>: VariableMetricBuilder : edm = 11495
Info in <Minuit2>:             requested : edmval = 2e-05
Info in <Minuit2>: VariableMetricBuilder: INVALID function minimum - edm is above tolerance, : edm = 2877.43
Info in <Minuit2>: VariableMetricBuilder: Required tolerance  is 10 x edmval  : edmval = 2e-05
Info in <Minuit2>: Minuit2Minimizer::Minimize : Minimization did NOT converge, Edm is above max
Minuit2Minimizer : Invalid Minimum - status = 3

Find out what causes these differences! Mainly:

[x] \sqrt{2} difference in POI limits
[x] POI difference > 4 digits
[x] EDM different order of magnitude
[x] DavidonErrorUpdator warning
[x] VariableMetricBuilder line search warning

egpbos commented 6 years ago

Tried replacing eps2 with 2*sqrt(eps), didn't change the any of the above points. Also tried replacing eps with the MnMachinePrecision version (which differs by a factor 4 from std::numeric_limits<double>::epsilon()), but also doesn't change the above points.

This was the last remaining difference with the Minuit2 implementation that I found, so there must be something that I missed.

egpbos commented 6 years ago

Checked test script by setting boundaries on POI very high, so that effectively the minuit-internal and -external values should be equal (arctan thing to remove boundaries in internal parameter space). This uncovered very odd behavior, the POI jumps to large values in the GradMinimizer at the first step. Possibly caused by wrong initial gradient.

[x] Compare GradMinimizer initial gradient values to those of Minuit initial gradient

egpbos commented 6 years ago

Clearly this is an issue:

initial gradient minuit:
fGrd[0] = 1e+06 fG2[0] = 1e+12  fGstep[0] = 1e-07
our initial gradient:
fGrd[0] = 10    fG2[0] = 100    fGstep[0] = 0.01

[x] Find out why they are different
[x] Make them the same :P

egpbos commented 6 years ago

In fact, the above was with mu[0,-100000,100000]. If we set mu[0,-10,10] we get:

fGrd[0] = 95.7004   fG2[0] = 9158.57    fGstep[0] = 0.00104493

for Minuit, whereas ours stays exactly the same.

egpbos commented 6 years ago

By the way, the factor sqrt(2) in the POI limits was fixed by adding setErrorLevel to RooGradMinimizer.

egpbos commented 6 years ago

We had set the error on mu to 0.1 manually. When this is not done, it apparently defaults to 2 (for boundaries -10,10) and the initial gradient changes to

fGrd[0] = 0.5   fG2[0] = 0.25   fGstep[0] = 0.2

egpbos commented 6 years ago

When we then set mu[0,-1000,1000], it becomes

fGrd[0] = 0.005 fG2[0] = 2.5e-05    fGstep[0] = 0.5

So now it apparently is dependent on the boundaries too.

egpbos commented 6 years ago

Perhaps after all I do need to do the int2ext and ext2int conversions. Apparently it influences the initial gradient in Minuit2 significantly, which maybe it should, so that the first step doesn't immediately go out of bounds.

[x] Implement int2ext and ext2int conversions in NumericalDerivatorMinuit2
[x] Apply them in the initial gradient calculator

egpbos commented 6 years ago

Moving NumericalDerivatorMinuit2 from ROOT::Math to RooFit, since I really need to link with Minuit2 for the int2ext/ext2int conversions (and it also allows me to clean up the precision thing).

egpbos commented 6 years ago

Unfortunately, the initial gradient does not fix the differences in EDM and POI and also the number of function calls is still different.

egpbos commented 6 years ago

I think the problem is that Minuit2 also transforms the gradient from external to internal coordinates using DInt2Ext in AnalyticalGradientCalculator::operator().

[x] Implement an inverse DExt2Int function.
[x] Transform our initial gradient (which is now in internal coordinates) to external coordinates using DExt2Int.

egpbos commented 6 years ago

Tried dividing out the DInt2Ext factor in the initial gradient, but this doesn't help. In fact, it hardly changes things.

When printing out s0 at the beginning of the do-while loop of Minuit2::VariableMetricBuilder::Minimum(), the initial values are indeed quite far off from what they are in the Minuit2 case. For Minuit2 the first s0 is:

minimum function Value: 16138.46645139
minimum edm: 2069.61683815
minimum internal state vector: LAVector parameters:
    -0.2088278458461

minimum internal Gradient vector: LAVector parameters:
      -18447.1067083

minimum internal covariance matrix: LASymMatrix parameters:
  2.4327273e-05

For our derivator without the DInt2Ext correction the first s0 is:

minimum function Value: 56225.63481053
minimum edm: 1096476.547189
minimum internal state vector: LAVector parameters:
     -1.311874784789

minimum internal Gradient vector: LAVector parameters:
     -21982.38636908

minimum internal covariance matrix: LASymMatrix parameters:
   0.0090763172

For our derivator with the DInt2Ext correction the first s0 is:

minimum function Value: 56049.10221712
minimum edm: 1092248.220251
minimum internal state vector: LAVector parameters:
     -1.311874784789

minimum internal Gradient vector: LAVector parameters:
     -21939.96024556

minimum internal covariance matrix: LASymMatrix parameters:
   0.0090763172

egpbos commented 6 years ago

It seems like the initial s0 comes from seed.State() in the top-level VariableMetricBuilder::Minimum(). Why this differs is unclear, but already in this top-level function, at creation time, it is different, as can be seen when setting printLevel to 2.

In Minuit2:

trying nominal calculation
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 1 strategy 0
MnSeedGenerator: for initial parameters FCN = 56023.82843258
fGrd[0] = 14.8443   fG2[0] = 220.354    fGstep[0] = 0.00673659
MnSeedGenerator: Initial state:   - FCN =   56023.82843258 Edm =     -3102.62 NCalls =      5
MnSeedGenerator: Negative G2 found - new state:   - FCN =   16164.38458147 Edm =      2053.17 NCalls =     14
VariableMetric: start iterating until Edm is < 0.001
VariableMetric: Initial state   - FCN =   16164.38458147 Edm =      2053.17 NCalls =     14

In our derivator:

trying GradMinimizer
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 0.01 strategy 0
fGrd[0] = 14.8443   fG2[0] = 220.354    fGstep[0] = 0.00673659
INTERNAL: fGrd[0] = 14.8443 fG2[0] = 220.354    fGstep[0] = 0.00673659
EXTERNAL: fGrd[0] = 19.3257 fG2[0] = 220.354    fGstep[0] = 0.00673659
VariableMetric: start iterating until Edm is < 1e-05
VariableMetric: Initial state   - FCN =   56023.82843258 Edm =  1.09004e+06 NCalls =      1

Actually, this shows a couple of differences:

VariableMetric edm target is different by a factor 100!
MnSeedGenerator does a lot of stuff, apparently, without going through the Numerical2PGradientCalculator!

But one important thing to notice: the initial FCN value is equal after all, just not anymore when the process reaches VariableMetricBuilder::Minimum().

egpbos commented 6 years ago

There are two MnSeedGenerator::operator() overloads: one for generic GradientCalculators and one for AnalyticGradientCalculators. Our GradMinimizer uses the latter.

egpbos commented 6 years ago

InitialGradientCalculator is called from MnSeedGenerator::operator() for AnalyticGradientCalculators... Did I even need to initialize it at all in our derivator?!

egpbos commented 6 years ago

The two operator() functions do almost the same things, except:

There's a if(gc.CheckGradient()) block in the AnalyticGradientCalculator overload, which does some checking and aborts if checks don't pan out.
The block below if(ng2ls.HasNegativeG2(dgrad, prec)); in the AnalyticGradientCalculator case, this creates a new Numerical2PGradientCalculator! What?!
[x] Find out why [second bullet above].

egpbos commented 6 years ago

The second bullet doesn't change anything... Tried changing it to take the analytical gradient calculator instead of the Numerical 2P one and the answer stayed the same. ~Probably because the negative G2 is not encountered here.~ No, the G2 is in fact negative for the nominal run, but not for the GradMinimizer run!

[x] Find out why G2 is negative in the nominal run but not in the GradMinimizer run.

egpbos commented 6 years ago

As I just mailed Lorenzo, I think I found why I cannot exactly reproduce the Minuit2 results from an external gradient calculator: https://github.com/root-project/root/blob/master/math/minuit2/src/AnalyticalGradientCalculator.cxx#L45

The AnalyticalGradientCalculator only forwards the gradient itself, not the g2 and gstep.

Options to fix this:

Build a new class for an external derivator that can also calculate g2 and gstep
Modify AnalyticalGradientCalculator so that it can do this as well
Wait for Lorenzo's reply, that hopefully contains a tip that will save a lot of effort 😄

egpbos commented 6 years ago

Lorenzo agrees that modifying the current class(es) to be able to also provide g2 and gstep is the best way to go.

[x] Do this.

egpbos commented 6 years ago

These have to be transformed as well, just like the gradient currently is transformed. We may need to implement a second derivative Jacobian for that, similar to DInt2Ext.

egpbos commented 6 years ago

After some more int2ext/ext2int magic, I managed to get the exact same output in the tests. The exact values of the gradients, second derivatives and step sizes still differ during the minimization (inside the derivator), but apparently, this does not matter for the outcome in terms of traveling through the parameter space, since both the parameters x that are visited and the corresponding function values f(x) are identical for Minuit2 and RooGradMinimizer.

egpbos commented 6 years ago

I'm guessing the reason the gradients aren't identical internally is that I'm not transforming the step sizes at all...

egpbos commented 6 years ago

The step transformation wasn't the issue, I was doing the conversion from external back to internal wrong, was using the new parameter to transform, whereas I should be using the old parameter. Now fixed and now everything is exactly equal, only sometimes edm differs at the machine precision order, i.e. only in the last 1/2 digits, so that's fine. So we can finally tick all the above boxes!

Some remaining issues regarding Minuit extension are gathered in #18, since the goal of this issue is to get our gradient working.

For this issue, what remains is to do some additional tests. These can also be used when we move on to a parallelized version of RooGradMinimizer.

~[ ] 1D (Gaussian)~
~[ ] multi-dimensional~
~[ ] small "realistic" workspace(s)~
~[ ] big "realistic" workspace(s)~

egpbos commented 6 years ago

In fact, let's do this properly and build real roottest tests and also make rootbench benchmarks out of them. New issue: #19

[ ] Run the tests and benchmarks when they're done in #19

egpbos commented 6 years ago

In trying to setup the Google Test version of GradMinimizer.cpp ( #19 ), I found that edm and the mu error are not actually 100% equal, though they always are very close, typically close to machine precision level, but sometimes they differ up to about the 6th (edm) or 8th (mu error) digits. Looking again at the test output, it seems like there is one remaining difference in the minimizer runs:

trying nominal calculation
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 1 strategy 0
...
trying GradMinimizer
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 0.01 strategy 0

The aimed for edm values are different.

Apparently, we forgot to set this value in RooGradMinimizer, i.e. we took out setEps from our copy of RooMinimizer. Added it back in, compiling to run test...

egpbos commented 6 years ago

Doesn't change anything in the resulting edm and mu.

egpbos commented 6 years ago

The only remaining difference is that the default minimizer type is set to Minuit(1) in the ctor for RooMinimizer, whereas I set it to Minuit2 for RooGradMinimizer. Possibly, the initialization of parameters is different in these cases. Let's try setting it to Minuit 1 in the RooGradMinimizer as well...

egpbos commented 6 years ago

This also doesn't change things. Let's leave this for now. I'll leave a reference to this discussion in the code if anyone wants to follow up.

egpbos commented 6 years ago

The hierarchical test (see #19) ends up walking a slightly different path at the end. This shows that (for pathlogical cases, but still) the small differences we noticed above can in fact lead to large differences in end result. We therefore need to fix this now.

Let's try printing and comparing all the exact values in the derivative calculations of Minuit2 and GradMinimizer to find out where the differences first appear.

egpbos commented 6 years ago

At least one problem that was identified (the first one that appears) is with the conversion of parameters from internal to external parameter space and back. Specifically, in the double bounded case that we were testing, the functions involved are sin(double) and asin(double). Changing the calculations to long double would solve our problem, since then sin(asin(x)) == x in double precision. However, this would probably change results in previous fits...

egpbos commented 6 years ago

I set the tests back to EXPECT_EQ (I had downgraded them to EXPECT_FLOAT_EQ).

Changing the precision of the int2ext / ext2int functions to long double fixes most of the tests, almost all of which did not pass before in EXPECT_EQ mode. This is using 10 for loop iterations, with 4 EXPECTs per loop, so 40 tests. In run 6 (7th run) three numbers were not exactly equal (mu, muerr and edm). In run 9 (10th), two numbers were unequal (mu and edm):

/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:115: Failure
      Expected: mu0
      Which is: 0.00096625
To be equal to: mu1
      Which is: 0.00096625
/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:116: Failure
      Expected: muerr0
      Which is: 0.0100001
To be equal to: muerr1
      Which is: 0.0100001
/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:117: Failure
      Expected: edm0
      Which is: 8.68852e-11
To be equal to: edm1
      Which is: 8.68851e-11

/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:115: Failure
      Expected: mu0
      Which is: 0.00199726
To be equal to: mu1
      Which is: 0.00199726

/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:117: Failure
      Expected: edm0
      Which is: 1.10307e-10
To be equal to: edm1
      Which is: 1.10307e-10

egpbos commented 6 years ago

RooFit tests (ctest -R 'roofit') run unchanged, so no side effects there.

egpbos commented 6 years ago

Same with RooStats and HistFactory tests (ctest -R 'tutorial-roostats' / -histfactory') and Minuit tests (ctest -R 'minuit').

egpbos commented 6 years ago

Other fitting tests (ctest -R 'fit' -E 'roofit', not sure whether they all use Minuit, but anyway) all pass as well.

egpbos commented 6 years ago

Possibly relevant roottest tests (ctest -R 'math-mathcore|fit|meta-unit_unittests|roofit') don't show issues either.

egpbos commented 6 years ago

To not have to deal with merging Minuit changes to long double into ROOT core, we may for the time being simply clone the parameter transformation functions into RooFit and make them long double.

[ ] ~Try this out.~
[ ] ~Make sure it still works, i.e. that there are no problems with using both the double Minuit and the long double RooFit versions simultaneously.~

egpbos commented 6 years ago

We went for a different approach, see above commit message (3651eb8). Now works perfectly, up to the bit!

egpbos commented 6 years ago

Cheered too early. The 1D Gauss tests are perfect, the multi-layer tree of gaussians and gamma functions as well, but the N-D Gaussian sum test is still off by quite a margin.

roofit-dev / root

Test RooGradMinimizer #11