matthewfeickert opened this issue 4 years ago
While working on this, something to be on the lookout for is making sure that the problems with TensorFlow Hessians that @dantrim first reported in Issue #332 are no longer present in the v2.X series of TensorFlow.
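As a quick way to verify Hessian computations in the TensorFlow v2.X API, a minimal sketch using nested `tf.GradientTape`s on a toy quadratic (this is an illustrative standalone check, not pyhf code):

```python
import numpy as np
import tensorflow as tf

# Hessian via nested GradientTapes: the inner tape records the gradient,
# the outer tape differentiates the gradient to get second derivatives
x = tf.Variable([1.0, 2.0])
with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = tf.reduce_sum(x**2)  # Hessian of sum(x_i^2) is 2 * I
    grad = inner.gradient(y, x)
hess_matrix = outer.jacobian(grad, x)
print(hess_matrix.numpy())  # should be close to 2 * identity
```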
Related: https://gist.github.com/jgomezdans/3144636
```python
#!/usr/bin/env python
"""
Some Hessian codes
"""
import numpy as np
from scipy.optimize import approx_fprime


def hessian(x0, calculate_cost_function, epsilon=1.0e-5, linear_approx=False, *args):
    """
    A numerical approximation to the Hessian matrix of the cost function at
    location x0 (hopefully, the minimum)
    """
    # Approximate the gradient (first derivative) at x0
    f1 = approx_fprime(x0, calculate_cost_function, epsilon, *args)
    # Gauss-Newton style approximation H ~ grad^T grad. Obviously much more
    # efficient, but only appropriate if the cost function is linear
    if linear_approx:
        f1 = np.atleast_2d(f1)
        return f1.T @ f1
    # Allocate space for the Hessian
    n = x0.shape[0]
    hessian = np.zeros((n, n))
    # The next loop fills in the matrix column by column
    xx = x0.copy()
    for j in range(n):
        xx0 = xx[j]  # Store old value
        xx[j] = xx0 + epsilon  # Perturb with finite difference
        # Recalculate the gradient at this perturbed point
        f2 = approx_fprime(xx, calculate_cost_function, epsilon, *args)
        hessian[:, j] = (f2 - f1) / epsilon  # Forward-difference scaling
        xx[j] = xx0  # Restore initial value of x0
    return hessian
```
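As a sanity check of this finite-difference approach, here is a self-contained version applied to a quadratic cost with a known analytic Hessian:

```python
import numpy as np
from scipy.optimize import approx_fprime

# Quadratic cost f(x) = x^T A x with symmetric A, so the Hessian is 2 * A
A = np.array([[2.0, 0.5], [0.5, 1.0]])


def cost(x):
    return x @ A @ x


def numerical_hessian(f, x0, epsilon=1.0e-5):
    """Forward-difference Hessian built from gradient evaluations."""
    n = x0.shape[0]
    f1 = approx_fprime(x0, f, epsilon)
    H = np.zeros((n, n))
    xx = x0.copy()
    for j in range(n):
        xx[j] += epsilon  # perturb one coordinate
        H[:, j] = (approx_fprime(xx, f, epsilon) - f1) / epsilon
        xx[j] -= epsilon  # restore
    return H


H = numerical_hessian(cost, np.array([0.3, -0.7]))
print(np.round(H, 3))  # should be close to 2 * A
```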
Description
In the backends with autodiff, the `AutoDiffOptimizerMixin` class that all of them use takes the `func` (reminder that `func` returns both the objective function and the gradient: https://github.com/scikit-hep/pyhf/blob/71d0c148a295e42f86300b7fe317aea56187ee34/src/pyhf/optimize/opt_pytorch.py#L50) from each backend's implementation of `setup_minimize`

https://github.com/scikit-hep/pyhf/blob/71d0c148a295e42f86300b7fe317aea56187ee34/src/pyhf/optimize/autodiff.py#L30-L32

and then gives it to `scipy.optimize.minimize`

https://github.com/scikit-hep/pyhf/blob/71d0c148a295e42f86300b7fe317aea56187ee34/src/pyhf/optimize/autodiff.py#L33-L35
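For illustration, a minimal sketch of this objective-and-gradient pattern with `scipy.optimize.minimize` (the toy `func` below is hypothetical, standing in for the one built by `setup_minimize`):

```python
import numpy as np
from scipy.optimize import minimize


def func(x):
    # Returns both the objective value and its gradient in one call,
    # matching scipy's jac=True convention
    objective = np.sum((x - 3.0) ** 2)
    gradient = 2.0 * (x - 3.0)
    return objective, gradient


# jac=True tells scipy that func returns (objective, gradient)
result = minimize(func, x0=np.zeros(2), jac=True, method="SLSQP")
print(result.x)  # should converge to [3, 3]
```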
This minimization doesn't take the Hessian into account at the moment. However, as @lukasheinrich notes in Issue #764, we can add Hessian information to the optimization by providing a callable `hessian`. This is probably a good idea, but will require some work to make sure that there is agreement with MINUIT.
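As a sketch of what providing a Hessian callable could look like, using scipy's `hess` argument with a Hessian-aware method (the toy objective here is hypothetical, not pyhf's likelihood):

```python
import numpy as np
from scipy.optimize import minimize


def objective(x):
    # Quadratic with a cross term: minimum at x = y = 2/3
    return np.sum((x - 1.0) ** 2) + x[0] * x[1]


def gradient(x):
    g = 2.0 * (x - 1.0)
    g[0] += x[1]
    g[1] += x[0]
    return g


def hess(x):
    # Constant Hessian of the quadratic objective above
    return np.array([[2.0, 1.0], [1.0, 2.0]])


# Methods like trust-constr and Newton-CG can exploit the hess callable
result = minimize(
    objective, x0=np.zeros(2), jac=gradient, hess=hess, method="trust-constr"
)
print(result.x)  # should converge to [2/3, 2/3]
```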
This was prompted by a discussion with @dguest (as many interesting ideas are).