scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Using Scipy optimiser (for ranking and limits) #413

Open MarcelHoh opened 1 year ago

MarcelHoh commented 1 year ago

In my experience, using scipy as the minimiser is significantly faster than Minuit. This can be very useful when the uncertainty of each parameter is not needed, such as when setting limits or ranking parameters.
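A minimal toy sketch of the point above (not cabinetry code): a scipy minimisation returns only the location of the minimum, with no HESSE-style parameter uncertainties, which is exactly the situation where the faster minimiser suffices.

```python
import numpy as np
from scipy.optimize import minimize


# Toy negative log-likelihood: a Gaussian measurement mu = 1.0 +/- 0.5
def nll(params):
    (mu,) = params
    return 0.5 * ((mu - 1.0) / 0.5) ** 2


# scipy reports the best-fit value, but no parameter uncertainties,
# so it is enough when only the minimum itself is needed
result = minimize(nll, x0=np.array([0.0]), method="SLSQP")
print(result.x[0])  # close to 1.0
```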

Within cabinetry, the cabinetry.fit._fit_model function always overrides the minimizer set in pyhf to be Minuit. This makes the ranking and limits quite a bit slower than they need to be. For the ranking, only the first fit, with no fixed nuisance parameters, should need to use Minuit as the optimiser.

Would it be possible to make the default switch to minuit an option in cabinetry.fit._fit_model? This would allow for always using minuit if the user desires but also to switch to scipy for all cases where the uncertainty is not needed.

alexander-held commented 1 year ago

Hi @MarcelHoh, that sounds like a useful addition. We could add a minimizer (or similar) keyword argument to limit, ranking, scan, and significance, and then allow switching to scipy for the cases where parameter uncertainties are not strictly needed. There are currently two internal ways of doing the minimization (depending on the custom_fit setting): either going through pyhf or using iminuit directly. This setting has historically been useful but probably has limited use now, so it might be a good time to drop it from ranking and scan and thereby avoid having to handle the case where a scipy minimizer wrapper would also have to be added to cabinetry.

MarcelHoh commented 1 year ago

I can give it a go. To add some data: a test fit takes me 40 ms with scipy versus 12 s with Minuit. For a ranking with ~190 parameters, as I'm using for a test, this ends up being 6 minutes with scipy (tested) versus ~3+ hours with Minuit (estimated).

MarcelHoh commented 1 year ago

> We could add a minimizer (or similar) keyword argument to limit, ranking, scan, significance and then allow switching to scipy for the cases where parameter uncertainties are not strictly needed.

I am not familiar with the code (this is my first real look at it), but I would naively lean towards adding a force_minuit option to _fit_model and only setting the backend to Minuit when this is true. This would then let users choose between scipy and Minuit if they wanted to. On the other hand, I cannot currently think of a reason to use anything other than scipy when the uncertainties are not needed, so it might make sense to force that choice.
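The decision logic being proposed could look roughly like the following. This is a hypothetical sketch, not cabinetry's actual API: the function name choose_optimizer and the flag names are invented here for illustration; in cabinetry the equivalent logic would live inside _fit_model.

```python
def choose_optimizer(need_uncertainties: bool, force_minuit: bool = False) -> str:
    """Pick a pyhf optimizer name (hypothetical helper, not cabinetry API).

    Minuit is chosen whenever parameter uncertainties are needed or the
    caller explicitly forces it; otherwise the faster scipy minimizer is used.
    """
    if force_minuit or need_uncertainties:
        return "minuit"
    return "scipy"


# Ranking: only the initial fit needs uncertainties, the fixed-parameter
# fits do not, so they could run with scipy
print(choose_optimizer(need_uncertainties=True))   # initial fit -> "minuit"
print(choose_optimizer(need_uncertainties=False))  # fixed-NP fits -> "scipy"
```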

alexander-held commented 1 year ago

That's an impressively big difference in runtime between minimizers! I don't think I have seen one that is quite this striking before.

I would prefer to stick with Minuit as the default minimizer regardless. It is often slower and can be more fragile, but I have seen a number of cases where the scipy minimizer appears to have converged but returns nonsensical results. I have a lot more confidence in the minimum being accurate when determined via Minuit and think the safe but slow choice is therefore the better default. This is a place where cabinetry is somewhat opinionated, switching the default from what pyhf uses and making iminuit a core dependency.