manoharan-lab / holopy

Hologram processing and light scattering in python
GNU General Public License v3.0

MCMC is not the same as fitting #265

Closed vnmanoharan closed 4 years ago

vnmanoharan commented 5 years ago

In 3.3, both MCMC and least-squares fitting have been rolled into Strategy objects, and the main method for MCMC has been renamed from sample() to optimize(). But MCMC is not an optimization method, and the type of information returned from running the sampler is not the same as the type of information returned from running an optimization algorithm. Optimization and sampling serve different purposes: optimization returns a point estimate and sampling can be used to estimate the entire posterior distribution. While both are "strategies" in some sense, they are strategies to achieve two different ends. We need to rethink how fitting should enter into the object hierarchy.

barkls commented 5 years ago

The simplest fix is probably just to revert the method names of the Strategy objects from "optimize" to "fit" and "sample" as appropriate.

vnmanoharan commented 5 years ago

@barkls (looping in @ralex0, @briandleahy):

This wouldn't solve the semantic problem that optimization and MCMC are strategies to achieve two different goals. Actually there are three different inference goals:

  1. Obtain a point estimate representing the maximum of the posterior probability. This is what optimizing the posterior does.
  2. Obtain a point estimate and covariance matrix from minimizing chi-squared subject to bounds. This is what least-squares fitting does. You can think about this as finding the MAP and curvature of the posterior, but it's not the same posterior as in (1) since it assumes uniform priors and Gaussian likelihood.
  3. Obtain an estimate of the shape of the posterior PDF. This is what MCMC sampling does. We can also obtain a point estimate representing the "best fit" but this is a separate step.

These are three different goals. The only connection between the strategies to achieve these goals is that they all aim at inference of some sort, but only (1) and (3) operate on the same posterior PDF, and none of the strategies return the same type of information. So it doesn't make sense to lump CMA-ES, least-squares, and MCMC algorithms under the same category.

The source of this dilemma seems to be the term "fit," which can mean any of the above three things. To resolve the dilemma we have to resolve this ambiguity.

One solution is to make some of the tasks above functions, and reserve Strategy for MCMC methods only. For example, you could define a sample() function in the hp.inference module. It might look something like this:

result = hp.inference.sample(model, data, strategy)

Here, model would include the optical model, likelihood, and priors. With data, sample() has everything needed to calculate the posterior. strategy specifies what MCMC method to use as well as the starting positions of the walkers.

You could also define least_squares_fit() to be something like

result = hp.inference.least_squares_fit(model, data)

In this case the function should throw an exception if the priors are not uniform.

Finally, for optimization, you would have

result = hp.inference.optimize(model, data, method, guess)

where method is 'CMA-ES' or whatever method you want to use from scipy.optimize.minimize(). Here again model and data are sufficient to define the posterior.
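To make the proposal concrete, here is a minimal sketch of the three functions as hypothetical stubs. None of these exist in hp.inference, and the model attributes and strategy methods referenced in the comments are assumptions, not current HoloPy API:

import numpy as np
from scipy.optimize import minimize


def sample(model, data, strategy):
    """Estimate the posterior defined by model + data with an MCMC strategy."""
    return strategy.run(model, data)  # strategy.run() is an assumed interface


def least_squares_fit(model, data):
    """Minimize chi-squared subject to bounds; only valid for uniform priors."""
    if not all(p.is_uniform for p in model.priors):  # attribute names assumed
        raise ValueError("least_squares_fit requires uniform priors")
    ...  # delegate to a bounded Levenberg-Marquardt routine


def optimize(model, data, method, guess):
    """Maximize the posterior by minimizing its negative log."""
    # A 'CMA-ES' method would need its own branch; shown here: the scipy path.
    def neg_log_post(x):
        return -model.log_posterior(x, data)  # method name assumed
    return minimize(neg_log_post, np.asarray(guess), method=method)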

A second approach is to make the posterior probability an object, defined by the likelihood, priors, and physical model. Then sample() and optimize() (or, maybe better, find_map()) become methods of the posterior object. This approach would be more Bayesian, but if you wanted to do least-squares fitting, you would have to define a different posterior.
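A rough sketch of that second approach, with entirely hypothetical class and method names, might look like this:

class Posterior:
    """Posterior defined by a physical (optical) model, priors, and noise level."""

    def __init__(self, optical_model, priors, noise_sd):
        self.optical_model = optical_model
        self.priors = priors
        self.noise_sd = noise_sd

    def log_prob(self, params, data):
        """Log prior plus Gaussian log likelihood of data given the forward model."""
        ...

    def sample(self, data, strategy):
        """Estimate the shape of the posterior with an MCMC strategy."""
        return strategy.run(self.log_prob, data)

    def find_map(self, data, strategy):
        """Point estimate: maximize log_prob over the parameters."""
        return strategy.optimize(self.log_prob, data)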

barkls commented 5 years ago

Here is the functional API we specified in our meeting today (and follow-up conversation with @briandleahy). Below are five different ways of interacting with the setup.


  1. Simplest call uses default optimizer, initial_guess from model
    hp.inference.point_estimate(model, data)

    @briandleahy suggests that the function name should be a verb, so we will want to rename from point_estimate. This example (and all below) can also work with hp.inference.sample().


  2. Specify initial guess as either dict or list - I'm not sure which is better. They match Model.parameters and Model._parameters, respectively.
    hp.inference.point_estimate(model, data, initial_guess={'r':1, 'n':1.6, 'center.0':5, 'center.1':5,'center.2':5})
    hp.inference.point_estimate(model, data, initial_guess=[1, 1.6, 5, 5, 5])

  3. Specify a method by name, with options passed in as a dict - this matches the API from scipy.optimize.minimize.
    hp.inference.point_estimate(model, data, method="cma")
    hp.inference.point_estimate(model, data, method="cma", options={})

    @vnmanoharan suggests this use case shouldn't work for hp.inference.sample.


  4. Specify a method by passing in an object - @ralex0 and I like to be able to define a method object and reuse it. Note that in these examples I'm not customizing the object at all but really you would when instantiating it. Unclear what the correct namespace should be.
    hp.inference.point_estimate(model, data, method=hp.inference.point_estimate_methods.CMAMethod())
    my_cma_method = hp.inference.point_estimate_methods.CMAMethod()
    hp.inference.point_estimate(model, data, method=my_cma_method)

  5. Specify a Method object (sorry for the ambiguous naming) and call its point_estimate or sample method. Note that we can choose to have these methods be hidden to enforce the functional API in examples 1-4, but the functional API will likely call them behind the scenes anyway.
    my_cma_method = hp.inference.point_estimate_methods.CMAMethod()
    my_cma_method.point_estimate(model, data, initial_guess=None)
    my_cma_method._point_estimate(model, data, initial_guess=None)

We also need to figure out what these functions return - probably something similar to the current v3.3 FitResult, but with more hierarchy and separation between point_estimate and sample outputs. That can wait until after the API is established.

barkls commented 5 years ago

Thanks for the feedback @ralex0. Any thoughts @briandleahy or @vnmanoharan?

vnmanoharan commented 5 years ago

I don't see the feedback from @ralex0. In any case, I am worried about supporting too many different interfaces to the code. There are several (overlapping) use cases:

With that in mind, let's examine the 5 interfaces above in terms of their use cases:

We should probably figure out which use cases we want to prioritize. That will help us determine what APIs we want to support.

For example, if we want to prioritize non-expert, interactive use, we might encourage something like the following

hp.inference.least_squares_fit(scatterer, data, noisesd=noisesd)

where scatterer is an instance of a Scatterer object, which provides all the information needed for an initial guess. The code would also use scatterer to determine the scattering theory. It would use a standard least-squares fitting algorithm that reports the best fit and uncertainty on the attributes of the scatterer. The user would write, for example

hp.inference.least_squares_fit(Sphere(n=1.59, r=0.5, center=[5, 5, 15]), data, noisesd=2.0)

This format is useful for interactive sessions because one can easily change the initial guess or the shape of the scatterer to see how the fit changes.
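A minimal sketch of how such a call might read its initial guess directly off the scatterer (attribute names follow Sphere; the helper itself is hypothetical):

def guesses_from_scatterer(scatterer):
    """Turn a Sphere-like scatterer into a flat dict of initial guesses."""
    x, y, z = scatterer.center
    return {'n': scatterer.n, 'r': scatterer.r,
            'center.0': x, 'center.1': y, 'center.2': z}

# e.g. guesses_from_scatterer(Sphere(n=1.59, r=0.5, center=[5, 5, 15]))
# -> {'n': 1.59, 'r': 0.5, 'center.0': 5, 'center.1': 5, 'center.2': 15}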

But we should first figure out what use cases to prioritize.

barkls commented 5 years ago

Thanks for the input! I agree it makes sense to figure out what use cases we want to prioritize. I'll clarify a few points about the above options that may have been unclear in my earlier post and then respond to @vnmanoharan's comments with my own opinions in a separate comment.

  1. Initial guesses determined from model - Note that the model includes HoloPy Prior objects. For Gaussian priors, it is reasonable to take the initial guess at the mean. For uniform priors, the guess defaults to the middle of the range but can be specified when defining the prior. This matches the behaviour of HoloPy's deprecated Fitting module, which was in turn based on lmfit Parameter objects. The point is that interpreting an initial guess from a model requires no new information to be attached to the model; a sketch of this guess-extraction logic follows this list.

  2. Pass Method/Strategy object into functional API - This is meant to enable expert customization of the inference Method/Strategy without needing to learn an API beyond the one used in examples 1-3.
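Here is a minimal sketch of the guess-extraction logic described in point 1, assuming generic prior objects; the attribute names are assumptions rather than HoloPy's actual Prior interface:

def guess_from_priors(priors):
    """One guess per prior: explicit guess if given, mean for Gaussian-like
    priors, midpoint of the range for uniform-like priors."""
    guesses = []
    for p in priors:
        if getattr(p, 'guess', None) is not None:    # explicit guess on the prior
            guesses.append(p.guess)
        elif hasattr(p, 'mean'):                     # Gaussian-like prior
            guesses.append(p.mean)
        else:                                        # uniform-like prior
            guesses.append(0.5 * (p.lower_bound + p.upper_bound))
    return guesses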

barkls commented 5 years ago

My own preferred use case is option 4/5 with no preference between them. I think option 1 is nice as the simplest new user case. I'm personally not a fan of option 3 (see below).

I disagree that option 4/5 won't be used in an interactive workflow. I am quite likely to define a new object, or modify an existing one (for example, decrease the tolerance if a fit hasn't converged), within a jupyter notebook. I think hiding the objects behind a keyword API in option 3 loses some of the advantages of the object-oriented structure we have set up in HoloPy. I want to be able to define and modify objects and then make them interact in both interactive and automated workflows.

@vnmanoharan's last examples are interesting because they remove the Model object entirely and just pass a scatterer in to the fitting algorithm. Some model would have to be assumed. Current choices are AlphaModel, PerfectLensModel, and ExactModel (which handles arbitrary scattering calculations, not necessarily holograms). It would be nice to remove this level of abstraction from users as long as one of these models makes sense as a default. Note that users would still need to parameterize the scatterer with Prior objects, or else there's nothing to fit!

vnmanoharan commented 5 years ago

You don't need priors for a least squares fit (in fact, you don't want them, since they will be ignored). The reason option 4 doesn't have any advantage for an interactive workflow is that it mixes an object-oriented and functional API. If you want to work with objects, you would just do option 5. Option 4 combines the two approaches in a way that isn't synergistic.

Whether one uses option 5 in a Jupyter notebook or not isn't the question. We're not going to stop anyone from using objects in a Jupyter notebook. The question is what's useful for exploratory analysis of the data. For that purpose, a functional API has a big advantage for many users: it can choose sensible defaults for the optimization/sampling algorithms based on the model and the data, because it knows all of those things at once. If you want to do that by setting up objects, it's trickier -- your Method object doesn't know about the model or data when you set it up. Of course, you could have your MCMCmethod.sample() or PointEstimateMethod.estimate() method figure this out for you, but then you're not using the attributes of the object when you instantiate it.
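Illustrative only (all names hypothetical): a functional entry point sees the model and the data at the same time, so it can pick algorithm defaults from both, which a Strategy object constructed in advance cannot do:

def choose_default_strategy(n_parameters, n_data_points):
    """Pick tolerances, population size, walker count, etc. from problem size."""
    ...


def fit(model, data, strategy=None):
    if strategy is None:
        # The function knows both the model and the data, so it can size the
        # algorithm to the problem before any Strategy object exists.
        strategy = choose_default_strategy(len(model.parameters), data.size)
    return strategy.fit(model, data)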

vnmanoharan commented 5 years ago

Here is a proposal. At the top of the thread, I mentioned the three different inference goals:

  1. Obtain a point estimate representing the maximum of the posterior probability.
  2. Obtain a point estimate and covariance matrix from minimizing chi-squared subject to bounds (least-squares fit).
  3. Obtain an estimate of the shape of the posterior PDF. This is what MCMC sampling does. We can also obtain a point estimate representing the "best fit" but this is a separate step.

Then I mentioned the various use cases. Here is the proposed API

barkls commented 5 years ago

I disagree that it is desirable for new users to learn so many different APIs, but as long as the underlying objects are compatible with one another, I am happy.

Here's my understanding of @vnmanoharan's proposal:

I do have a few clarifying questions before I can implement, but I want to make sure we are in agreement about the target API first.

vnmanoharan commented 5 years ago

There are two APIs being proposed: a functional one, which inspects the data and model, and an object-oriented one. When you say that there are "so many different APIs" are you proposing a single API? What would it look like?

barkls commented 5 years ago

I don't like that the different functions have different APIs when really they all need the same information: data, generative model, which parameters to vary, parameter ranges and initial guesses, and noise level. Currently, information about parameters is included in the model as Prior objects, and the noise level can be attached to either the model or the data.

The single API is given in my earlier comment with options 1-4. There would be two functions named point_estimate and sample_posterior or similar, but their arguments would mostly mirror each other. Novice users would not pass in any more than minimal arguments (model + data). Experienced users would pass in a customized Strategy/Method object, but with the same functional API so that new users only learn a single way to do calculations.

vnmanoharan commented 5 years ago

The functions have different information because they are doing different things. This is why this issue was filed. Least-squares fitting defines a generative model that is not, in general, the same as the one used in posterior maximization. And MCMC does a different task than posterior maximization. So I don't think it's good to have "new users only learn a single way to do calculations," since there are three different types of calculations, which make different assumptions and return different things.

barkls commented 5 years ago

I think we have different priorities here, which is why neither of us are able to suggest solutions that address each other's concerns. I can implement whatever API you would like; I just need to understand what that is.

vnmanoharan commented 5 years ago

The API is specified in the proposal above. Here is what I would suggest: start with the documentation and rewrite the code examples so that they use the proposed API (in a new branch). That will tell us how the proposed API should be revised or modified. Once we have settled on that, we can proceed to implementation.

barkls commented 5 years ago

I've taken a first pass at documenting the proposed API. It's available on readthedocs here. Many of the hyperlinks don't work because the namespaces they refer to don't exist yet. Others do exist but have no documentation.

Let me know what you think - once everyone is happy with the documentation I can get to work on implementing the actual code.

barkls commented 5 years ago

Has anyone had a chance to look at the documentation for the proposed API yet? @vnmanoharan @briandleahy @ralex0 I'd appreciate some feedback so we can move forward on implementing this.

briandleahy commented 5 years ago

Hey @barkls sorry I didn't get a chance to write this down earlier. Right now it looks pretty good to me, but I have a few questions / comments:

Finally, we can adjust the parameters of the sphere in order to get a good fit to the data. Here we adjust the center coordinates (x, y, z) of the sphere and its radius, but hold its refractive index fixed.

ls_sphere = least_squares_fit(s, data_holo, parameters = ['x','y','z','r'])
ls_hologram = calc_holo(data_holo, ls_sphere)

I like how simple this is, but how would least_squares_fit know how to calculate a hologram? What model would it use? As we start to proliferate more theories (mielens vs mie vs dda, for example), I'm worried that the defaults for least_squares_fit might end up accidentally differing from the defaults for calc_holo, which could cause some confusion. I guess I'm just saying we need a consistent way to set up default scattering models if that's the case.

Customizing the model

fit_result = point_estimate(model, data_holo)
  1. Function names should be verbs; I would say rename this to measure_point_estimate or something.
  2. You've used point_estimate here which is different from least_squares_fit. I'm assuming both would be supported, and (perhaps) point_estimate would call least_squares_fit as one of the ways to find a point estimate? Is that how this would work? If so, then why not just have point_estimate(method='least_squares_fit') be user facing?

The sample_posterior() calculation returns a SamplingResult object...

There is no code here, so I can't comment on the proposed interface :). But what there is in the text seems reasonable to me.

Customizing the algorithm ...

cma_fit_strategy = CmaStrategy(popsize=15, parallel=None)
cma_fit_strategy.seed = 1234
hp.save('strategy_file.h5', cma_fit_strategy)
strategy_result = cma_fit_strategy.fit(model, data_holo)
  1. I would suggest naming cma_fit_strategy.fit to something consistent across other algorithms (it looks like it's called point_estimate above; I'd suggest renaming both to calculate_point_estimate or something).
  2. hp.save('strategy_file.h5', cma_fit_strategy): What is the logic of saving the fit strategy, and why would it be saved as an h5 file? If it's just saving the seed, population size, and whether or not it's parallel, I think a human-readable plain text format would be better.

ralex0 commented 5 years ago

I don't really like the names chosen for the functions that do the work. Following @briandleahy's comment above, is there a way we can rename the functions point_estimate and least_squares_fit to make it completely obvious what they do? "Point estimate" sounds like an object to me. Also, I don't think "scipy" should be the name of the "method" for least-squares fitting. I think point_estimate should be able to accept any optimization method provided by scipy, as well as CMA-ES.

Also, on the subject of objects, what about the object-oriented API? What does the InferenceStrategy class look like? I think it's unlikely that I would actually use the functional API, since in my experience there isn't a great one-size-fits-all set of parameters for any inference strategy.

vnmanoharan commented 5 years ago

Thanks for putting this together, @barkls. Questions/comments:

  1. Should s in the line ls_sphere = least_squares_fit(s, data_holo, parameters = ['x','y','z','r']) read guess_sphere instead?
  2. Regarding @briandleahy's comment about defaults: we should use the same defaults for both calc_holo() and the function that does fitting/optimization.
  3. Regarding @briandleahy's comments 1 and 2: Least-squares fitting assumes a posterior by default, whereas the point_estimate() function expects to be given one. That's why I don't like point_estimate(method='least_squares_fit'). That said, one option that would also satisfy @briandleahy's and @ralex0's concern about naming would be to call both functions find_MAP(). If no statistical model is passed to the function, find_MAP() would use a least-squares approach. Otherwise, it would try to maximize the posterior that is given, using whatever optimization algorithm the user specifies. This approach has a few positives:
    1. find_MAP is a verb
    2. find_MAP accurately describes what the function is doing, which is finding the maximum of the posterior (point_estimate had the additional disadvantage that it's not clear what point estimate it is trying to find: MAP or mean or something else)
     3. With the defaults above, there wouldn't be a conflict between arguments, as might happen if you did point_estimate(model, data, method='least_squares_fit'). In this example, the user-specified posterior would be thrown out.
    4. find_MAP is the same name as the function in PyMC3 and can share the same syntax. In particular, find_MAP in PyMC3 can accept any optimization method in scipy.optimize. Our find_MAP would also have the same caveats as the function in PyMC3: namely, not giving any info about the uncertainty.
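A sketch of what such a find_MAP might look like, with hypothetical internals (the model attributes, method names, and the scatterer keyword are assumptions); like PyMC3's find_MAP, it hands the optimization to scipy.optimize and reports only a point estimate:

import numpy as np
from scipy.optimize import minimize


def _least_squares(data, scatterer):
    """Placeholder for the least-squares fallback (uniform priors, Gaussian
    likelihood) used when no statistical model is supplied."""
    ...


def find_MAP(data, model=None, scatterer=None, method='L-BFGS-B'):
    if model is None:
        # No statistical model given: fall back to a least-squares fit.
        return _least_squares(data, scatterer)
    guess = np.asarray(model.initial_guess)      # attribute name assumed
    def neg_log_post(x):
        return -model.lnposterior(x, data)       # method name assumed
    result = minimize(neg_log_post, guess, method=method)
    return result.x  # point estimate only; no information about uncertainty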

barkls commented 5 years ago

Thanks for the feedback everyone. Please let me know if I missed or misinterpreted any of your points. Feel free to add more letters if appropriate.

A) @briandleahy asks how least_squares_fit knows how to calculate a hologram and worries that its default might differ from calc_holo's. We would need to enforce a default model; I was planning to use the current ExactModel for a hologram. One way to enforce the same behaviour as calc_holo would be to have calc_holo and least_squares_fit share code behind the scenes to instantiate a Model class and execute its forward function.

B) @briandleahy says function names should be verbs. Does @vnmanoharan's suggestion of find_MAP work for you?

C) @briandleahy asks what the differences are between least_squares_fit and point_estimate. My goal was that least_squares_fit was more for novice users - specifically it doesn't require specification of a Model object. The idea is to use point_estimate for finer control over parameters and fitting algorithms, and it could recreate the behaviour of least_squares_fit by choosing the correct set of values. See point L) for more on this topic.

D) @briandleahy asks for more details about the SamplingResult object. My idea is that this would be similar to the current (3.3) version, but I think there's a separate discussion to be had about the results objects. I've been intentionally vague here so we can work out the API for doing a calculation before figuring out how to interact with the result. I think there is sufficient compartmentalization to treat the Results objects as a separate concern for now but let me know if you (or anyone else) disagrees.

E) @briandleahy thinks the Strategy method name should match the functional interface name. I agree, so we would have cma_fit_strategy.find_MAP() or similar.

F) @briandleahy wonders why we would save a Strategy object (which is mostly a container for hyperparameters attached to methods for implementing a fitting algorithm) as a .h5 file since it doesn't have any data. HoloPy used to save everything as .yaml; now it saves everything as .h5. That was a decision made to simplify io but we could (should?) support multiple file types for different types of objects. The yaml-reading and writing code is still very much in HoloPy so that would be easy to implement, but I feel like it's not immediately relevant here. Feel free to open a new issue.

G) @ralex0 also dislikes names. see points B) and E).

H) @ralex0 would like to be able to use any scipy optimization method. I agree that would be desirable, but I don't think it's implemented yet in HoloPy. I was writing this based on the current state, but you're right that we should think about the future too.

I) @ralex0 wonders more about the object-oriented API. I did mention it briefly in the section "Customizing the algorithm", but I agree it didn't get as much explanation as it should, since that's the way we will be doing any of these calculations. Right now each method (Nmpfit, scipyfit, Emceebasic, TemperedSampling, and CMA-ES) has its own strategy, and they generally don't inherit from a base class. I think we'd split strategies into two categories - those with a find_MAP method and those with a sample_posterior method - and implement some sort of inheritance structure. I agree we shouldn't try to jam all the different algorithms into a single object because they take different arguments.

J) @vnmanoharan found a typo where the variable guess_sphere changed to s. I'll fix that in the next version.

K) @vnmanoharan confirmed we should use the same defaults for calc_holo and least_squares_fit since neither uses a model. See point A).

L) @vnmanoharan suggests combining least_squares_fit and point_estimate into a single function find_MAP. I saw the difference between the two functions as being the absence of a Model object with parameters specified as Prior objects, which removed a level of abstraction that may be confusing for new users. However, I'm not sure this separation is necessary, so I'd be ok with combining them and requiring users to pass in a Model object and Prior/Parameter objects (which is the case in both master and 3.3 branches currently). What does everyone else think?

M) @vnmanoharan suggests matching the syntax of PyMC3 for the find_MAP function. I think PyMC3 includes the data as part of the Model but we do not (and I don't think we should) so we couldn't match their syntax exactly, but we could certainly move in that direction.

barkls commented 5 years ago

Hi @vnmanoharan @ralex0 @briandleahy I have taken another pass at the updated documentation based on your feedback, with the latest version available on readthedocs. I'd appreciate it if you can each take a look and let me know what you think of this most recent version. Thank you!

briandleahy commented 5 years ago

Hey @barkls thanks for this. I took a look and I have a few comments (+ replies to your A-F above):

But overall I like what you have in the documentation now. I'd only say the points in (A) and (B) above, otherwise it looks great to me.

vnmanoharan commented 5 years ago

@barkls, all looks good up to the "Customizing the algorithms" section. There I have some semantic concerns.

1) I don't think of find_MAP() as a method of a Strategy. A strategy is an algorithm + tuning parameters. A MAP is a property of a posterior distribution or statistical model. So it's confusing to me to read

cma_fit_strategy.find_MAP(data_holo, model)

since the Strategy itself doesn't have a MAP. It would make more sense to me to have

model.find_MAP(data_holo, strategy)

but that might require changing the Model class.

2) It seems as if you're saying that both EmceeStrategy and CMAStrategy are InferenceStrategy objects. This doesn't make sense to me, because while they are both strategies, they are strategies to different ends. Since you wouldn't use EmceeStrategy to find the MAP, and you wouldn't use CMAStrategy to sample, they shouldn't be inherited from the same base class. You could instead have a SamplingStrategy class and a FitStrategy class.

3) If you can address point 2 and point 1 above, I would be OK with renaming find_MAP() to just fit(). That would likely make @briandleahy and @ralex0 happy. Then we would have fit() and sample(), as @barkls proposed above. That should make things simpler. But I am against having these be methods of a Strategy object.

So the advanced API would look something like this:

Fitting

fit_result = model.fit(data_holo, fit_strategy)

Read this as "Fit model to data_holo using this FitStrategy, returning a FitResult."

Sampling

sampling_result = model.sample(data_holo, sampling_strategy)

Read this as "Sample parameters of model from data_holo using this SamplingStrategy and returning a SamplingResult."

Would this advanced API meet your needs?
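A sketch of this advanced API under the assumptions above: fit() and sample() live on the Model, strategies are plain parameter containers, and FitStrategy and SamplingStrategy are separate hierarchies. Every name here is illustrative rather than HoloPy's current code.

class FitStrategy:
    """Base for point-estimate algorithms (CMA-ES, least squares, ...)."""
    def optimize(self, log_posterior, guess):
        raise NotImplementedError


class SamplingStrategy:
    """Base for MCMC samplers (emcee, tempered sampling, ...)."""
    def run(self, log_posterior, initial_positions):
        raise NotImplementedError


class Model:
    def fit(self, data, strategy):
        """Fit model to data using a FitStrategy, returning a FitResult."""
        return strategy.optimize(self._log_posterior_for(data), self._guess())

    def sample(self, data, strategy):
        """Sample model parameters from data using a SamplingStrategy."""
        return strategy.run(self._log_posterior_for(data),
                            self._initial_positions())

    # Private helpers elided; they would build the posterior from the model's
    # priors, likelihood, and forward scattering calculation.
    def _log_posterior_for(self, data): ...
    def _guess(self): ...
    def _initial_positions(self): ...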

barkls commented 5 years ago

I've posted another update to address @vnmanoharan's comments. Specifically, I've changed the function and method names find_MAP --> fit and sample_posterior --> sample. I've also moved the methods from the Strategy objects to Model objects. Finally, I've removed any reference to an InferenceStrategy baseclass. Let me know what you all think, but I feel like we're pretty close to something everyone is happy with!

I didn't change the default Model used by the fit function when called with a Scatterer object from ExactModel to AlphaModel. I feel like that's pretty opaque, since the user in that scenario never passes in a guess value for alpha or might not even know of its existence. I would prefer for the decision to use AlphaModel to be more deliberate, by instantiating a model with a prior (or fixed value) for alpha. Note that calling fit with the default model and specifying a list of parameters to vary from an initial guess scatterer is intentionally limiting - it's meant to be a simple way for new users to get started by building off calc_holo, but anyone doing serious inference will want to specify their own Model, and probably tweak a strategy as well. However, this is just my opinion, and I would appreciate hearing from others @vnmanoharan @ralex0 about this as well.

briandleahy commented 5 years ago

I'm fine with the name changes.

Regarding the AlphaModel -- I still think that the default behavior should be the one that "just works", which would favor the AlphaModel. We could have it instantiate alpha with a flat prior on [0, 1] by default, since that is what both Grier's group and we do. But I'm happy to defer to you and/or others if you disagree.
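As a sketch of that default (assuming HoloPy's AlphaModel and prior.Uniform keep their present signatures; the keyword names are an assumption rather than a settled API):

from holopy.inference import prior, AlphaModel


def default_model(scatterer, noise_sd=None):
    """Wrap a guess scatterer in an AlphaModel with alpha ~ Uniform(0, 1)."""
    return AlphaModel(scatterer, alpha=prior.Uniform(0, 1), noise_sd=noise_sd)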

ralex0 commented 5 years ago

The API looks mostly fine to me, but what is this reference to point_estimate?

The sample() calculation returns a SamplingResult object, which is similar to the FitResult returned by point_estimate()

Should "point_estimate" be named "fit" now, or is it still something else entirely?

barkls commented 4 years ago

Thanks for the feedback you guys. I'll make both of those changes. Other than that, is everyone happy with the current version? @vnmanoharan have you had a chance to look at it?

Once everyone signs off we can work on implementation and finalize the 3.3 release.

barkls commented 4 years ago

#305 finalized the docs changes and the corresponding code implementation.