Closed vnmanoharan closed 4 years ago
The simplest fix is probably just to revert the method names of the Strategy objects from "optimize" to "fit" and "sample" as appropriate.
@barkls (looping in @ralex0, @briandleahy):
This wouldn't solve the semantic problem that optimization and MCMC are strategies to achieve two different goals. Actually there are three different inference goals:
These are three different goals. The only connection between the strategies to achieve these goals is that they all aim at inference of some sort, but only (1) and (3) operate on the same posterior PDF, and none of the strategies return the same type of information. So it doesn't make sense to lump CMA-ES, least-squares, and MCMC algorithms under the same category.
The source of this dilemma seems to be the term "fit," which can mean any of the above three things. To resolve the dilemma we have to resolve this ambiguity.
One solution is to make some of the tasks above functions, and reserve Strategy
for MCMC methods only. For example, you could define a sample()
function in the hp.inference
module. It might look something like this:
result = hp.inference.sample(model, data, strategy)
Here, model
would include the optical model, likelihood, and priors. With data
, sample()
has everything needed to calculate the posterior. strategy
specifies what MCMC method to use as well as the starting positions of the walkers.
You could also define least_squares_fit()
to be something like
result = hp.inference.least_squares_fit(model, data)
In this case the function should throw an exception if the priors are not uniform.
Finally, for optimization, you would have
result = hp.inference.optimize(model, data, method, guess)
where method
is 'CMA-ES'
or whatever method
you want to use from scipy.optimize.minimize()
. Here again model
and data
are sufficient to define the posterior.
A second approach is to make the posterior probability an object, defined by the likelihood, priors, and physical model. Then sample()
and optimize()
(or, maybe better, find_map()
) become methods of the posterior object. This approach would be more Bayesian, but if you wanted to do least-squares fitting, you would have to define a different posterior.
Here is the functional API we specified in our meeting today (and follow-up conversation with @briandleahy). Below are five different ways of interacting with the setup.
hp.inference.point_estimate(model, data)
@briandleahy suggests that the function name should be a verb, so we will want to rename from point_estimate
. This example (and all below) can also work with hp.inference.sample()
.
Model.parameters
and Model._parameters
, respectively.
hp.inference.point_estimate(model, data, initial_guess={'r':1, 'n':1.6, 'center.0':5, 'center.1':5,'center.2':5})
hp.inference.point_estimate(model, data, initial_guess=[1, 1.6, 5, 5, 5])
hp.inference.point_estimate(model, data, method="cma")
hp.inference.point_estimate(model, data, method="cma", options={})
@vnmanoharan suggests this use case shouldn't work for hp.inference.sample
.
hp.inference.point_estimate(model, data, method=hp.inference.point_estimate_methods.CMAMethod())
my_cma_method = hp.inference.point_estimate_methods.CMAMethod()
hp.inference.point_estimate(model, data, method=my_cma_method)
point_estimate
or sample
method. Note that we can choose to have these methods by hidden to enforce the functional API in examples 1-4, but they will likely call this behind the scenes anyway.
my_cma_method = hp.inference.point_estimate_methods.CMAMethod()
my_cma_method.point_estimate(model, data, initial_guess=None)
my_cma_method._point_estimate(model, data, initial_guess=None)
We also need to figure out what these functions return - probably something similar to the current v3.3 FitResult
, but probably with more hierarchy and separation between point_estimate
and sample
outputs. That can wait until after the API is established.
Thanks for the feedback @ralex0. Any thoughts @briandleahy or @vnmanoharan?
I don't see the feedback from @ralex0. In any case, I am worried about supporting too many different interfaces to the code. There are several (overlapping) use cases:
With that in mind, let's examine the 5 interfaces above in terms of their use cases:
We should probably figure out which use cases we want to prioritize. That will help us determine what APIs we want to support.
For example, if we want to prioritize non-expert, interactive use, we might encourage something like the following
hp.inference.least_squares_fit(scatterer, data, noisesd=noisesd)
where scatterer
is an instance of a Scatterer object, which provides all the information needed for an initial guess. The code would also use scatterer
to determine the scattering theory. It would use a standard least-squares fitting algorithm that reports the best fit and uncertainty on the attributes of the scatterer. The user would write, for example
hp.inference.least_squares_fit(Sphere(n=1.59, r=0.5, center=[5, 5, 15]), data, noisesd=2.0)
This format is useful for interactive sessions because one can easily change the initial guess or the shape of the scatterer to see how the fit changes.
But we should first figure out what use cases to prioritize.
Thanks for the input! I agree it makes sense to figure out what use cases we want to prioritize. I'll clarify a few points about the above options that may have been unclear in my earlier post and then respond to @vnmanoharan's comments with my own opinions in a separate comment.
Initial guesses determined from model - Note that the model includes HoloPy Prior objects. For Gaussian priors, it is reasonable to take an initial guess at the mean position. For uniform priors, the guess is by default in the middle of the range, but can be specified when defining the prior. This matches the behaviour in HoloPy's deprecated Fitting module, which was in turn based off lmfit Parameter objects. The point is that interpreting an initial guess from a model requires no new information to be attached to the model.
Pass Method/Strategy object into functional API - This is meant to enable expert customization of the inference Method/Strategy without needing to learn a new API from examples 1, 2, 3.
My own preferred use case is option 4/5 with no preference between them. I think option 1 is nice as the simplest new user case. I'm personally not a fan of option 3 (see below).
I disagree that option 4/5 won't be used in an interactive workflow. I am quite likely to define a new object, or modify an existing one (for example, decrease tolerance if fit hasn't converged) within a jupyter notebook. I think hiding the objects behind a keyword API in option 3 loses some of the advantages of the object oriented structure we have set up in HoloPy. I want to be able to define and modify objects and then make them interact in both interactive and automated workflows.
@vnmanoharan's last examples are interesting because they remove the Model object entirely and just pass a scatterer in to the fitting algorithm. Some model would have to be assumed. Current choices are AlphaModel, PerfectLensModel, and ExactModel (which handles arbitrary scattering calculations, not necessarily holograms). It would be nice to remove this level of abstraction from users as long as one of these models makes sense as a default. Note that users would still need to parameterize the scatterer with Prior objects, or else there's nothing to fit!
You don't need priors for a least squares fit (in fact, you don't want them, since they will be ignored). The reason option 4 doesn't have any advantage for an interactive workflow is that it mixes an object-oriented and functional API. If you want to work with objects, you would just do option 5. Option 4 combines the two approaches in a way that isn't synergistic.
Whether one uses option 5 in a Jupyter notebook or not isn't the question. We're not going to stop anyone from using objects in a Jupyter notebook. The question is what's useful for exploratory analysis of the data. For that purpose, a functional API has a big advantage for many users: it can choose sensible defaults for the optimization/sampling algorithms based on the model and the data, because it knows all of those things at once. If you want to do that by setting up objects, it's trickier -- your Method object doesn't know about the model or data when you set it up. Of course, you could have your MCMCmethod.sample()
or PointEstimateMethod.estimate()
method figure this out for you, but then you're not using the attributes of the object when you instantiate it.
Here is a proposal. At the top of the thread, I mentioned the three different inference goals:
Then I mentioned the various use cases. Here is the proposed API
For new users and/or exploratory use, a functional API for all three inference goals. For point estimates, it would look like option 3 above:
hp.inference.point_estimate(model, data, method="cma")
without the options
flag. The function would choose the appropriate Method if method
is not specified and would choose sensible values of the Method's parameters if method
is specified, based on inspecting the model and the data. For least-squares fitting, it would look like
hp.inference.least_squares_fit(scatterer, data, noisesd=noisesd, params=None)
I would argue that it's not correct to specify a Model object for a least-squares fit, because a least-squares fit already assumes some things about the statistical model (namely, uniform improper priors and Gaussian likelihood). Specifying the scatterer instead makes it very easy to do a least-squares fit without setting up a Model object with attributes that are going to be ignored.
For expert users and/or scripted use, the object approach in option 5 above for sampling and point estimation. For point-estimation, it would look like
my_pointestimate_method.point_estimate(model, data, initial_guess=None)
There is no need to set up a least-squares fitting framework for expert users, since experts are unlikely to use least-squares, and if they do, they can set it up in the Model object by choosing uniform priors and Gaussian likelihoods.
Behind the scenes, the functional API uses the object-oriented API, but contains some magic to make smart choices about parameters and methods.
I disagree that it is desirable for new users to learn so many different APIs but as long as the underlying objects are intercompatible then I am happy.
Here's my understanding of @vnmanoharan's proposal:
Strategy
objects.I do have a few clarifying questions before I can implement, but I want to make sure we are in agreement about the target API first.
There are two APIs being proposed: a functional one, which inspects the data and model, and an object-oriented one. When you say that there are "so many different APIs" are you proposing a single API? What would it look like?
I don't like that the different functions have different APIs when really they all need the same information: data, generative model, which parameters to vary, parameter ranges and initial guesses, and noise level. Currently, information about parameters is included in the model as Prior
objects and noise level can attached to either the model or the data.
The single API is given in my earlier comment with options 1-4. There would be two functions named point_estimate
and sample_posterior
or similar, but their arguments would mostly mirror each other. Novice users would not pass in any more than minimal arguments (model + data). Experienced users would pass in a customized Strategy/Method object, but with the same functional API so that new users only learn a single way to do calculations.
The functions have different information because they are doing different things. This is why this issue was filed. Least-squares fitting defines a generative model that is not the same in general as the one used in posterior maximization. And MCMC does a different task than posterior estimation. So I don't think it's good to have "new users only learn a single way to do calculations," since there are three different types of calculations, which make different assumptions and return different things.
I think we have different priorities here, which is why neither of us are able to suggest solutions that address each other's concerns. I can implement whatever API you would like; I just need to understand what that is.
The API is specified in the proposal above. Here is what I would suggest: start with the documentation and rewrite the code examples so that they use the proposed API (in a new branch). That will tell us how the proposed API should be revised or modified. Once we have settled on that, we can proceed to implementation.
I've taken a first pass at documenting the proposed API. It's available on readthedocs here. Many of the hyperlinks don't work because the namespaces they refer to don't exist yet. Others do exist but have no documentation.
Let me know what you think - once everyone is happy with the documentation I can get to work on implementing the actual code.
Has anyone had a chance to look at the documentation for the proposed API yet? @vnmanoharan @briandleahy @ralex0 I'd appreciate some feedback so we can move forward on implementing this.
Hey @barkls sorry I didn't get a chance to write this down earlier. Right now it looks pretty good to me, but I have a few questions / comments:
Finally, we can adjust the parameters of the sphere in order to get a good fit to the data. Here we adjust the center coordinates (x, y, z) of the sphere and its radius, but hold its refractive index fixed.
ls_sphere = least_squares_fit(s, data_holo, parameters = ['x','y','z','r']) ls_hologram = calc_holo(data_holo, ls_sphere)
I like how simple this is, but how would
least_squares_fit
know how to calculate a hologram? What model would it use? As we start to proliferate more theories (mielens vs mie vs dda, for example), I'm worried that the defaults forleast_squares_fit
might end up accidentally differing from the defaults forcalc_holo
, which could cause some confusion. I guess I'm just saying we need a consistent way to set up default scattering models if that's the case.Customizing the model
fit_result = point_estimate(model, data_holo)
- Function names should be verbs; I would say rename this to
measure_point_estimate
or something.- You've used
point_estimate
here which is different fromleast_squares_fit
. I'm assuming both would be supported, and (perhaps)point_estimate
would callleast_squares_fit
as one of the ways to find a point estimate? Is that how this would work? If so, then why not just havepoint_estimate(method='least_squares_fit')
be user facing?The sample_posterior() calculation returns a SamplingResult object...
There is no code here, so I can't comment on the proposed interface :). But what there is in the text seems reasonable to me.
Customizing the algorithm ...
cma_fit_strategy = CmaStrategy(popsize=15, parallel=None) cma_fit_strategy.seed = 1234 hp.save('strategy_file.h5', cma_fit_strategy) strategy_result = cma_fit_strategy.fit(model, data_holo)
- I would suggest naming
cma_fit_strategy.fit
to something consistent across other algorithms (it looks like it's calledpoint_estimate
above; I'd suggest renaming both tocalculate_point_estimate
or something).hp.save('strategy_file.h5', cma_fit_strategy)
: What is the logic of saving the fit strategy, and why would it be saved as an h5 file? If it's just saving the seed, population size, and whether or not it's parallel, I think a human-readable plain text format would be better.
I don't really like the names chosen for the functions that do the work. Following @briandleahy mentioned above, is there a way we can rename the functions point_estimate, least_squares_fit, to make it completely obvious what they do? "Point estimate" sounds like an object to me. Also, I don't think "scipy" is a should be the name of the "method" for least squares fitting. I think point_estimate should be able to accept any optimization method provided by scipy, and cma-es.
Also, on the subject of objects, what about the object oriented API? What's the InferenceStrategy class look like? I think it's unlikely that I would actually use the functional API since in my experience there isn't a great one size fits all set of parameters for any inference strategy.
Thanks for putting this together, @barkls. Questions/comments:
s
in the line
ls_sphere = least_squares_fit(s, data_holo, parameters = ['x','y','z','r'])
read guess_sphere
instead?calc_holo()
and the function that does fitting/optimization.point_estimate()
function expects to be given one. That's why I don't like point_estimate(method='least_squares_fit')
. That said, one option that would also satisfy @briandleahy's and @ralex0's concern about naming would be to call both functions find_MAP()
. If no statistical model is passed to the function, find_MAP()
would use a least-squares approach. Otherwise, it would try to maximize the posterior that is given, using whatever optimization algorithm the user specifies. This approach has a few positives:
find_MAP
is a verbfind_MAP
accurately describes what the function is doing, which is finding the maximum of the posterior (point_estimate
had the additional disadvantage that it's not clear what point estimate it is trying to find: MAP or mean or something else)point_estimate(method='least_squares_fit', model, data)
. In this example, the user-specified posterior would be thrown out.find_MAP
is the same name as the function in PyMC3 and can share the same syntax. In particular, find_MAP
in PyMC3 can accept any optimization method in scipy.optimize
. Our find_MAP
would also have the same caveats as the function in PyMC3: namely, not giving any info about the uncertainty.Thanks for the feedback everyone. Please let me know if I missed or misinterpreted any of your points. Feel free to add more letters if appropriate.
A) @briandleahy asks how least_squares
fit knows how to calculate a hologram and worries default might be different from calc_holo
. We would need to enforce a default model. I was planning to use the current ExactModel
for a hologram. One way to enforce the same behaviour with calc_holo
would be to have calc_holo
and least_squares_fit
share code behind the scenes to instantiate a Model class and execute it's forward function.
B) @briandleahy says function names should be verbs. Does @vnmanoharan's suggestion of find_MAP
work for you?
C) @briandleahy asks what the differences are between least_squares_fit
and point_estimate
. My goal was that least_squares_fit
was more for novice users - specifically it doesn't require specification of a Model object. The idea is to use point_estimate
for finer control over parameters and fitting algorithms, and it could recreate the behaviour of least_squares_fit
by choosing the correct set of values. See point L) for more on this topic.
D) @briandleahy asks for more details about the SamplingResult
object. My idea is that this would be similar to the current (3.3) version, but I think there's a separate discussion to be had about the results objects. I've been intentionally vague here so we can work out the API for doing a calculation before figuring out how to interact with the result. I think there is sufficient compartmentalization to treat the Results objects as a separate concern for now but let me know if you (or anyone else) disagrees.
E) @briandleahy thinks the Strategy method name should match the functional interface name. I agree, so we would have cma_fit_strategy.find_MAP()
or similar.
F) @briandleahy wonders why we would save a Strategy object (which is mostly a container for hyperparameters attached to methods for implementing a fitting algorithm) as a .h5 file since it doesn't have any data. HoloPy used to save everything as .yaml; now it saves everything as .h5. That was a decision made to simplify io but we could (should?) support multiple file types for different types of objects. The yaml-reading and writing code is still very much in HoloPy so that would be easy to implement, but I feel like it's not immediately relevant here. Feel free to open a new issue.
G) @ralex0 also dislikes names. see points B) and E).
H) @ralex0 would like to be able to use any scipy optimization method. I agree that would be desirable but I don't think it's implemented yet in HoloPy. I was writing this based on the current state but you're right that we should think about future too.
I) @ralex0 wonders more about the object-oriented API. I did mention it briefly in the section "Customizing the algorithm", but I agree it didn't get as much explanation as it should since that's the way we will be doing any of these calculations. Right now each method (Nmpfit, scipyfit, Emceebasic, TemperedSampling, and CMA-ES) has its own strategy and they generally aren't inherited from a baseclass. I think we'd split strategies two categories - those with find_MAP
method and those with sample_posterior
method and implement some sort of inheritance structure. I agree we shouldn't try to jam all the different algorithms into a single object because they take different arguments.
J) @vnmanoharan found a typo where the variable guess_sphere
changed to s
. I'll fix that in the next version.
K) @vnmanoharan confirmed we should use the same defaults for calc_holo
and least_squares_fit
since neither uses a model. See point A).
L) @vnmanoharan suggests combining least_squares_fit
and point_estimate
into a single function find_MAP
. I saw the difference between the two functions being the absence of a Model object with parameters specified as Prior objects, which removed a level of abstraction that may be confusing for new users. However, I'm not sure this separation is necessary so I'd be ok with combining them and requiring users to pass in a Model object and Prior/Parameter objects (which is the case in both master and 3.3 branches currently).What does everyone else think?
M) @vnmanoharan suggests matching the syntax of PyMC3 for the find_MAP
function. I think PyMC3 includes the data as part of the Model but we do not (and I don't think we should) so we couldn't match their syntax exactly, but we could certainly move in that direction.
Hi @vnmanoharan @ralex0 @briandleahy I have taken another pass at the updated documentation based on your feedback, with the latest version available on readthedocs. I'd appreciate it if you can each take a look and let me know what you think of this most recent version. Thank you!
Hey @barkls thanks for this. I took a look and I have a few comments (+ replies to your A-F above):
AlphaModel
since that is the practically useful one, but just my 2 cents.find_MAP
is better, although I really hate acronyms and prefer clarity over brevity, so I'd prefer find_maximum_a_posterior_estimate
or something similar. Again, just my 2 cents, not a hill I'd want to die on.But overall I like what you have in the documentation now. I'd only say the points in (A) and (B) above, otherwise it looks great to me.
@barkls, all looks good up to the "Customizing the algorithms" section. There I have some semantic concerns.
1) I don't think of find_MAP()
as a method of a Strategy. A strategy is an algorithm + tuning parameters. A MAP is a property of a posterior distribution or statistical model. So it's confusing to me to read
cma_fit_strategy.find_MAP(data_holo, model)
since the Strategy itself doesn't have a MAP. It would make more sense to me to have
model.find_MAP(data_holo, strategy)
but that might require changing the Model class.
2) It seems as if you're saying that both EmceeStrategy
and CMAStrategy
are InferenceStrategy
objects. This doesn't make sense to me, because while they are both strategies, they are strategies to different ends. Since you wouldn't use EmceeStrategy
to find the MAP, and you wouldn't use CMAStrategy
to sample, they shouldn't be inherited from the same base class. You could instead have a SamplingStrategy
class and a FitStrategy
class.
3) If you can address point 2 and point 1 above, I would be OK with renaming find_MAP()
to just fit()
. That would likely make @briandleahy and @ralex0 happy. Then we would have fit()
and sample()
, as @barkls proposed above. That should make things simpler. But I am against having these be methods of a Strategy
object.
So the advanced API would look something like this:
fit_result = model.fit(data_holo, fit_strategy)
Read this as "Fit model
to data_holo
using this FitStrategy
, returning a FitResult
."
sampling_result = model.sample(data_holo, sampling_strategy)
Read this as "Sample parameters of model
from data_holo
using this SamplingStrategy
and returning a SamplingResult
."
Would this advanced API meet your needs?
I've posted another update to address @vnmanoharan's comments. Specifically, I've changed the function and method names find_MAP
--> fit
and sample_posterior
--> sample
. I've also moved the methods from the Strategy
objects to Model
objects. Finally, I've removed any reference to an InferenceStrategy
baseclass. Let me know what you all think, but I feel like we're pretty close to something everyone is happy with!
I didn't change the default Model
used by the fit
function when called with a Scatterer
object from ExactModel
to AlphaModel
. I feel like that's pretty opaque since the user in that scenario never passes in a guess value for alpha or might not even know of its existence. I would prefer for the decision to use AlphaModel
to be more deliberate by instantiating a model with a prior (or fixed value) for alpha. Note that calling fit
with the default model and specifying a list of parameters to vary from an initial guess scatterer is intentionally limiting - it's meant to be a simple way for new users to get started by building off calc_holo
, but anyone doing serious inference will want to specify their own Model
, and probbaly tweak a strategy as well. However, this is just my opinion, and I would appreciate hearing from others @vnmanoharan @ralex0 about this as well.
I'm fine with the name changes.
Regarding the AlphaModel
-- I still think that the default behavior should be the one that "just works", which would favor the AlphaModel
. We could have it default instantiate an alpha with a flat prior on [0, 1], since that is what both Grier's group and we do. But I'm happy to defer to you and/or others if you disagree.
The API looks mostly fine to me, but what is this reference to point_estimate
?
The
sample()
calculation returns aSamplingResult
object, which is similar to theFitResult
returned bypoint_estimate()
Should "point_estimate
" be named "fit
" now, or is it still something else entirely?
Thanks for the feedback you guys. I'll make both of those changes. Other than that, is everyone happy with the current version? @vnmanoharan have you had a chance to look at it?
Once everyone signs off we can work on implementation and finalize the 3.3 release.
In 3.3, both MCMC and least-squares fitting have been rolled into
Strategy
objects, and the main method for MCMC has been renamed fromsample()
tooptimize()
. But MCMC is not an optimization method, and the type of information returned from running the sampler is not the same as the type of information returned from running an optimization algorithm. Optimization and sampling serve different purposes: optimization returns a point estimate and sampling can be used to estimate the entire posterior distribution. While both are "strategies" in some sense, they are strategies to achieve two different ends. We need to rethink how fitting should enter into the object hierarchy.