mathematicalmichael / jsm19

Poster presentation by Tian Yu Yen and Michael Pilosov for the 2019 Joint Statistical Meetings.

Discussion of Theory #5

Closed mathematicalmichael closed 5 years ago

mathematicalmichael commented 5 years ago

Add numerical integration code to compute the integral of the "evidence" term (the denominator of the posterior).

Note how similar the likelihood is to our solutions. They appear almost identical!

Plots: (attached images not preserved)

mathematicalmichael commented 5 years ago

I added the code to perform the numerical integration to the notebook (the TODO at the bottom can be ignored now). Once I did so, normalizing correctly led me to realize that the likelihoods were ever so slightly more confident, but not by much.

@yentyu any thoughts on this general observation? If not, I'll close this issue since the code has been added.
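
A minimal sketch of one way to quantify "slightly more confident": with a uniform prior on [-1, 1], the prior density cancels in Bayes' rule, so dividing the likelihood by the evidence gives the posterior, whose variance on the mesh can be compared directly against our solution's. Here `likelihood`, `evidence`, and `lam_mesh` refer to the snippet in the next comment, and `updated_density` is a hypothetical stand-in for our solution evaluated on the same mesh.

```python
import numpy as np

def mesh_variance(density, mesh):
    """Variance of a 1-D density known pointwise on a uniformly spaced mesh."""
    dx = mesh[1] - mesh[0]
    density = density / np.sum(density * dx)  # guard against normalization drift
    mean = np.sum(mesh * density * dx)
    return np.sum((mesh - mean) ** 2 * density * dx)

# Uniform prior on [-1, 1]: posterior = likelihood / evidence.
posterior = likelihood / evidence
print("posterior variance:", mesh_variance(posterior, lam_mesh))
print("updated variance:  ", mesh_variance(updated_density, lam_mesh))
```

A smaller variance (or a taller peak) is one concrete sense in which a density is "more confident" than another.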

mathematicalmichael commented 5 years ago

The only trick was getting the reshaping to work out correctly inside a lambda expression, which required overloading my exponential decay model to handle floats in addition to numpy arrays.

```python
import scipy.integrate

# One independent normal per observation dimension, centered at the datum.
D = distributions.parametric_dist(num_observations)
for i in range(num_observations):
    D.assign_dist('norm', dim=i,
                  kwds={'loc': observed_data[i],
                        'scale': sigma})

# "Evidence": integrate the likelihood over the parameter domain [-1, 1].
# quad() passes a plain float, hence the float-handling mentioned above.
evidence = scipy.integrate.quad(
    lambda x: D.pdf(exponential_decay_model(x)[:num_observations].reshape(1, -1)),
    -1, 1)[0]

# Unnormalized likelihood evaluated on the parameter mesh.
outs = exponential_decay_model(lam_mesh)
likelihood = D.pdf(outs)
```
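
For reference, a minimal sketch of the kind of overloading mentioned above. This is not the notebook's actual model, just an illustration: `scipy.integrate.quad` passes a plain float to the integrand, while the mesh evaluation passes a numpy array, so the model must accept both.

```python
import numpy as np

def exponential_decay_model(lam, t=np.linspace(0, 1, 100)):
    """Toy stand-in for the notebook's model: u(t; lam) = exp(-lam * t)."""
    scalar_input = np.ndim(lam) == 0         # True for the floats quad() passes
    lam = np.atleast_1d(lam).astype(float)   # promote scalars to a 1-D array
    out = np.exp(-np.outer(lam, t))          # shape (num_samples, len(t))
    # Return a 1-D time series for scalar input so the
    # [:num_observations].reshape(1, -1) pattern above still works.
    return out[0] if scalar_input else out
```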
yentyu commented 5 years ago

Here are some thoughts:

mathematicalmichael commented 5 years ago
yentyu commented 5 years ago

Main Point: I am not sure any of this analysis of the slight differences in the likelihoods of the updated and posterior estimates of the true parameter is relevant to the story we are trying to tell on the poster. If we feel confident in the figures, I think the key idea is that both methods result in similar / equivalent estimates. There may be cases where one method is better than the other, but that is a topic for further research (at least w.r.t. the poster).

Responses to Other Questions: Overall, I was mostly giving thoughts on how to troubleshoot the differences you noticed, so if you are short on time and energy, I think we can ignore most of the extraneous details. For completeness:

mathematicalmichael commented 5 years ago

Thank you, that was very helpful. So, to summarize,

That sounds like a fair comparison. It is much better than what we have in the notebook at this point, which basically amounts to a qualitative comparison.

mathematicalmichael commented 5 years ago

This may or may not be relevant to the story now, but it is worth mentioning: in recent discussions with experts who use Bayesian inference with large, complicated models, I have heard that they have trouble getting MCMC to converge, and that widening priors actually leads to worse performance. They like the appeal of what Youssef/Ghattas do with Hessians, but that requires specifying a prior mean in the right "neighborhood" of the truth, which may come from a surrogate or another algorithm.

So if the "confident answer" that initializes another algorithm is in some sense "far off" from the truth, there is a chance that the solution will actually land somewhere between the truth and the initial mean. Thus, it makes sense to look at examples where the prior is confident but wrong as a comparison against our method, which, in theory, would "ignore" the prior more than the standard methods do. Like I said, my theory is that with wide priors we do about as well; the advantage may come in "shifting" an incorrect guess, perhaps one that comes from existing Bayesian methods.

yentyu commented 5 years ago

For a numeric estimate of the bias, do the same thing you do for a numeric estimate of the variance: compute the MUD/MAP over multiple sample sets (of the same size) and take the average of the values computed from these different slices of data. Obviously, the estimator should be less biased as the sample size (the number of observations in d) increases.

The estimator is unbiased if E(estimate) = true value, where the expectation is taken over all data slices.
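
A hedged sketch of that procedure, with `compute_estimate` as a hypothetical callable standing in for whichever estimator (MUD or MAP) is being checked; the trial count, noise level, and reuse of `exponential_decay_model` from above are placeholder assumptions.

```python
import numpy as np

def estimate_bias(compute_estimate, true_lam, num_trials=500,
                  num_observations=50, sigma=0.05, seed=0):
    """Monte Carlo check of bias: mean(estimate over data slices) - truth."""
    rng = np.random.default_rng(seed)
    # Noise-free observations at the true parameter (model from the thread).
    clean = exponential_decay_model(true_lam)[:num_observations]
    estimates = []
    for _ in range(num_trials):
        # Each trial is a fresh data slice of the same size.
        noisy = clean + sigma * rng.standard_normal(num_observations)
        estimates.append(compute_estimate(noisy))
    return np.mean(estimates) - true_lam
```

For an unbiased estimator this tends to zero as `num_trials` grows; per the point above, it should also shrink as `num_observations` increases.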

The Bayesian MAP estimator is known to be biased for nonlinear maps with non-uniform priors. I am sure that our method is biased as well, though how biased it is in theory remains an open question. Isn't that what you and Troy are working on by computing the closed forms for linear maps?
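
For reference, a sketch of the standard linear-Gaussian calculation (assumptions here, not necessarily the exact setup for those closed forms): with data d = A lambda† + epsilon, Gaussian prior N(mu_0, Sigma_0), and noise N(0, Sigma), the MAP point coincides with the posterior mean.

```latex
\[
  \hat{\lambda}_{\mathrm{MAP}}(d) = \mu_0 + K\,(d - A\mu_0),
  \qquad K = \Sigma_0 A^\top \left(A \Sigma_0 A^\top + \Sigma\right)^{-1}.
\]
% Substituting d = A\lambda^\dagger + \varepsilon and taking the
% expectation over the noise gives the bias:
\[
  \mathbb{E}\!\left[\hat{\lambda}_{\mathrm{MAP}}\right] - \lambda^\dagger
    = (I - KA)\,(\mu_0 - \lambda^\dagger),
\]
% which vanishes only when \mu_0 = \lambda^\dagger or KA = I, so even a
% linear map yields a biased MAP under an informative Gaussian prior.
```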

My feeling is that our methods will currently suffer just as much as Bayesian inference methods when a prior is confidently wrong, but for different reasons.

Again, this is just my gut feeling, but these are things to keep on the back burner.

mathematicalmichael commented 5 years ago

(I could not get strikethrough to work in the issue title.) This was a fantastic discussion, and I'll close the issue for now. I love that it's archived for future reference. Very helpful.