ramess101 / RJMC_LJ_Ethane


Acceptance Criterion #1

Open ramess101 opened 6 years ago

ramess101 commented 6 years ago

I am copying an email correspondence I had with @jchodera a while ago:

Me:

[image: the RJMC acceptance criterion equation]

As I interpret this equation, p(theta_j | D)/p(theta_i | D) is just the posterior distribution ratio that we normally compute with Metropolis-Hastings MCMC. The P(M_j) / P(M_i) is the prior knowledge ratio for the two models; if I assume that the prior knowledge suggests they are equally probable, this ratio is just 1. The P(M_i | M_j) / P(M_j | M_i) is the ratio of "jump from the current model" proposals at "pre-specified probabilities". Currently, in my code every MC step has a 50:50 probability of a model swap, so this ratio should also be 1.

The final term (T_ji/T_ij) is the one I am less sure about. In other texts I have seen this ratio expressed as a single term, namely the determinant of the Jacobian matrix. Currently I am using a value of 1, but maybe this is my problem. In my specific example, I assumed that the Jacobian would be 1 because the parameters remain unchanged when going from one model to the next (since in one model the second parameter is simply ignored). However, I have come across some work that introduces an auxiliary variable into the single-parameter model so that both models have 2 parameters. Unfortunately, it is not clear to me how I would do this for my specific example.
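To make the bookkeeping above concrete, here is a minimal sketch (names and defaults are my own, not from the repo) of the log acceptance ratio with the four factors separated out. With equal model priors, a symmetric 50:50 jump proposal, and a unit Jacobian, everything but the posterior ratio drops out, as described.

```python
import numpy as np

def rjmc_log_acceptance(log_post_j, log_post_i,
                        log_prior_Mj=0.0, log_prior_Mi=0.0,
                        log_q_i_given_j=np.log(0.5),
                        log_q_j_given_i=np.log(0.5),
                        log_abs_det_jacobian=0.0):
    """Log acceptance probability for a proposed jump from model i to model j.

    Defaults correspond to equal model priors, a symmetric 50:50 model jump
    proposal, and parameters left unchanged across models (|det J| = 1).
    """
    log_ratio = ((log_post_j - log_post_i)             # posterior ratio
                 + (log_prior_Mj - log_prior_Mi)       # model prior ratio
                 + (log_q_i_given_j - log_q_j_given_i) # jump proposal ratio
                 + log_abs_det_jacobian)               # Jacobian of the mapping
    return min(0.0, log_ratio)  # log of min(1, ratio)

# With all the extra ratios equal to 1, only the posterior ratio remains:
assert rjmc_log_acceptance(-10.0, -12.0) == 0.0   # better model: always accept
assert np.isclose(rjmc_log_acceptance(-12.0, -10.0), -2.0)
```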

John:

Apologies if the notation here was confusing! I realize now this is much more difficult to understand than I had hoped.

Patrick and I just worked out a more detailed version of the acceptance criteria. See the attached photo.

[image: a more detailed derivation of the acceptance criteria]

The likelihood ratio and prior ratio give you your posterior ratio, as you describe.

If you are not changing your parameters or putting them through a functional transformation as you change models, then the parameter proposal ratio is unity, as you describe.

And if your model jump proposal probability is symmetric, the model jump proposal ratio is unity, as you describe.

Me:

Essentially, with all the new terms being equal to 1, my acceptance criterion is still just the ratio of posterior distributions. However, I am not sure if the posterior for the single parameter model should have a single prior (just for epsilon) while the two parameter model should have two priors (for both epsilon and sigma). Although it would make sense to only use 1 prior for the single parameter model, this results in an acceptance ratio of 1 for every model swap move when going from 1-parameter to 2-parameter model. So currently I am including two priors even for the single parameter model. Although this does not seem appropriate, this does result in the 1-parameter model being favored over the 2-parameter model.

John:

Going between models of different dimension is trickier. Patrick can point to some good references here.

The simplest approach in going from an [epsilon] to [epsilon,sigma] space is to simply keep the sigma variable around and use a separable prior on epsilon and sigma:

p(epsilon | M1) = p(epsilon | M2)
p(sigma | M1) = p(sigma | M2)

where you would just have one model (M1) be independent of sigma and the other model (M2) depend on both epsilon and sigma. This is essentially how the implementation in pymc would work, with a model index integer selecting which likelihood model you use, but the priors being the same for all models.
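The auxiliary-variable approach described above might be sketched as follows. The priors and toy likelihoods here are hypothetical (a uniform box prior and a Gaussian residual, chosen only for illustration); the point is that both models carry the same (epsilon, sigma) pair with the same separable prior, and only the likelihood differs.

```python
import numpy as np

def log_prior(epsilon, sigma):
    # Same separable prior for BOTH models (uniform on a box, for illustration).
    if 0.0 < epsilon < 2.0 and 2.0 < sigma < 5.0:
        return 0.0
    return -np.inf

def log_likelihood(model, epsilon, sigma, data):
    if model == 1:
        # M1: sigma is an unused auxiliary variable; likelihood ignores it.
        pred = epsilon * np.ones_like(data)
    else:
        # M2: depends on both epsilon and sigma.
        pred = epsilon * sigma * np.ones_like(data)
    return -0.5 * np.sum((data - pred) ** 2)

def log_posterior(model, epsilon, sigma, data):
    return log_prior(epsilon, sigma) + log_likelihood(model, epsilon, sigma, data)

data = np.array([1.0, 1.2, 0.9])
# In M1 the posterior does not depend on the auxiliary sigma at all:
assert log_posterior(1, 1.0, 3.0, data) == log_posterior(1, 1.0, 4.5, data)
```

Because sigma enters M1 only through its prior, it integrates out of M1's evidence, which is what makes the dimensions match without an explicit dimension-changing proposal.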

Me:

Here are my questions:

  1. How should I determine T_ji(theta_i | theta_j) when the only difference between the models is the inclusion of an additional parameter? Do I need to use some sort of auxiliary parameter so they both have 2 parameters?

John:

If you are keeping the same number of parameters in each model and not changing the parameters when proposing a model change, this remains unity. If you are changing the parameters but keeping the same number, this is not necessarily unity. If it is a deterministic transformation, the Jacobian is relevant. If it is stochastic, the probability distribution is relevant. If you are changing the number of parameters, this gets more complicated.
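As a toy illustration of the deterministic-transformation case (the map below is my own example, not from this project): if a model jump reparameterizes (epsilon, sigma) -> (epsilon * sigma, sigma), the Jacobian of that map enters the acceptance ratio, whereas the identity map contributes |det J| = 1.

```python
import numpy as np

def forward(epsilon, sigma):
    # Example deterministic change of variables used when jumping models.
    return epsilon * sigma, sigma

def log_abs_det_jacobian(epsilon, sigma):
    # J = d(epsilon*sigma, sigma)/d(epsilon, sigma) = [[sigma, epsilon],
    #                                                  [0,     1      ]]
    # so |det J| = |sigma|.
    return np.log(abs(sigma))

assert forward(2.0, 3.0) == (6.0, 3.0)
assert np.isclose(log_abs_det_jacobian(1.0, 3.0), np.log(3.0))
# Identity map (parameters unchanged): |det J| = 1, log-Jacobian term = 0.
```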

Me:

  1. Is proposing a model swap every step with equal probability a good approach?

John:

It depends on how many models you have. Optimizing this can in principle get you higher acceptance rates, but remember that increasing the probability of proposing a model k will also mean that the model jump proposal probability will oppose transitions to that model.

Me:

  1. Does the posterior distribution for the single parameter model only include a prior for epsilon while the two parameter model has a prior for both epsilon and sigma?

John:

If you are keeping sigma around in both models as an auxiliary variable in the model where it isn't relevant, and the prior for both parameters is the same in both models, you need to include the prior contribution in both models. The auxiliary variable will integrate out to be irrelevant in the evidence for the model in which it is not used.

Me:

  1. Is there a simple way to implement this in PyMC? All I have found online for RJMC with PyMC is a GitHub repository that is not completely finished.

John:

You can see an example that Patrick has coded up for GB forcefield parameterization here:

https://github.com/choderalab/gbff/blob/master/model.py#L280-L321

The model choice is a categorical variable. All parameters have the same prior and are sampled in all models. Only some parameters actually matter in computing the likelihood for each model, so they are free to float (within the prior distribution) if they don't appear in the likelihood.
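The same pattern can be sketched without PyMC as a plain Metropolis-within-Gibbs loop (everything below is a hypothetical toy, not the gbff code): a categorical model index is sampled alongside all parameters, and a parameter that does not appear in the current model's likelihood simply random-walks under its (here flat) prior.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_like(model, theta, data):
    epsilon, sigma = theta
    # Toy likelihoods: model 0 ignores sigma, model 1 uses both parameters.
    mu = epsilon if model == 0 else epsilon * sigma
    return -0.5 * np.sum((data - mu) ** 2)

def sample(data, n_steps=2000):
    model, theta = 0, np.array([1.0, 3.0])
    trace = []
    for _ in range(n_steps):
        # Parameter move: random-walk Metropolis on ALL parameters, always.
        # Flat priors assumed, so only the likelihood ratio appears.
        prop = theta + 0.1 * rng.standard_normal(2)
        if np.log(rng.random()) < log_like(model, prop, data) - log_like(model, theta, data):
            theta = prop
        # Model move: symmetric proposal between the two models, so the
        # jump proposal ratio is unity and only the likelihood ratio enters.
        m_prop = 1 - model
        if np.log(rng.random()) < log_like(m_prop, theta, data) - log_like(model, theta, data):
            model = m_prop
        trace.append(model)
    return np.array(trace)

trace = sample(np.full(5, 1.0), n_steps=500)
```

The fraction of steps spent in each model index then estimates the posterior model probabilities, which is the quantity RJMC is after.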

Me:

Let me know if I need to clarify my questions and issues more.

I have been looking for a good resource for RJMC. The original article by Green in 1995 (Ref [54] in the NSF proposal) is rather difficult to digest. I would be very interested if you have any other resources to walk me through the algorithm.

John:

Hopefully Patrick can point you in the right direction!