ramess101 / RJMC-LJ-HVP

Reversible Jump MCMC for the Lennard-Jones fluid when predicting Heat of Vaporization

RJMC Results #1

Open ramess101 opened 7 years ago

ramess101 commented 7 years ago

Currently the RJMC algorithm gives the result that the one-parameter model is favored only by a 3:1 ratio. We expected the one-parameter model (just epsilon) to always be favored because we are only matching HVP (which depends only on epsilon). Most likely we have an error in our acceptance criterion when going from the two-parameter model (eps/sig) to the one-parameter model (eps) and vice versa.

mrshirts commented 7 years ago

Good to know. Some combination of Josh, Bryce, and myself can take a look at this on the back-burner.

maxentile commented 7 years ago

I can take a look at this! The acceptance criterion to check is here https://github.com/ramess101/RJMC-LJ-HVP/blob/master/RJMC_LJ_HVP.py#L142-L160 ?

ramess101 commented 7 years ago

Great! I admit that I have not looked at this in a few months and did not document it as well as I should have. That said, it's a pretty simple problem/code, so just let me know what questions you have about the coding and implementation.

> The acceptance criterion to check is here https://github.com/ramess101/RJMC-LJ-HVP/blob/master/RJMC_LJ_HVP.py#L142-L160 ?

Yes. I would also take a look at lines 116-132, since it could be a bookkeeping problem with how I am updating the Markov chain. I am also completely open to using pymc. From what I could tell, pymc does not have a clear RJMC interface, which is why I wrote my own code.
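For anyone picking this up, here is a minimal sketch of the model-swap acceptance test being discussed (toy code with made-up helper names, not the functions in RJMC_LJ_HVP.py). With equal model priors, a symmetric 50:50 swap proposal, and parameters carried over unchanged, each extra RJMC term contributes log(1) = 0 and the test reduces to the posterior ratio:

```python
import numpy as np

def log_posterior(model, eps, sig, data):
    """Toy stand-in for the real posterior: model 0 ignores sigma,
    model 1 uses both parameters. Uniform priors contribute a constant."""
    resid = (data - eps) if model == 0 else (data - eps) / sig
    return -0.5 * np.sum(resid ** 2)

def accept_model_swap(model_old, model_new, eps, sig, data, rng):
    """RJMC acceptance for a model swap that keeps (eps, sig) unchanged."""
    log_alpha = (log_posterior(model_new, eps, sig, data)
                 - log_posterior(model_old, eps, sig, data))
    log_alpha += 0.0  # log P(M_new)/P(M_old): equal model priors
    log_alpha += 0.0  # log proposal ratio: symmetric 50:50 swap proposals
    log_alpha += 0.0  # log |Jacobian|: parameters carried over unchanged
    return rng.random() < np.exp(min(log_alpha, 0.0))
```

If any of those three assumptions does not hold in the actual code, the corresponding term is where the bug would show up.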

ramess101 commented 6 years ago

I am copying an email correspondence I had with @jchodera a while ago:

Me:

[image: RJMC acceptance criterion equation]

As I interpret this equation, p(theta_j | D)/p(theta_i | D) is just the posterior ratio that we normally compute with Metropolis-Hastings MCMC. P(M_j)/P(M_i) is the prior ratio for the two models; if our prior knowledge suggests they are equally probable, this ratio is just 1. P(M_i | M_j)/P(M_j | M_i) is the ratio of the pre-specified probabilities of proposing a jump from the current model; currently, every MC step in my code has a 50:50 probability of a model swap, so this ratio should also be 1.

The final term (T_ji/T_ij) is the one I am less sure about. In other texts I have seen this ratio expressed as a single term, namely the determinant of the Jacobian matrix. Currently I am using a value of 1, but maybe this is my problem. In my specific example, I assumed the Jacobian would be 1 because the parameters remain unchanged when going from one model to the next (since in one model the second parameter is simply ignored). However, I have come across some work where an auxiliary variable is introduced into the single-parameter model so that both models have 2 parameters. Unfortunately, it is not clear to me how I would do this for my specific example.
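In case the embedded image does not display, the acceptance ratio as read from the description above is (reconstructed from the prose, so the exact notation in the original may differ):

```latex
\alpha_{i \to j} \;=\;
\frac{p(\theta_j \mid D)}{p(\theta_i \mid D)} \times
\frac{P(M_j)}{P(M_i)} \times
\frac{P(M_i \mid M_j)}{P(M_j \mid M_i)} \times
\frac{T_{ji}(\theta_i \mid \theta_j)}{T_{ij}(\theta_j \mid \theta_i)}
```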

John:

Apologies if the notation here was confusing! I realize now this is much more difficult to understand than I had hoped.

Patrick and I just worked out a more detailed version of the acceptance criteria. See the attached photo.

[image: detailed acceptance criterion worked out by John and Patrick]

The likelihood ratio and prior ratio give you your posterior ratio, as you describe.

If you are not changing your parameters or putting them through a functional transformation as you change models, then the parameter proposal ratio is unity, as you describe.

And if your model jump proposal probability is symmetric, the model jump proposal ratio is unity, as you describe.

Me:

Essentially, with all the new terms equal to 1, my acceptance criterion is still just the ratio of posterior distributions. However, I am not sure whether the posterior for the single-parameter model should have a single prior (just for epsilon) while the two-parameter model has two priors (for both epsilon and sigma). Although it would make sense to use only one prior for the single-parameter model, this results in an acceptance ratio of 1 for every model swap move when going from the 1-parameter to the 2-parameter model. So currently I am including both priors even for the single-parameter model. Although this does not seem appropriate, it does result in the 1-parameter model being favored over the 2-parameter model.

John:

Going between models of different dimension is trickier. Patrick can point to some good references here.

The simplest approach in going from an [epsilon] to [epsilon,sigma] space is to simply keep the sigma variable around and use a separable prior on epsilon and sigma:

p(epsilon | M1) = p(epsilon | M2)
p(sigma | M1) = p(sigma | M2)

where you would just have one model (M1) be independent of sigma and the other model (M2) depend on both epsilon and sigma. This is essentially how the implementation in pymc would work, with a model index integer selecting which likelihood model you use, but the priors being the same for all models.
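A minimal sketch of what this shared parameter space could look like in plain Python (hypothetical function names and a made-up likelihood, not pymc and not the repository code): both models always carry (epsilon, sigma), the priors are identical in both, and only the likelihood changes with the model index.

```python
import numpy as np

def log_prior(eps, sig):
    """Separable prior, identical in both models (uniform boxes here)."""
    return 0.0 if (0.0 < eps < 10.0 and 0.0 < sig < 10.0) else -np.inf

def log_likelihood(model, eps, sig, data):
    """Hypothetical stand-in: M1 depends on eps only, M2 depends on both."""
    pred = eps if model == 1 else eps * sig
    return -0.5 * np.sum((data - pred) ** 2)

def log_posterior(model, eps, sig, data):
    """Same prior in every model; the model index only selects the likelihood."""
    return log_prior(eps, sig) + log_likelihood(model, eps, sig, data)
```

Because the sigma prior is identical in both models, it cancels in the swap acceptance ratio; while M1 is active, sigma simply random-walks within its prior, which is the point below about the auxiliary variable integrating out.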

Me:

Here are my questions:

  1. How should I determine T_ji(theta_i | theta_j) when the only difference between the models is the inclusion of an additional parameter? Do I need to use some sort of auxiliary parameter so they both have 2 parameters?

John:

If you are keeping the same number of parameters in each model and not changing the parameters when proposing a model change, this remains unity. If you are changing the parameters but keeping the same number, this is not necessarily unity. If it is a deterministic transformation, the Jacobian is relevant. If it is stochastic, the probability distribution is relevant. If you are changing the number of parameters, this gets more complicated.
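To make the deterministic case concrete (standard RJMC, stated here in generic notation rather than anything from the email): if the move from model i to model j pushes the parameters through a fixed, invertible map theta_j = g(theta_i) of the same dimension, the parameter proposal ratio collapses to the Jacobian determinant of that map:

```latex
\frac{T_{ji}(\theta_i \mid \theta_j)}{T_{ij}(\theta_j \mid \theta_i)}
\;\longrightarrow\;
\left| \det \frac{\partial g(\theta_i)}{\partial \theta_i} \right|,
\qquad \theta_j = g(\theta_i)
```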

Me:

  2. Is proposing a model swap every step with equal probability a good approach?

John:

It depends on how many models you have. Optimizing this can in principle get you higher acceptance rates, but remember that increasing the probability of proposing model k will also mean that the model jump proposal probability will oppose transitions to that model.

Me:

  3. Does the posterior distribution for the single parameter model only include a prior for epsilon while the two parameter model has a prior for both epsilon and sigma?

John:

If you are keeping sigma around in both models as an auxiliary variable in the model where it isn't relevant, and the prior for both parameters is the same in both models, you need to include the prior contribution in both models. The auxiliary variable will integrate out to be irrelevant in the evidence for the model in which it is not used.

Me:

  4. Is there a simple way to implement this in PyMC? All I have found online for RJMC with PyMC is a GitHub repository that is not completely finished.

John:

You can see an example that Patrick has coded up for GB forcefield parameterization here:

https://github.com/choderalab/gbff/blob/master/model.py#L280-L321

The model choice is a categorical variable. All parameters have the same prior and are sampled in all models. Only some parameters actually matter in computing the likelihood for each model, so they are free to float (within the prior distribution) if they don't appear in the likelihood.

Me:

Let me know if I need to clarify my questions and issues more.

I have been looking for a good resource for RJMC. The original article by Green in 1995 (Ref [54] in the NSF proposal) is rather difficult to digest. I would be very interested if you have any other resources to walk me through the algorithm.

John:

Hopefully Patrick can point you in the right direction!

ramess101 commented 6 years ago

Perhaps this problem is ill-posed. The likelihood is exactly the same between the two models (for a given theta, i.e. epsilon and sigma). The priors are exactly the same for any epsilon and sigma (using uniform prior on both). So essentially there is no information to determine if one model is better than the other. I believe that the 3:1 ratio is fairly meaningless and is just a result of how I coded the model swaps.
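A quick toy check of that last point (not the repository code): with identical posteriors and a symmetric swap proposal, every proposed swap is accepted and the chain should split its time roughly 50:50 between the two model labels, so anything far from that points to bookkeeping in the swap moves.

```python
import numpy as np

rng = np.random.default_rng(0)
model = 0
counts = np.zeros(2, dtype=int)
for _ in range(100_000):
    if rng.random() < 0.5:   # 50:50 chance of proposing a model swap each step
        # identical likelihoods and priors, symmetric proposal -> alpha = 1
        model = 1 - model    # so the swap is always accepted
    counts[model] += 1
print(counts / counts.sum())  # approaches [0.5, 0.5]
```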

I think we need to consider a slightly more interesting problem if we want to get something meaningful out of our RJMC analysis. For example, we can fix epsilon to be that obtained from Tc and see if RJMC suggests using just sigma or sigma and epsilon to fit rhol and/or Psat.

Another option would be to use the 2-center Lennard-Jones plus quadrupole model (2CLJQ). We could compare the 2-center LJ model with the 1-center LJ model for something like ethane. Alternatively, we could compare the 2-center LJ model with or without bond-length as a fitting parameter. Or we could look at something like ethylene to see if we need the quadrupole to match rhol, Psat, etc.

In each of these cases we will need to have a Jacobian that informs how the parameters change when switching between models. For example, a change in bond-length directly impacts the feasible region of sigma and epsilon.