openkim / kliff

KIM-based Learning-Integrated Fitting Framework for interatomic potentials.
https://kliff.readthedocs.io
GNU Lesser General Public License v2.1

WIP: Implementing MCMC sampling to KLIFF #55

Closed: yonatank93 closed this issue 2 years ago

yonatank93 commented 2 years ago

We want to integrate UQ analysis into KLIFF. This PR implements MCMC sampling via the ptemcee and emcee Python packages, along with some functions for analyzing the resulting samples. What's included:

yonatank93 commented 2 years ago

@mjwen I added tests and an example for the UQ implementation. Can you take a look at it? I am also thinking of adding documentation under the "Frequently Used Modules" section, but haven't done it yet.

yonatank93 commented 2 years ago

I have reworked the examples. The next item on my list is to write the "Frequently Used Modules" section.

yonatank93 commented 2 years ago

I have added documentation for the uq module in the "Frequently Used Modules" section.

mjwen commented 2 years ago

@yonatank93 The UQ module explanation looks great, thanks! I just have a comment regarding the bounds. Other than that, the PR is ready for merge.

yonatank93 commented 2 years ago

@mjwen What question do you have about the bounds?

mjwen commented 2 years ago

> @mjwen What question do you have about the bounds?

Are the bounds extremely important for the MCMC algorithm to work? In other words, you set them to (-8, 8) here; what if we do not set them and instead use the default value of logprior_args? If this does not hugely affect the result, I think we can drop the explicit bounds here and use the default value of logprior_args. If not, the current implementation looks fine to me.

(Comment on docs/source/auto_examples/example_uq_mcmc.py)
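For context, the kind of bounded uniform log-prior being discussed can be sketched generically as follows (a minimal emcee-style sketch; the function name and the way the bounds are passed through are illustrative assumptions, not the actual KLIFF API):

```python
import numpy as np

def uniform_logprior(params, bounds):
    """Log of a uniform prior: 0.0 inside the bounds, -inf outside.

    params: 1D array of parameter values.
    bounds: array of shape (ndim, 2) with (low, high) per parameter,
            e.g. np.tile([-8.0, 8.0], (ndim, 1)).
    """
    bounds = np.asarray(bounds)
    if np.all((params >= bounds[:, 0]) & (params <= bounds[:, 1])):
        return 0.0  # uniform density: constant log-probability inside
    return -np.inf  # zero probability outside the bounds
```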

yonatank93 commented 2 years ago

> Are the bounds extremely important for the MCMC algorithm to work? In other words, you set them to (-8, 8) here; what if we do not set them and instead use the default value of logprior_args? If this does not hugely affect the result, I think we can drop the explicit bounds here and use the default value of logprior_args. If not, the current implementation looks fine to me.

The variable bounds is used twice: (1) to set the random initial positions of the walkers, and (2) to set the support of the uniform prior. For the first use, users can set their own random initial positions instead. For the second use, specifying the bounds (via logprior_args) is optional; when the user doesn't specify them, they are retrieved from the model parameters.
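As a rough illustration of the two uses described above (hypothetical sizes; not the actual KLIFF code):

```python
import numpy as np

ndim, nwalkers = 2, 8  # hypothetical problem size
bounds = np.tile([-8.0, 8.0], (ndim, 1))  # (low, high) for each parameter

# Use (1): draw random initial walker positions inside the bounds.
p0 = np.random.uniform(low=bounds[:, 0], high=bounds[:, 1],
                       size=(nwalkers, ndim))

# Use (2): the same bounds define the support of the uniform prior
# (see the uniform_logprior sketch above).
```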

Suppose we didn't define bounds, set the initial positions with np.random.uniform(low=-8, high=8, ...), and used the default value of logprior_args (which happens to also be [-8, 8] in this example). The choice of the lower and upper bounds in np.random.uniform would then seem arbitrary.

There is one idea that I didn't explicitly mention in the example. When running MCMC, it is generally suggested to set the initial positions of the walkers to be spread out across parameter space. In this example, the initial positions are essentially points sampled from the prior, and I made this choice because the region of parameter space with non-zero posterior probability is exactly the support of the uniform prior. So I also wanted to communicate an idea of how to pick the initial positions.
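Tying these pieces together, a minimal emcee run with walkers spread over the prior support might look like this (the Gaussian term is a stand-in for the actual KLIFF loss-based posterior, and the sizes are hypothetical):

```python
import numpy as np
import emcee

ndim, nwalkers = 2, 8  # hypothetical sizes

def log_probability(params):
    # Stand-in posterior: uniform prior on (-8, 8) times a Gaussian likelihood.
    if np.any(np.abs(params) > 8.0):
        return -np.inf
    return -0.5 * np.sum(params ** 2)

# Spread the walkers across the prior support, as suggested above.
p0 = np.random.uniform(low=-8.0, high=8.0, size=(nwalkers, ndim))

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_probability)
sampler.run_mcmc(p0, 1000)
samples = sampler.get_chain(flat=True)  # shape (nwalkers * nsteps, ndim)
```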

mjwen commented 2 years ago

Got it, thanks for the clarification! This is fine then.