stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference
176 stars 36 forks source link

Add Tensorflow probability and Pyro example for 8-schools #117

Open MansMeg opened 4 years ago

MansMeg commented 4 years ago

Test to give an additional PPF framework in the database.

MansMeg commented 4 years ago

If time, also try add Pyro, JAGS and PyMC for 8-schools (but is low prio)

eerolinna commented 4 years ago

Here's a pymc3 model

import pymc3 as pm3

def model(data):
    J = data["J"] # number of schools
    y_obs = data["y"] # estimated treatment
    sigma = data["sigma"] # std of estimated effect
    with pm3.Model() as pymc_model:

        mu = pm3.Normal('mu', mu=0, sd=5) # hyper-parameter of mean, non-informative prior
        tau = pm3.Cauchy('tau', alpha=0, beta=5) # hyper-parameter of sd 
        theta_trans = pm3.Normal('theta_trans', mu=0, sd=1, shape=J)
        theta = mu + tau * theta_trans
        y = pm3.Normal('y', mu=theta, sd=sigma, observed=y_obs)
    return pymc_model

To run it

import numpy as np
J = 8 
y = np.array([28,  8, -3,  7, -1,  1, 18, 12]) 
sigma = np.array([15, 10, 16, 11,  9, 11, 10, 18])                                
data = {"J": J, "y": y, "sigma": sigma}
m = model(data)
trace = pm3.sample(1000, model=m)

I tested that it works.

This is adapted from https://sharanry.github.io/post/eight-schools-model/ The model there had log_tau (maybe it gives better performance with pymc or perhaps it was just preference), I changed that to just tau.

I could make a PR with this.

MansMeg commented 4 years ago

Great! Two thing. 1) We want this to be a stan alone py file for the model I guess? 2) We want to be able to run it by pointing to the posteriordb json. Ideally it should be possible to run from R as well. Im not sure how to include the model/pymc in the best way?

eerolinna commented 4 years ago

My PR #118 (sorry forgot to reference this issue in it, did that now) is probably the best way to include the pymc model with the current design.

I'm not fully sure what you are asking in 2. Do you mean that will the users be able to call code <- model_code(po, framework = "pymc3") and then use the result to run the model? Or do you mean that posteriordb would have something like run_posterior(po, framework = "pymc3")?

MansMeg commented 4 years ago

Oh I was not super clear. I meant to point to the json data. Ie how would an call using the posteriordb look like?

eerolinna commented 4 years ago

Oh you mean the actual data? I'll first show how it would look like using python as R will have to go through python anyway.

import importlib
import pymc3
from posteriordb import PosteriorDatabase
my_pdb = PosteriorDatabase()
posterior = my_pdb.posterior("eight_schools")

python_code_path = posterior.model.code_file_path("pymc3")
python_lib = importlib.import_module(python_code_path)

data = posterior.data.values()

model = python_lib.model(data)
draws = pymc3.sample(model=model)

A potential way to use it from R could be

po <- posterior("eight_schools", pdb = pd)
dat <- get_data(po)
pymc_code_path <- model_code(po, "pymc3")
draws <- run_pymc(pymc_code_path, dat)

where run_pymc would essentially call a python command line script that takes model code path and data json, runs the inference and output the result as json (or something else).

Of course run_pymc(po) would be doable as well as we can get code path and data from the posterior.

I didn't comment any of the code, if you want clarifications feel free to ask!

MansMeg commented 4 years ago

Yes. This looks exactly how I would like to have it!

Although, I think run_pymc3 should probably not be part of posteriordb.

eerolinna commented 4 years ago

I agree that run_pymc3 doesn't have to be in posteriordb.

MansMeg commented 4 years ago

I will start out by including 8-schools for all PPFs:

Pyro: https://forum.pyro.ai/t/hierarchical-models-and-eight-schools-example/362

JAGS: http://rstudio-pubs-static.s3.amazonaws.com/15236_9bc0cd0966924b139c5162d7d61a2436.html

Tensorflow Probability: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Eight_Schools.ipynb

PyMC (see Eeros stuff above).

MansMeg commented 4 years ago

Alright. When implementing JAGS Iusing run_jags() I realized that we need to automatically check that all models are equal with respect to the log density for a set of say 100 parameter values. This can be a little tricky between models implemented in R respectively Python. Especially since we want the tests to be run on travis every time.

Ideally we would like to have logp(po, params, "stan"), logp(po, params, "tfp") etc.

MansMeg commented 4 years ago

I promised to clarify the issue for @mtwest2718

So we want to include two things with this issue:

  1. Include Tensorflow code and Pyro code for the 8-schools model
  2. Check that the log density for the models is identical (or proportional) between TFP, Pyro, PyMC3 and Stan for say 100 parameter draws from the reference posterior.

Maybe also include the test in 2. as a continuous integration test. I'm not sure exactly how to do that so that is probably a separate issue since it involves including PyMC3, Pyro, TFP etc in Travis, something that might become a maintenance hell.