Add structure for likelihood to compute predictive distributions based on draws

MansMeg commented 4 years ago

In model slot.

eerolinna commented 4 years ago

Would this be an additional slot in the model info file like blr.info.json or in a model code file like blr.stan?

MansMeg commented 4 years ago

Im actually starting to think of this as a separate stan function for now (a separate stan file). Then we can add it along the way.

eerolinna commented 4 years ago

So you're thinking that blr.info.json would be something like this?

{
  "name": "blr",
  "keywords": [],
  "title": "A Bayesian linear regression model with vague priors",
  "description": "A Bayesian linear regression model with vague priors.",
  "urls": [],
  "model_code": {
    "stan": "models/stan/blr.stan"
  },
  "likelihood_code": {
    "stan": "models/stan/blr_likelihood.stan" 
  },
  "references": null,
  "added_date": "2019-11-29",
  "added_by": "Mans Magnusson"
}

MansMeg commented 4 years ago

Yes. Exactly!

eerolinna commented 4 years ago

With PyMC there is no need for any extra code to compute predictive distributions

posterior_predictive_samples = pymc.sample_posterior_predictive(draws, model=model)

where model is any pymc model and draws are the posterior draws computed with pymc. Source: https://docs.pymc.io/notebooks/posterior_predictive.html

It seems backwards to include likelihood_code in the model info file when not all model implementations will require it.

Ideal solution

From my perspective the ideal solution seems like this

change blr.info.json to this
```
{
"name": "blr",
"model_implementations": {
"stan": "models/stan/blr.info.json",
"pymc": "models/pymc/blr.info.json"
}
}
```
(some slots like keywords have been omitted for the sake of clarity, but they still would remain here)

models/stan/blr.info.json would be

{
"code": "models/stan/blr.stan",
"likelihood": "models/stan/blr_likelihood.stan"
}

models/pymc/blr.info.json would be
```
{
"code": "models/pymc/blr.py"
}
```

MansMeg commented 4 years ago

I agree that this is probably the right way to go. Well spotted.

eerolinna commented 4 years ago

How should we expose the likelihood/predictive distribution to users?

In other words, if I have a posterior object po <- posterior("8_schools"), what would the API be like to access the likelihood/predictive distribution?

Would it be something like stan_predictive_draws(po, posterior_draws)? Would we also want something like stan_likelihood(po, posterior_draws)? What would this return?

These could be generalized to predictive_draws(po, posterior_draws, framework = "stan") and the same for stan_likelihood.

MansMeg commented 4 years ago

Yes, that would probably be a good idea

MansMeg commented 4 years ago

Write a suggestion for the 8-schools example of how it should look and review it with Paul.

The structure change suggested by @eerolinna needs to be done before @jarnefeltoliver starts to fill out the db.

MansMeg commented 4 years ago

I have now implemented it, but I think it is simpler to just add a JSON object directly:

{
  "name": "blr",
  "model_implementations": {
    "stan": {
         "model_code": "models/stan/blr.stan",
         "likelihood_code": "models/stan/blr_likelihood.stan"}
    "pymc": {
          "model_code": "models/pymc/blr.py"
     }
  }
}

eerolinna commented 4 years ago

That's a good idea!

eerolinna commented 4 years ago

Do we want to have an API that exposes the likelihood code? Something like

stan_likelihood_code_file(po)

that would for the blr posterior return models/stan/blr_likelihood.stan.

Or is it sufficient that we have something like

posterior_predictive_draws(po, framework = "stan")

that essentially keeps the likelihood code file as an implementation detail instead of public API?

Arguments for keeping the likelihood code file out of the public API

Smaller API surface is easier to learn, understand and maintain
If Stan ever adds a way to automatically compute predictive draws without a separate likelihood definition we can remove the likelihood code files without breaking anyone's code

Arguments for adding the likelihood code to the public API

Someone might need the likelihood code outside of computing predictive draws?

MansMeg commented 4 years ago

We would like to be able to produce the predictive distribution using stan likelihood file without any restrictions on how the posterior was computed. So in R we would have a function that uses the likelihood file but returns a predictive distribution. Although this does not necessarily need to expose the likelihood code as you say. I . have no good answer yet.

stan-dev / posteriordb

Add structure for likelihood to compute predictive distributions based on draws #112

Ideal solution