hyunjimoon / SBC

https://hyunjimoon.github.io/SBC

Vignette for large/complex model workflow #41

Open martinmodrak opened 2 years ago

martinmodrak commented 2 years ago

Main ideas:

hyunjimoon commented 2 years ago

I would be excited to push this forward if you think one of the following (preferably their intersection) can be considered a complex model from your point of view. The links are related issues for each module.

hyunjimoon commented 2 years ago

@Dashadower will first tackle hierarchical models and @hyunjimoon will tackle high-dimensional models.

hyunjimoon commented 1 year ago

Building on @martinmodrak's small workflow, @tomfid and I will write a document on Bayesian workflow for an ODE model of urban dynamics (mdl file).

This satisfies at least two of the categories above (ODE, high dimension). Moreover, Vicky's research on urban scaling, which regresses the log of a city index (such as congestion or income inequality) on the log of city population, opens a path toward hierarchical Bayesian modeling. In short, hierarchical Bayesian modeling starts by viewing $\beta$ as a random variable (hence the names Bayesian regression or random-effects model) and giving it a prior distribution. Then we replace the scalar values of the prior's parameters with further random variables. This allows multi-level learning from data and places the model as partial pooling, between the no-pooling and complete-pooling models. This document is a good introduction, as it has both Stan code and a similar model structure (log-log).
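One possible way to write the hierarchical log-log regression described above is sketched below. The indices, the symbols $\alpha_j$, $\mu_\beta$, $\tau_\beta$, $\sigma$, $m$, $s$, $t$, and the normal and half-normal choices are assumptions made for illustration; they are not taken from the urban scaling paper.

$$
\begin{aligned}
\log y_{ij} &\sim \mathrm{Normal}(\alpha_j + \beta_j \log N_{ij},\ \sigma) \\
\beta_j &\sim \mathrm{Normal}(\mu_\beta,\ \tau_\beta) \\
\mu_\beta &\sim \mathrm{Normal}(m,\ s), \qquad \tau_\beta \sim \mathrm{HalfNormal}(t)
\end{aligned}
$$

Here $y_{ij}$ is the city index and $N_{ij}$ the population of city $i$ in group $j$. Letting $\tau_\beta \to 0$ forces all $\beta_j$ to be equal (complete pooling), while letting $\tau_\beta \to \infty$ leaves each $\beta_j$ to be estimated separately (no pooling); a finite $\tau_\beta$ learned from data gives partial pooling.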

Considering that $\beta$ is the (averaged) elasticity at a fixed time, from eq. 2.2 in the first paper above, Tom's writing on the challenges of setting informative parameters in dynamic models is relevant. Following the excerpt from Bayesian workflow ("a pragmatic idea is to keep the priors and compute reasonable parameter values using the real data. This can be done either through rough estimates or by computing the actual posterior. We then suggest widening out the estimates slightly and using these as a prior for the SBC."), I recommend starting from a tight prior. The absence of prior knowledge or data is the elephant in the room, which I aim to address with the priordb project, where realistic parameter values that could be assumed are collectively learned.
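The "widen the estimates slightly" step from the quoted excerpt could look roughly like this. It is only a sketch: the function name, the normal form of the prior, the widening factor of 1.5, and the example draws are all assumptions for illustration, not values from any of the models discussed here.

```python
import statistics

def widened_normal_prior(draws, widen=1.5):
    """Summarize rough estimates (or posterior draws) as a normal prior
    whose scale is inflated by `widen` (hypothetical default)."""
    center = statistics.fmean(draws)
    scale = statistics.stdev(draws) * widen
    return center, scale

# Made-up rough estimates of an elasticity parameter, for illustration only.
rough_draws = [0.82, 0.91, 0.87, 0.95, 0.78, 0.88]
center, scale = widened_normal_prior(rough_draws)
print(f"SBC prior: Normal({center:.3f}, {scale:.3f})")
```

The resulting prior stays centered on the data-informed estimate but is somewhat wider, so SBC exercises the region of parameter space that matters in practice.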

We will include:

Must

  1. Classifying parameters into three types: assumed parameters, assumed parameter time series, estimated parameters.
  2. Justification of prior specification and its automation
  3. Three checks: prior predictive, posterior predictive, simulation-based calibration
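As a reminder of what the third check computes, here is a minimal simulation-based calibration sketch. It uses a toy conjugate normal model (prior `mu ~ Normal(0, 1)`, data `y ~ Normal(mu, 1)`) so that exact posterior draws are available without a sampler; the model, function name, and all settings are assumptions for illustration and are unrelated to the SBC package's API.

```python
import random

def sbc_ranks(n_sims=1000, n_obs=10, n_post_draws=99, seed=1):
    """Return one rank per simulation: the number of posterior draws
    that fall below the parameter value used to generate the data."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu_true = rng.gauss(0.0, 1.0)                        # draw from the prior
        y = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]  # simulate data
        post_var = 1.0 / (1.0 + n_obs)                       # exact conjugate posterior
        post_mean = post_var * sum(y)
        post_sd = post_var ** 0.5
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_post_draws)]
        ranks.append(sum(d < mu_true for d in draws))
    return ranks

ranks = sbc_ranks()

# Bin the ranks (0..99) into 10 equal bins; a calibrated estimator
# should give roughly equal counts in every bin.
n_bins = 10
counts = [0] * n_bins
for r in ranks:
    counts[r * n_bins // 100] += 1
print(counts)
```

With a real model, the exact posterior draws would be replaced by the estimator under test (for example MCMC output), and systematic departures from uniform ranks would indicate a problem in the model or the computation.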

For 2, the sections "Multiplicative error and the lognormal distribution, Weakly informative priors, Priors for system parameters and noise scale" from this case study on population dynamics are a good place to start when choosing the prior distribution and its parameters. This corresponds to "Specify_implicit" (H5.abc) from this Human-Machine collaboration table (HMC table).

We are considering including:

Option

  4. Demand prior elicitation to policy function and its optimization for policy prescription
  5. Comparing different posterior approximator modules (MCMC, variational inference, optimization)

For 4, translating Vensim's .vpd to a Stan model block is the key, as we can then use its optimization engine, as in this restaurant revenue optimization example. For 5, the aim is to find a cheaper (ideally the cheapest) computation that reaches the conditioned precision (step 9 of the HMC table).

Tool

Ref

hyunjimoon commented 1 year ago

@jandraor and I are trying this with three example models in https://github.com/Data4DM/BayesSD/discussions/76. @tomfid's help, especially regarding InferenceData, is valuable, as Vensim support for this format would be crucial in connecting Vensim subscripts with hierarchical Bayesian models.

Also, @OriolAbril and @ahartikainen are helping to connect this to ArviZ. Thanks!

hyunjimoon commented 1 year ago

@martinmodrak @Dashadower, could the current SBC R library's output be easily transformed to InferenceData, by any chance? Or is there any reference code we could look at, e.g. previous attempts by our community to connect posterior and ArviZ? @Jandraor and I are using different languages (R, Python) and wondered whether we could pool our plotting efforts by having a modularized data structure.

ahartikainen commented 1 year ago

cc @mike-lawrence

OriolAbril commented 1 year ago

this issue https://github.com/stan-dev/posterior/issues/85 sounds relevant to interoperability

hyunjimoon commented 1 year ago

Below is a rough plan of what I felt was needed for a large model workflow. An enjoyable milestone is a Bayesian workflow case study for dynamic models on prey-predator, SEIR, and inventory management by around March 2023. Thank you very much, all!

Goal: bridging Vensim ecosystem with Stan ecosystem


  1. provide an efficient (gradient-based) and effective (diagnostics) HMC estimator for the dynamic model (generator)
  2. consistency among data (.nc), model (.stan), and plots (.png)
  3. template for simulation-based calibration checks, e.g. this Python file

For this, I am trying to

  1. connect stanify with Dynamic simulation scenario 1,2 (with @tomfid, @enekomartinmartinez, @tseyanglim, @JamesPHoughton's support)

  2. combine 1's results by putting many .nc files into one sbc.nc (with @Dashadower, @OriolAbril, @ahartikainen's support)

  3. connect .nc output with SBC package via rvar concept (with @paul-buerkner, @martinmodrak, @jandraor's support)

Dynamic simulation scenario (Vensim)

Outputs NetCDF format (.nc). Scenarios to reach .nc:

scenario 1) Vensim/Stella user on Python

scenario 2) Pure Vensim user

stanify(scenario 1 or 2)

  1. stanify translates .mdl to .stan and outputs one generator.nc and one estimator.nc for the baseline case (no hierarchy, no prior draws)
  2. stanify outputs one generator.nc and three (n_prior_draws) estimator.nc files for SBC
  3. stanify outputs one generator.nc and two (n_subgroups) estimator.nc files for the hierarchical model

Computational statistician (Stan)

Transform .nc to rvars, which the SBC package supports. Three verifications are needed:

mike-lawrence commented 1 year ago

The nc-to-rvars conversion is done by the class_composition R6 class here, specifically the nc_to_rvar method here