hyunjimoon / DataInDM

Workflow iteration starting from 15.879

Measurement model #19

Open jandraor opened 2 years ago

jandraor commented 2 years ago

A critical component for performing inference (either from a frequentist or Bayesian perspective) is the definition of the measurement model. This component connects the System Dynamics model (X) with data (y). Let's denote the measurement model by $\pi(y | X(\theta))$, where $\pi$ is a probabilistic function & $\theta$ represents a vector of model parameters. If we fix $\theta$, we refer to the measurement model as the sampling distribution. From this distribution, we can obtain simulated measurements. If we fix y, we refer to the measurement model as the likelihood function which acts as a device to determine whether a measurement is close to its expected value.
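To make the two readings of $\pi(y | X(\theta))$ concrete, here is a minimal Python sketch. The linear `simulate_stock` stand-in and the specific numbers are illustrative assumptions, not part of any existing codebase; the `(mu, phi) -> (n, p)` conversion matches Stan's `neg_binomial_2` parameterisation.

```python
import numpy as np
from scipy import stats

# X(theta): a stand-in for the SD model's simulated stock trajectory
# (illustrative only -- a real model would come from Vensim/PySD)
def simulate_stock(theta, n_steps=5):
    return theta * np.arange(1, n_steps + 1)

theta, phi = 2.0, 10.0
mu = simulate_stock(theta)

# scipy's nbinom is parameterised by (n, p); convert from (mu, phi)
# so that mean = mu and variance = mu + mu^2 / phi, matching
# Stan's neg_binomial_2(mu, phi)
n, p = phi, phi / (phi + mu)

# 1) theta fixed -> sampling distribution: draw simulated measurements
rng = np.random.default_rng(0)
y_sim = stats.nbinom(n, p).rvs(random_state=rng)

# 2) y fixed -> likelihood function: score observed data against X(theta)
y_obs = np.array([2, 5, 6, 7, 11])
log_lik = stats.nbinom(n, p).logpmf(y_obs).sum()
```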

Since the parameterisation of the measurement model is not unique given the various choices of probabilistic distributions, users are required to input the appropriate probabilistic function for the System Dynamics model. The challenge from the developers' point of view is how to make this process as seamless as possible. The purpose of this issue is to identify & discuss user-friendly approaches to obtaining the specification of measurement models, & sensible default choices. For instance, the specification of the negative binomial distribution entails an additional parameter (phi).

A proposal for this specification is to require users to write the statement in the Stan language: `y ~ neg_binomial_2(stock_name, phi)`
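One way the tooling could consume such a specification is to parse it into its parts before code generation. A sketch — the function name, regex, and returned structure are assumptions for discussion, not an existing API:

```python
import re

def parse_measurement_model(spec):
    """Parse a Stan-style sampling statement such as
    'y ~ neg_binomial_2(stock_name, phi)' into
    (target, distribution, argument list)."""
    m = re.fullmatch(r"\s*(\w+)\s*~\s*(\w+)\(([^)]*)\)\s*", spec)
    if m is None:
        raise ValueError(f"not a sampling statement: {spec!r}")
    target, dist, args = m.groups()
    return target, dist, [a.strip() for a in args.split(",")]

print(parse_measurement_model("y ~ neg_binomial_2(stock_name, phi)"))
# -> ('y', 'neg_binomial_2', ['stock_name', 'phi'])
```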

hyunjimoon commented 2 years ago

May I suggest that the key work for automating this issue is to create a map between keywords and actual distributions? e.g.

1.a no bound, continuous: normal
1.b no bound, discrete: ? (perhaps no practical example)

2.a only lower bound (lbd), continuous: lbd + gamma, log-normal
2.b only lower bound (lbd), discrete: lbd + negative binomial

3.ab only upper bound (ubd): -1 * (only lower bound)

4.a both lower and upper bound: lbd + (ubd - lbd) * Beta( , )
4.b both lower and upper bound, plus mode: N(mode, ((ubd - lbd)/6)^2)

Q1. Which classification standards might I be missing other than bounds and types?
Q2. Which distribution should we set as the default for 3.a and 4.a?

Let me note that the above affects the model in two separate ways: hard constraints in the parameters block (which go through a logit transform) and weak constraints in the model block as an added regularization term. Also, this map will correspond to steps 5 and 6 below. Pinging @Dashadower as he is working on this (in Python).
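The keyword-to-distribution map above could start as a simple lookup keyed on bound and type information. A sketch — the names and the candidate defaults are placeholders for this discussion, not settled choices:

```python
# Map (has_lower_bound, has_upper_bound, is_discrete) -> candidate default
# families, following the classification above. Case 3.ab (upper bound only)
# is handled by negation, reusing the lower-bound-only cases.
DEFAULT_DIST = {
    (False, False, False): ["normal"],              # 1.a
    (False, False, True):  [],                      # 1.b: no practical example?
    (True,  False, False): ["gamma", "lognormal"],  # 2.a (shifted by lbd)
    (True,  False, True):  ["neg_binomial_2"],      # 2.b (shifted by lbd)
    (True,  True,  False): ["beta"],                # 4.a (rescaled to [lbd, ubd])
}

def default_family(lbd=None, ubd=None, discrete=False):
    """Return candidate default distributions for a parameter's constraints."""
    if lbd is None and ubd is not None:
        # 3.ab: only upper bound == -1 * (only lower bound)
        return default_family(lbd=-ubd, ubd=None, discrete=discrete)
    return DEFAULT_DIST.get((lbd is not None, ubd is not None, discrete), [])

print(default_family(lbd=0))  # -> ['gamma', 'lognormal']
```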

| Step | Goal | Machine's work | Human's work | Bayes component (Stan block) | Output format |
| --- | --- | --- | --- | --- | --- |
| 1 | Prior specification of Structure | Vensim assists H1.a() | H1.a Translate mental model to SD model (behavioral classification) | Prior (function) | .mdl |
| 2 | Prior specification of Demand | Vensim assists H2.a() | H2.a List policy functions | | .vpd |
| 3 | Prior specification of Data Variation | PySD assists H3.abcd() | | | .json with keys: est_param, ass_param, ass_param_ts, obs_stock |
| | - parameter variation | | H3.a Choose each parameter to be either est_param or ass_param | Prior and constraint (est_param: transformed parameter; ass_param: data) | |
| | - parameter variation | | H3.b Specify scalar for ass_param | | |
| | - parameter variation | | H3.c Specify vector_ts for ass_param_ts | | |
| | - state variation | | H3.d Choose each state to be obs_state or unobs_state | | |
| 4 | Translate Manag. to Stats. lang. | M1a. PySD, .build_function_block(H1.a) | | | structure.stan |
| 5 | Specify_explicit | | H4a. Choose family (:= dist. of y, penalty distribution for error) | Likelihood (model) | draws2data.stan gq block; data2draws.stan model, gq block |
| | | | H4b. Choose prior_dist (default: Normal) | Prior (model) | |
| 6 | Specify_implicit | | H5a. Specify {min, mode, max} value for est_param's prior param | Prior (parameter, model) | draws2data.stan gq block; data2draws.stan model, gq block |
| | | | H5b. Choose sign (real, non-neg) for est_param's prior param | | |
| | | | H5c. Choose type (disc/cont) for est_param's prior param | | |
| 7 | Translate prior knowledge to distribution | M2a. Map H5.abc to distributions (PERT, Normal, Poisson, Gamma, etc.) | | | $\theta \sim Normal(3, 1.5^2)$, $\sigma \sim Gamma(5, 2)$ |
| 8 | Predict | M3a. draws2data.stan, fit_prior_data.sample(), fit_prior_data = (U2.ab, U3.ab, U4.ab): Prior predictive check (opt-out prior) | | | |
| 9 | Infer to verify | M4a. Stan, data2draws.stan, .create_stan_program(H1a, H2abc, H3a, H4ab, H5abc): Infer parameter from (synthetic) data (Test or autoCalib) | | | Prior predictive check plot (summary stats.) |
| 10 | Specify_tolerance | | H6a. Set precision with iter_sampling (:= # of samples) | | $\gamma$ from SBC-graphics |
| | | | H6b. Select posterior approximator from [[5 Merging Algorithm Tribes]] | Posterior_approximator (inference algorithm) | |
| 11 | Infer to validate | M5a. Stan, fit_post_draws.sample(), fit_post_draws = (P1, U3.ab, U4.ab, U5.ab): Posterior predictive check (opt-in prior) | | | Posterior predictive check plot |
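For step 7, mapping H5a's {min, mode, max} elicitation to a prior could use the classic three-point (PERT-style) normal approximation. A sketch, assuming the standard (min + 4·mode + max)/6 mean and (max − min)/6 standard-deviation heuristics; the example numbers are hypothetical:

```python
def three_point_to_normal(lo, mode, hi):
    """PERT-style normal approximation of an elicited {min, mode, max}."""
    mean = (lo + 4.0 * mode + hi) / 6.0
    sd = (hi - lo) / 6.0
    return mean, sd

# e.g. an est_param elicited as {0, 3, 9} -> Normal(3.5, 1.5^2)
print(three_point_to_normal(0, 3, 9))  # -> (3.5, 1.5)
```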