hyunjimoon / DataInDM

Workflow iteration starting from 15.879

Measurement model #19

Open jandraor opened 2 years ago

jandraor commented 2 years ago

A critical component for performing inference (either from a frequentist or Bayesian perspective) is the definition of the measurement model. This component connects the System Dynamics model (X) with data (y). Let's denote the measurement model by $\pi(y | X(\theta))$, where $\pi$ is a probabilistic function & $\theta$ represents a vector of model parameters. If we fix $\theta$, we refer to the measurement model as the sampling distribution. From this distribution, we can obtain simulated measurements. If we fix y, we refer to the measurement model as the likelihood function which acts as a device to determine whether a measurement is close to its expected value.
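To make the two readings of $\pi(y | X(\theta))$ concrete, here is a minimal Python sketch. The linear `simulate_stock` stand-in and the specific numbers are illustrative assumptions, not part of any existing codebase; the `(mu, phi) -> (n, p)` conversion matches Stan's `neg_binomial_2` parameterisation.

```python
import numpy as np
from scipy import stats

# X(theta): a stand-in for the SD model's simulated stock trajectory
# (illustrative only -- a real model would come from Vensim/PySD)
def simulate_stock(theta, n_steps=5):
    return theta * np.arange(1, n_steps + 1)

theta, phi = 2.0, 10.0
mu = simulate_stock(theta)

# scipy's nbinom is parameterised by (n, p); convert from (mu, phi)
# so that mean = mu and variance = mu + mu^2 / phi, matching
# Stan's neg_binomial_2(mu, phi)
n, p = phi, phi / (phi + mu)

# 1) theta fixed -> sampling distribution: draw simulated measurements
rng = np.random.default_rng(0)
y_sim = stats.nbinom(n, p).rvs(random_state=rng)

# 2) y fixed -> likelihood function: score observed data against X(theta)
y_obs = np.array([2, 5, 6, 7, 11])
log_lik = stats.nbinom(n, p).logpmf(y_obs).sum()
```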

Since the parameterisation of the measurement model is not unique given the various choices of probabilistic distributions, users are required to input the appropriate probabilistic function for the System Dynamics model. The challenge from the developers' point of view is how to make this process as seamless as possible. The purpose of this issue is to identify & discuss user-friendly approaches to obtaining the specification of measurement models, & sensible default choices. For instance, the specification of the negative binomial distribution entails an additional parameter (phi).

A proposal for this specification is to require users to write the statement in the Stan language: `y ~ neg_binomial_2(stock_name, phi)`
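One way the tooling could consume such a specification is to parse it into its parts before code generation. A sketch — the function name, regex, and returned structure are assumptions for discussion, not an existing API:

```python
import re

def parse_measurement_model(spec):
    """Parse a Stan-style sampling statement such as
    'y ~ neg_binomial_2(stock_name, phi)' into
    (target, distribution, argument list)."""
    m = re.fullmatch(r"\s*(\w+)\s*~\s*(\w+)\(([^)]*)\)\s*", spec)
    if m is None:
        raise ValueError(f"not a sampling statement: {spec!r}")
    target, dist, args = m.groups()
    return target, dist, [a.strip() for a in args.split(",")]

print(parse_measurement_model("y ~ neg_binomial_2(stock_name, phi)"))
# -> ('y', 'neg_binomial_2', ['stock_name', 'phi'])
```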

hyunjimoon commented 2 years ago

May I suggest that the key work for automating this issue is to create a map between keywords and actual distributions? e.g.

1.a no bound, continuous: normal
1.b no bound, discrete: ? (perhaps no practical example)

2.a only lower bound (lbd), continuous: lbd + gamma, log-normal
2.b only lower bound (lbd), discrete: lbd + negative binomial

3.ab only upper bound (ubd): -1 * (only lower bound)

4.a both lower and upper bound: lbd + (ubd - lbd) * Beta( , )
4.b both lower and upper bound, plus mode: N(mode, ((ubd - lbd)/6)^2)

Q1. Which classification standards might I be missing other than bounds and types?
Q2. Which distribution should we set as the default for 3.a and 4.a?

Let me note that the above affects the model in two separate ways: hard constraints in the parameters block (which go through a logit transform) and weak constraints in the model block as an added regularization term. Also, this map will correspond to steps 5 and 6 below. Pinging @Dashadower as he is working on this (in Python).
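The keyword-to-distribution map above could start as a simple lookup keyed on bound and type information. A sketch — the names and the candidate defaults are placeholders for this discussion, not settled choices:

```python
# Map (has_lower_bound, has_upper_bound, is_discrete) -> candidate default
# families, following the classification above. Case 3.ab (upper bound only)
# is handled by negation, reusing the lower-bound-only cases.
DEFAULT_DIST = {
    (False, False, False): ["normal"],              # 1.a
    (False, False, True):  [],                      # 1.b: no practical example?
    (True,  False, False): ["gamma", "lognormal"],  # 2.a (shifted by lbd)
    (True,  False, True):  ["neg_binomial_2"],      # 2.b (shifted by lbd)
    (True,  True,  False): ["beta"],                # 4.a (rescaled to [lbd, ubd])
}

def default_family(lbd=None, ubd=None, discrete=False):
    """Return candidate default distributions for a parameter's constraints."""
    if lbd is None and ubd is not None:
        # 3.ab: only upper bound == -1 * (only lower bound)
        return default_family(lbd=-ubd, ubd=None, discrete=discrete)
    return DEFAULT_DIST.get((lbd is not None, ubd is not None, discrete), [])

print(default_family(lbd=0))  # -> ['gamma', 'lognormal']
```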

| Step | Goal | Machine's work | Human's work | Bayes component (Stan block) | Output format |
| --- | --- | --- | --- | --- | --- |
| 1 | Prior specification of Structure | Vensim assists H1.a() | H1.a Translate mental model to SD model (behavioral classification) | Prior (function) | .mdl |
| 2 | Prior specification of Demand | Vensim assists H2.a() | H2.a List policy functions | | .vpd |
| 3 | Prior specification of Data Variation | PySD assists H3.abcd() | | | .json with keys: est_param, ass_param, ass_param_ts, obs_stock |
| | - parameter variation | | H3.a Choose each parameter to be either est_param or ass_param | Prior and constraint (est_param: transformed parameter; ass_param: data) | |
| | - parameter variation | | H3.b Specify scalar for ass_param | | |
| | - parameter variation | | H3.c Specify vector_ts for ass_param_ts | | |
| | - state variation | | H3.d Choose each state to be obs_state or unobs_state | | |
| 4 | Translate Manag. to Stats. lang. | M1a. PySD, .build_function_block(H1.a) | | | structure.stan |
| 5 | Specify_explicit | | H4a. Choose family (:= dist. of y, penalty distribution for error) | Likelihood (model) | draws2data.stan gq block; data2draws.stan model, gq block |
| | | | H4b. Choose prior_dist (default: Normal) | Prior (model) | |
| 6 | Specify_implicit | | H5a. Specify {min, mode, max} value for est_param's prior param | Prior (parameter, model) | draws2data.stan gq block; data2draws.stan model, gq block |
| | | | H5b. Choose sign (real, non-neg) for est_param's prior param | | |
| | | | H5c. Choose type (disc/cont) for est_param's prior param | | |
| 7 | Translate prior knowledge to distribution | M2a. Map H5.abc to distributions (PERT, Normal, Poisson, Gamma, etc.) | | | $\theta \sim Normal(3, 1.5^2)$, $\sigma \sim Gamma(5, 2)$ |
| 8 | Predict | M3a. draws2data.stan, fit_prior_data.sample(), fit_prior_data = (U2.ab, U3.ab, U4.ab): Prior predictive check (opt-out prior) | | | |
| 9 | Infer to verify | M4a. Stan, data2draws.stan, .create_stan_program(H1a, H2abc, H3a, H4ab, H5abc): Infer parameter from (synthetic) data (Test or autoCalib) | | | Prior predictive check plot (summary stats.) |
| 10 | Specify_tolerance | | H6a. Set precision with iter_sampling (:= # of samples) | | $\gamma$ from SBC-graphics |
| | | | H6b. Select posterior approximator from [[5 Merging Algorithm Tribes]] | Posterior_approximator (inference algorithm) | |
| 11 | Infer to validate | M5a. Stan, fit_post_draws.sample(), fit_post_draws = (P1, U3.ab, U4.ab, U5.ab): Posterior predictive check (opt-in prior) | | | Posterior predictive check plot |
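For step 7, mapping H5a's {min, mode, max} elicitation to a prior could use the classic three-point (PERT-style) normal approximation. A sketch, assuming the standard (min + 4·mode + max)/6 mean and (max − min)/6 standard-deviation heuristics; the example numbers are hypothetical:

```python
def three_point_to_normal(lo, mode, hi):
    """PERT-style normal approximation of an elicited {min, mode, max}."""
    mean = (lo + 4.0 * mode + hi) / 6.0
    sd = (hi - lo) / 6.0
    return mean, sd

# e.g. an est_param elicited as {0, 3, 9} -> Normal(3.5, 1.5^2)
print(three_point_to_normal(0, 3, 9))  # -> (3.5, 1.5)
```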