stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference
176 stars 36 forks source link

Some models have data that is not correct #140

Closed bbbales2 closed 4 years ago

bbbales2 commented 4 years ago

Error is:

Exception: variable does not exist; processing stage=data initialization; variable name=sigma1; base type=double  (in 'garch-garch11.stan' at line 4)

(same thing for garch):

Exception: variable does not exist; processing stage=data initialization; variable name=sigma1; base type=double  (in 'garch.stan' at line 4)
bbbales2 commented 4 years ago

For pkpd:

Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=C0; dims declared=(1); dims found=()  (in 'pkpd.stan' at line 27)

For one_comp_mm_elim_abs-one_comp_mm_elim_abs:

Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=C0; dims declared=(1); dims found=()  (in 'one_comp_mm_elim_abs-one_comp_mm_elim_abs.stan' at line 27)
MansMeg commented 4 years ago

Hmm. Strange! I will take a look on this asap!

MansMeg commented 4 years ago

Thank you @bbbales2

I have now found the error in the garch model and fixed it, so that now works.

Although, the "pkpd" model is still not working, and I can't really see why. The code below should be able to reproduce the error in R.

remotes::install_github("MansMeg/posteriordb", subdir = "rpackage")
po <- posterior('pkpd')
mc <- model_code(po, "stan")
info(mc)$urls
pd <- pdb_data(po)
x <- posteriordb:::run_stan.pdb_posterior(po, stan_args = list(chains = 1))
Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=C0; dims declared=(1); dims found=()  (in 'modelb0c262388b8b_one_comp_mm_elim_abs_one_comp_mm_elim_abs' at line 27)
 [origin: unknown original type]
failed to create the sampler; sampling not done

The problem seem to come from line 27 of this model:

functions {
  real[] one_comp_mm_elim_abs(real t,
                              real[] y,
                              real[] theta,
                              real[] x_r,
                              int[] x_i) {
    real dydt[1];
    real k_a = theta[1]; // Dosing rate in 1/day
    real K_m = theta[2]; // Michaelis-Menten constant in mg/L
    real V_m = theta[3]; // Maximum elimination rate in 1/day
    real D = x_r[1];
    real V = x_r[2];
    real dose = 0;
    real elim = (V_m / V) * y[1] / (K_m + y[1]);

    if (t > 0)
      dose = exp(- k_a * t) * D * k_a / V;

    dydt[1] = dose - elim;

    return dydt;
  }
}

data {
  real t0;    // Initial time in days;
  real C0[1]; // Initial concentration at t0 in mg/L

  real D;   // Total dosage in mg
  real V;   // Compartment volume in L

  int<lower=1> N_t;
  real times[N_t];   // Measurement times in days

  // Measured concentrations in effect compartment in mg/L
  real C_hat[N_t];
}

transformed data {
  real x_r[2] = {D, V};
  int x_i[0];
}

parameters {
  real<lower=0> k_a; // Dosing rate in 1/day
  real<lower=0> K_m; // Michaelis-Menten constant in mg/L
  real<lower=0> V_m; // Maximum elimination rate in 1/day
  real<lower=0> sigma;
}

transformed parameters {
  real C[N_t, 1];
  {
    real theta[3] = {k_a, K_m, V_m};
    C = integrate_ode_bdf(one_comp_mm_elim_abs, C0, t0, times, theta, x_r, x_i);
  }
}

model {
  // Priors
  k_a ~ cauchy(0, 1);
  K_m ~ cauchy(0, 1);
  V_m ~ cauchy(0, 1);
  sigma ~ cauchy(0, 1);

  // Likelihood
  for (n in 1:N_t)
    C_hat[n] ~ lognormal(log(C[n, 1]), sigma);
}

generated quantities {
  real C_ppc[N_t];
  for (n in 1:N_t)
    C_ppc[n] = lognormal_rng(log(C[n, 1]), sigma);
}

That is

  real C0[1]; // Initial concentration at t0 in mg/L

The data is just a single value:

> pd$C0
[1] 0

So I can't really see where this problem comes from. Maybe @paul-buerkner , @avehtari or @bbbales2 can see this more clearly than me?

bbbales2 commented 4 years ago

I think the data here is what you want: https://github.com/stan-dev/stat_comp_benchmarks/tree/64b6597cd17de77e91dcfb1a9977758407017d20/benchmarks/pkpd

Should be more than a single value.

MansMeg commented 4 years ago

Thanks for the rapid reply. Yes, I now see that C0 in that code is an array with one dimension, while C0 in posteriordb is a vector of length 1. JSON does not represent these subtle differences, so it is not possible to represent the data in this way in the posterior database.

So I probably need to change this model slightly to get it to work with a vector instead of the 1-dimensional array in C0.

bbbales2 commented 4 years ago

Ooooh, that problem. Yeah I think there was some discussion in cmdstan. Didn't find it when I looked.

MansMeg commented 4 years ago

The only simple thing to do was to simply hard-code the value from the data within the model. Not beautiful, but the only reasonable thing to do right now. I noted it in the model as well. Hence the problem is now solved.