hyunjimoon / SBC

https://hyunjimoon.github.io/SBC

Generators using Stan models #35

Open martinmodrak opened 2 years ago

martinmodrak commented 2 years ago

There are probably multiple flavors we could consider:

hyunjimoon commented 2 years ago

Note that this is a specific generator use case. With a latent Gaussian mixture model added for the calibration target, the following could be generalized with "family" and "family args" arguments. The added "family" argument is passed to the self-calib function as transform_type, which maps "poisson" and "negative-binomial" to the log link and "binom" to the logistic link. For likelihoods with more than one parameter, let's follow the brms list of link functions for each parameter here.

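library(posterior) # rvar, rvar_rng, draws_rvars, as_draws_matrix, %**%
library(SBC)       # SBC_datasets, draws_rvars_to_standata

generator_gmm <- function(mixture_means, mixture_sds, fixed_values){
  # fixed value across simulated datasets
  ## meta
  nobs <- fixed_values$nobs
  ndraws <- fixed_values$ndraws
  nsims <- fixed_values$nsims # number of simulated datasets (assumed to be supplied in `fixed_values`)
  ## distribution-specific
  nsize <- fixed_values$nsize # Do not confuse n of Binom(n, p) with `nobs`

  # predictor
  X <- fixed_values$X
  # parameter with fixed distribution across `nsims` datasets
  b <- fixed_values$b
  # target variable updated at each iteration:
  # pick one mixture component per simulated dataset, then draw `a` from it
  k <- sample.int(length(mixture_means$a), nsims, replace = TRUE)
  a <- rvar(rnorm(nsims, mean = mixture_means$a[k], sd = mixture_sds$a[k]))

  # generate
  eta <- a + X %**% b        # linear predictor, kept as an rvar
  mu <- 1 / (1 + exp(-eta))  # inverse logit
  # the number of draws is taken from the rvar argument `mu`
  Y <- rvar_rng(rbinom, n = nobs, size = nsize, prob = mu)
  gen_rvars <- draws_rvars(nsims = nsims, nobs = nobs,
                           mixture_means = mixture_means$a, mixture_sds = mixture_sds$a,
                           Y = Y)
  SBC_datasets(
    parameters = as_draws_matrix(draws_rvars(a = a)),
    generated = draws_rvars_to_standata(gen_rvars)
  )
}
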
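
As a rough illustration of the family-to-link mapping mentioned above, a lookup table in base R might look like the sketch below. The names `family_links` and `apply_inv_link` are hypothetical and not part of SBC, and only the three families named in this comment are covered.

```r
# Hypothetical lookup table: family name -> link and inverse link.
# Not part of SBC; it only illustrates the mapping described above.
family_links <- list(
  "poisson"           = list(link = log,    inv_link = exp),
  "negative-binomial" = list(link = log,    inv_link = exp),
  "binom"             = list(link = qlogis, inv_link = plogis)
)

# Hypothetical helper: move a linear predictor to the mean scale of a family.
apply_inv_link <- function(eta, family) {
  if (!family %in% names(family_links)) {
    stop("Unsupported family: ", family)
  }
  family_links[[family]]$inv_link(eta)
}

p_binom   <- apply_inv_link(0, "binom")    # plogis(0)
m_poisson <- apply_inv_link(0, "poisson")  # exp(0)
```

A generator generalized this way could take the family name as an argument and call `apply_inv_link()` where the code above hard-codes the inverse logit. Families with more than one parameter would need one link per parameter, which this sketch does not cover.
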
alevaracca commented 2 years ago

Hi,

Not sure if this is the right section, but I was wondering about the possibility of generating data using the same Stan file that one would develop, for example, to run Prior Predictive checks (i.e.: data & generated quantities blocks only). I am asking because I have (and I expect many other users will have too) several such files already available and it would be nice to use them in SBC. Also, some are very complex, so converting them to a new generator would be quite a lot of work.

Thanks!

hyunjimoon commented 2 years ago

That is a great question. We have looked into automatically building a generator from a Stan file, but ran into difficulties, which @Dashadower could describe further.

One development option that comes to mind is to use a modular Stan program, which serves as a template for the Stan file. If this modular program could be used to generate both the SBC generator and the Stan file (the latter is what @Dashadower is working on, with support from this repo), it would avoid duplicated effort.

@alevaracca could you please share your thoughts on whether the above suggestion would meet your needs? The modular program is explained in more detail in the first section here.

alevaracca commented 2 years ago

Thanks for the quick reply @hyunjimoon, I'll have a look into this and give it a try! I'll get back to you eventually.

martinmodrak commented 2 years ago

AFAIK, there are multiple ways people create their generators in Stan. To prioritise correctly, could you share what a typical (potentially simplified) Stan file you are using looks like? Are you using rstan or cmdstanr? (I'd be happy to get feedback on the implementation, so if you are willing to do some testing on the first version, I'll first implement a version that matches your needs.) Note, however, that a hard limitation of the current Stan core is that there is no way to access the results of the transformed data block.

Also note that you can always create a dataset explicitly via SBC_datasets(). This requires you to do the necessary juggling between data formats, but will work immediately. I hope that reading https://hyunjimoon.github.io/SBC/reference/SBC_datasets.html and looking at what the results of generate_datasets(SBC_example_generator("normal"), n_sims = 50) look like makes the expected format clear.

Probably the most difficult conversion necessary is already covered by the (currently undocumented) draws_rvars_to_standata(). It takes an object of type draws_rvars and converts each draw into a list that can be passed as data to Stan, i.e. the result of draws_rvars_to_standata() can be passed directly as the generated = argument of SBC_datasets().
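
A minimal sketch of the explicit route, assuming a hypothetical model with a single parameter `mu` and data `N` and `y`: the parameters form a matrix with one row per simulated dataset, and the generated data form a list with one Stan-ready data list per simulated dataset. The first argument of SBC_datasets() is passed positionally here because its name (`parameters` in this thread) may differ between package versions, and the wrapping step only runs when SBC and posterior are installed.

```r
set.seed(1)
n_sims <- 10  # number of simulated datasets
n_obs  <- 5   # observations per dataset (hypothetical model: y ~ normal(mu, 1))

# Parameters: one row per simulation, one named column per parameter.
mu <- rnorm(n_sims, mean = 0, sd = 2)
parameters <- matrix(mu, ncol = 1, dimnames = list(NULL, "mu"))

# Generated data: one list per simulation, shaped like Stan's `data` input.
generated <- lapply(seq_len(n_sims), function(i) {
  list(N = n_obs, y = rnorm(n_obs, mean = mu[i], sd = 1))
})

# Wrap both pieces into an SBC_datasets object if the packages are available.
if (requireNamespace("SBC", quietly = TRUE) &&
    requireNamespace("posterior", quietly = TRUE)) {
  datasets <- SBC::SBC_datasets(posterior::as_draws_matrix(parameters), generated)
}
```

The i-th row of the parameter matrix and the i-th element of `generated` should describe the same simulation, so the two need to have matching lengths.
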

Dashadower commented 2 years ago

Just to chime in on using generated quantities for SBC: originally this was the way the library was implemented. But it turned out to be easier to extract the draws from a stanfit into an R object like rvar than to write SBC code for every model (which is what Martin is saying above).

alevaracca commented 2 years ago

Thank you both for the reply. Martin, I'll give your approach a try in the next few days and get back with some feedback (plus some code of the model that I am SBC-ing for).

alevaracca commented 2 years ago

Sorry it took this long to get back to you. I have followed Martin's instructions and everything worked smoothly. It turned out not to be too complicated to juggle the different formats, and the conversion using draws_rvars_to_standata() did the job as well. Cheers.

maugavilla commented 2 years ago

I think my issue relates to this overall thread. I am working to implement SBC with the blavaan package, where we have precompiled Stan models, usually large models with a lot of parameters. Within blavaan I can generate datasets from the priors, so I could skip the generator function, for example. But I can't pass in my list of datasets, as it is not an SBC_datasets object.

From this, two questions and possible additions:

I'd appreciate any guidance.

martinmodrak commented 2 years ago

@maugavilla as this will likely require discussing a bunch of stuff that's specific to blavaan, I've moved the discussion to a new issue: #69