stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
143 stars 47 forks source link

`--info` include information parameter transforms #1461

Closed andrewfowlie closed 4 weeks ago

andrewfowlie commented 1 month ago

Could the stanc --info command include information about constraints? At the moment, e.g.,

parameters {
  real<lower=0, upper=1> theta;
}

and

parameters {
  real theta;
}

both result in identical output,

{
  "inputs": {},
  "parameters": { "theta": { "type": "real", "dimensions": 0 } },
  "transformed parameters": {},
  "generated quantities": {},
  "functions": [],
  "distributions": [],
  "included_files": []
}

that doesn't mention the constraints. It would be great if the one with constraints gave something like,

{
  "inputs": {},
  "parameters": { "theta": { "type": "real", "dimensions": 0, "lower": 0, "upper": 1 } },
  "transformed parameters": {},
  "generated quantities": {},
  "functions": [],
  "distributions": [],
  "included_files": []
}

Ideally, this would work everywhere (i.e., in all the blocks), and cover the shift and scale modifiers as well.

WardBrian commented 1 month ago

I believe we could include transform information (e.g. this, but also something like "this is a simplex"). Note that the actual bound may not be a simple number, so in general the output may not be that useful. E.g,

parameters {
  real a;
  real <lower=exp(a) - 3> b;
}

The output would now have to contain something like "lower": "exp(a) - 3", so you can't assume you just get a number there

andrewfowlie commented 1 month ago

That’s very true. I suppose it’s also true of the dimension.

I would find this very useful - I want to be able to infer programmatically (statically, not at runtime) whether we’re operating on a constrained space or not.

bob-carpenter commented 4 weeks ago

I suppose it’s also true of the dimension.

For dimension, it has to be static data. So we can resolve the sizes as soon as the data is fixed. The bounds can't be resolved until we have parameter draws.

WardBrian commented 4 weeks ago

https://github.com/stan-dev/stanc3/issues/904 asked about the dimension information. It was never implemented.

It does seem like it is probably reasonable to know that a variable has a lower bound, even if the exact lower bound is some expression you can't resolve on the outside.

andrewfowlie commented 4 weeks ago

Thanks both. I had confused dimension and size in my previous message. I now see that this is a harder problem than I first thought.

If the static analysis is challenging, maybe instead there could be some methods for introspection of a Stan model at runtime, once a Stan model + data is actually instantiated?

My use case is that I want to implement other inference algorithms with Stan models. It's very helpful to have as much information as possible about whether we are working on a constrained or unconstrained space, and what transforms we need to make. I've been using the bridgestan interface. Could we one day have methods bs_get_parameters_lower(model) etc for an instance of Stan model that retrieve any lower transforms of parameters in the parameters block?

If this were Python, an implementation would work like

# the stan model
.... data blocks ...
parameters {
  real<lower=1> x;
  real<upper=2> y;
  real<lower=2, upper=3> z;
}
... other blocks ...
model = ... some code to read above model ... 
bs_get_parameters_lower(model)  # [1, None, 2]
bs_get_parameters_upper(model)  # [None, 2, 3]

This is C/C++ so some more thought required about handling the None cases, but the basic idea would be the same. If this is better way forward than the static analysis, we can move the discussion over to bridgestan.

WardBrian commented 4 weeks ago

The example you provide is actually still not the worst case scenario, which is you can have parameters’ bounds depend on other parameters, so you need to provide both data but also a set of (unconstrained) parameters to get the bounds.

Can you elaborate on what your goal is with this information?

andrewfowlie commented 4 weeks ago

Ah, I didn't realise that. So we can have all sorts like

parameters {
  real<lower=0> x;
  real<lower=x> y;
  real<lower=x, upper=y> z;
}

yes, it becomes complicated, and even my bs_get_parameters_lower(model) idea on a model instance won't work, because as you say it needs the parameters too.

Sure, let me be more explicit. I am working on implementing nested sampling algorithms on Stan models

To achieve this, I will write the models as follows

parameters {
  vector<lower=0, upper=1>[N] x;   // this is the U[0, 1]^d hypercube
}
transformed parameters {
  vector[N] theta = 10 * x  - 5;   // these are the actual parameters, transformed from [0, 1] to [-5, 5] for example
}

Possibly, for this simple case of a flat prior I could allow this as well

parameters {
  vector<lower=-5, upper=5>[N] theta; 
}

I think this is going to work as I want, and produce consistent results with native Stan samplers and nested sampling algorithms. I would like to be able to run a check though that model's are implemented as I required though (with lower=0 and upper=1) and refuse to compile (e.g. static assert) or throw an exception at runtime if they are not.

The only way I have thought up of doing this at the moment is comparing the unconstrained/constrained parameters to the results of logit transforms.

I don't know whether my use case seems obscure to you. My feeling, though, is that this sort of problem could arise when trying to implement algorithms outside of Stan that require some extra information or some extra constraints on a Stan model.

WardBrian commented 4 weeks ago

The only way I have thought up of doing this at the moment is comparing the unconstrained/constrained parameters to the results of logit transforms.

This seems reasonable. It may not be 100% accurate, but it would probably catch anyone who isn't actively trying to fool you

bob-carpenter commented 4 weeks ago
parameters {
  vector<lower=0, upper=1>[N] x;   // this is the U[0, 1]^d hypercube
}
transformed parameters {
  vector[N] theta = 10 * x  - 5;   // these are the actual parameters, transformed from [0, 1] to [-5, 5] for example
}

Are there then going to be things like priors on theta? If so, and the transforms can be non-linear or involve multiplication by something that's another parameter rather than a constant, then you need to be careful to include the change-of-variables adjustment for the transform.

Stan will then take all of the x and log-odds transform to unconstrained using logit. In practice, we work on the unconstrained scale, then apply the inverse transform (inv_logit here) along with the change of variables correction for the non-linear transform. We then inverse transform again to present MCMC results on the constrained scale.

I would like to be able to run a check though that model's are implemented as I required though (with lower=0 and upper=1) and refuse to compile (e.g. static assert) or throw an exception at runtime if they are not.

I don't know anything about nested sampling, including why you are trying to do this. But if all you need is a distribution on $[0, 1]^D$ you can transform Stan's models which have support over $\mathbb{R}^N$ using inverse logit and the change of variables is pretty simple, because $\textrm{invLogit}'(x) = \text{invLogit}(x) \cdot (1 - \text{invLogit}(x))$.

WardBrian commented 4 weeks ago

I think for your purposes, the implementation in #1462 might work anyway, since you're requiring such a restricted class of models you can just reject anything that is more complicated than "0" or "1"

andrewfowlie commented 4 weeks ago

TLDR

Hi both,

Thanks so much for the feedback & ideas. Bob, in these algorithms priors are implemented by transforming from the U[0, 1]^d hypercube. I.e., we have an parameterization on which the priors are flat on U[0, 1] and we use inverse transform sampling to convert that to whatever we want.

andrewfowlie commented 4 weeks ago

PS I've invited you to take a look at the code repo for using one of these algorithms & stan, it will be public soon anyway,

bob-carpenter commented 4 weeks ago

Thanks. Once you go to inverse transform sampling over (0, 1)^D, then you can use Quasi Monte Carlo.

Stan could be easily transformed to sample on (0, 1)^D, but it won't be uniform.