stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License

Automatically generate code for forward sampling from model #174

Open VMatthijs opened 5 years ago

VMatthijs commented 5 years ago

Using a dependence analysis, it should be easy to determine whether a Stan model represents a DAG. That check should be integrated with @rybern's factor graph analysis.
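As a rough illustration of the DAG check, here is a minimal Python sketch using Kahn's algorithm. It assumes the dependence analysis has already produced a mapping from each variable to the variables it depends on; that mapping, the function name `topological_order`, and the variable names are hypothetical and not part of stanc3.

```python
from collections import deque


def topological_order(deps):
    """Return the variables in an order where parents come first,
    or None if the dependency graph has a cycle (i.e. is not a DAG).

    `deps` maps each variable name to the names it depends on.
    This input format is an assumption made for the sketch.
    """
    nodes = sorted(set(deps) | {p for ps in deps.values() for p in ps})
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for child, parents in deps.items():
        for parent in sorted(parents):
            children[parent].append(child)
            indegree[child] += 1

    # Start from the variables that depend on nothing.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # If some node was never released, it sits on a cycle.
    return order if len(order) == len(nodes) else None
```

A non-`None` result doubles as a valid order in which to draw the variables when forward sampling.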

If we determine that a Stan program is a DAG, then we can automatically generate code for forward sampling from the model, which is super useful for prior predictive checks and SBC. The idea is basically just to replace ~-statements (and the corresponding target increments) with assignments of RNG draws. The trick will be to transform to the code we want and then type check it before committing, to make sure the appropriate RNGs exist. Otherwise, we can throw an error suggesting that the user implement the needed RNG and submit a PR to Stan Math.
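To make the rewrite concrete, here is a toy Python sketch that works on single lines of Stan source text. This is only an illustration: a real implementation would presumably transform stanc3's AST or MIR rather than text, and the `KNOWN_RNGS` set stands in for the type check against the available `_rng` functions. The function name and the set's contents are assumptions for the example, and truncation or target increments are not handled.

```python
import re

# Illustrative stand-in for "type check that the rng exists".
# Only a few distributions are listed; this is not a complete table.
KNOWN_RNGS = {"normal", "bernoulli", "poisson"}

# Matches simple statements such as `y ~ normal(mu, sigma);`
# or `y[n] ~ poisson(lambda);`. Truncation (T[,]) is not handled.
SAMPLING = re.compile(
    r"^(\s*)(\w+(?:\[[^\]]*\])?)\s*~\s*(\w+)\s*\((.*)\)\s*;\s*$"
)


def to_forward_sampling(line):
    """Rewrite one ~-statement into an assignment of an rng draw.

    Lines that are not ~-statements are returned unchanged.
    """
    match = SAMPLING.match(line)
    if match is None:
        return line
    indent, lhs, dist, args = match.groups()
    if dist not in KNOWN_RNGS:
        raise ValueError(
            f"no {dist}_rng found; consider implementing it "
            "and submitting a PR to Stan Math"
        )
    return f"{indent}{lhs} = {dist}_rng({args});"
```

For example, `to_forward_sampling("y ~ normal(mu, sigma);")` gives `y = normal_rng(mu, sigma);`, while an unknown distribution raises the error described above.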

One design decision will be what to do with covariates in the data block (which never appear to the left of a ~-statement). One natural thing to do would be to draw them from a sort of uniform distribution. It might be more useful though for some applications to draw them from a normal distribution. Another option is not to draw them at all but to leave them as IO calls. Perhaps we should offer multiple options to the users.
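The three covariate options could be exposed as a single user-facing switch. The Python sketch below is one hypothetical way to do that; the strategy names, the uniform bounds of (-1, 1), the standard normal scale, and the `reader` callback are all assumptions chosen for illustration, not a proposed interface.

```python
import random


def draw_covariates(n, strategy="uniform", rng=None, reader=None):
    """Produce n covariate values according to the chosen strategy.

    strategy: "uniform", "normal", or "io" (read user-supplied values).
    reader:   zero-argument callable returning the values, used for "io".
    """
    rng = rng or random.Random()
    if strategy == "uniform":
        # Bounds are arbitrary placeholders.
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]
    if strategy == "normal":
        # Standard normal is an arbitrary default.
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if strategy == "io":
        if reader is None:
            raise ValueError("strategy 'io' needs a reader callable")
        values = list(reader())
        if len(values) != n:
            raise ValueError(f"expected {n} values, got {len(values)}")
        return values
    raise ValueError(f"unknown strategy: {strategy!r}")
```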

Note that this feature would be super useful for automatically generating synthetic datasets from models to improve our test coverage.

jeffreypullin commented 5 years ago

But is this possible? See betanalpha's comment here: https://discourse.mc-stan.org/t/tool-to-auto-generate-model-diagrams/7361/7?u=voltemand

I'd be very happy to help with this - I've done a bit of thinking about simulating from DAGs and auto-generating Stan code as part of some work on greta (https://greta-stats.org/), but I'm not very familiar with either Stan's MIR or OCaml...

Best,

Jeffrey

seantalts commented 5 years ago

Stan can express models for which this would be hard to do automatically, and it can express models for which it would be easy :) For PPCs, there are choices to be made even for DAG models, e.g. whether to create new groups in a hierarchical model or just simulate within the existing groups.
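The group choice can be seen in a small Python sketch of a hypothetical hierarchical normal model, where observation i belongs to group `group_of[i]` and is drawn around that group's effect. The model, the function name, and all parameter names are assumptions made for the example.

```python
import random


def simulate_hierarchical(group_effects, group_of, mu, tau, sigma,
                          new_groups, rng=None):
    """Simulate observations from a toy hierarchical normal model.

    new_groups=False: reuse the given (e.g. fitted) group effects.
    new_groups=True:  draw a fresh effect per group from normal(mu, tau).
    """
    rng = rng or random.Random()
    if new_groups:
        effects = [rng.gauss(mu, tau) for _ in group_effects]
    else:
        effects = list(group_effects)
    # Each observation is drawn around its own group's effect.
    return [rng.gauss(effects[g], sigma) for g in group_of]
```

Both settings are valid forward simulations of the same DAG; they answer different questions, so an automatic tool would need some way for the user to pick one.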

Eventually we'll likely add a mechanism for labeling parts of the model for the models where this is harder to do. And some will never be amenable as they aren't generative. But we always encourage generative modeling and many of the models people write are amenable.

jeffreypullin commented 5 years ago

I am aware of those subtleties; I was sort of just being overly provocative. I was quite surprised by the assertion that it is impossible.

The way I (naively) think about it is that any sensible (read generative) model written in Stan can be converted to a DAG.

Anyway, I'd love to help with all this - particularly if you are willing to accept contributions written by people not fluent in OCaml.

seantalts commented 5 years ago

Yeah, I think there's a big difference between proving that X is theoretically impossible to do for all cases when you allow pathological examples and building a tool that does X most of the time. I prefer the latter.

We love contributions! How much time would you have to devote to this and what's your programming experience look like so far? We do have a policy of code review so we'd want to work with you to get it into a state we think is somewhat consistent with the rest of our code base before merging, so it could be a non-trivial project if you're learning OCaml and functional programming (map, fold, tree transformations) at the same time.

jeffreypullin commented 5 years ago

Not a whole lot of time unfortunately - only a few hours a week at the moment, but that should hopefully increase in the future. I have a bit of programming experience - mainly R (fluent), C, and a bit of Python. I've haphazardly taught myself Haskell and OCaml at various times, though I've not done enough 'real' programming in them to feel comfortable.

Maybe you could point me in the direction of an easy first issue and I can see how I find that?

seantalts commented 5 years ago

Gotcha. Maybe try your hand at https://github.com/stan-dev/stanc3/issues/24 ? It's not really related, but a lot of other stuff around analysis is evolving currently, so that's probably for the best. If it helps, I could give you a 20 minute video chat intro to the code base and this issue; feel free to email me to schedule a time :)