stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference
163 stars 26 forks source link

Julia PPLs? #184

Open trappmartin opened 3 years ago

trappmartin commented 3 years ago

Hi, This looks like a really nice project. Do you aim to also include codes for Julia based PPLs? If so, I’m happy to have a look at it.

Cheers, Martin

MansMeg commented 3 years ago

Hi! Sorry for my late respons (vacation). I would be happy to add Julia PPL! I guess the first step would be to reproduce the 8-schools in Julia and Ill add that code to get the structure in? then we can just add additional models. We should also find a way to check that the models are identical using Travis.

Im happy to help!

trappmartin commented 3 years ago

Super cool. I’ll open a PR, but it might take a few days as I’m currently preparing for my viva next week.

MansMeg commented 3 years ago

Sound great! Just let me know if I can help somehow?

sethaxen commented 2 years ago

I'd like to contribute to this as I find the time. But before I get started, @MansMeg, how do you recommend proceeding? 1 model per PR, or are batches per PR fine? (How) do you ensure the models are up-to-date/accurate?

For Stan, each model is a text block, but for Python and Julia, each model is a script. How is this then packaged by posteriordb so that it can be used?

MansMeg commented 2 years ago

Hi! I think the first step is to get the structure right for the julia models. So I think the best would be if you could create a first model that is identical to any of the Stan models already in (say eight schools).

The accuracy should probably be tested by checking the (proportional) log density values for a set of parameter draws. I can help fixing this.

Lastly, we should try to setup a julia test suite to check that the julia models will run/work.

I dont know exactly how to best formulate a julia model, so here it would be good with suggestions from you. What do you think would be best?

sethaxen commented 2 years ago

I think for Julia PPLs it would be best for each model to be defined in a Julia script that would include

  1. any necessary package imports
  2. any necessary function definitions
  3. the model definition

There should also be a Project.toml file that specifies the version bounds for the used packages. Ideally there'd be one per model, but that would be painful to maintain, and since there will probably only be a few required packages (e.g. DifferentialEquations.jl), it's probably okay to have one global Project.toml.

Here are two example models reproduced in Turing.jl:

eight_schools_noncentered ```julia using Turing @model function eight_schools_noncentered( J, # number of schools y, # estimated treatment sigma, # std of estimated effect ) mu ~ Normal(0, 5) # hyper-parameter of mean tau ~ truncated(Cauchy(0, 5), 0, Inf) # hyper-parameter of sdv, a non-informative prior theta_trans ~ filldist(Normal(0, 1), J) # transformation of theta theta = theta_trans .* tau .+ mu # original theta, treatment effect in school j y ~ MvNormal(theta, sigma) end ```
hudson_lynx_hare ```julia using DifferentialEquations, Turing function lotka_volterra( du, # system rate {prey, predator} u, # system state {prey, predator} p, # parameters t, # time ) x, y = u alpha, beta, gamma, delta = p du[1] = (alpha - beta * y) * x # dx du[2] = (-gamma + delta * x) * y # dy end @model function model( N, # number of measurement times ts, # measurement times > 0 y_init, # initial measured populations y, # measured populations tspan=extrema(ts), prob_base=ODEProblem(lotka_volterra, [log(10), log(10)], tspan, [1, 0.05, 1, 0.05]), ) theta ~ MvNormal([1, 0.05, 1, 0.05], [0.5, 0.05, 0.5, 0.05]) # { alpha, beta, gamma, delta } sigma ~ filldist(LogNormal(-1, 1), 2) # measurement errors z_init ~ filldist(LogNormal(log(10), 1), 2) # initial population prob = remake(prob_base; u0=z_init, p=theta) sol = solve( prob; saveat=ts, save_start=false, save_end=false, rel_tol=1e-5, abs_tol=1e-3, maxiters=500, ) z = reduce(vcat, sol') y_init .~ LogNormal.(log.(z_init), sigma) y .~ LogNormal.(log.(z), sigma') end ```

A potential complication is that in Julia there are multiple automatic differentiation packages that could be used for sampling, and these are independent of the PPLs. It's possible that a given model implementation will work well with one backend but not another, so it might be worth it to hardcode the AD backend choice in the model definition as well.

I understand Stan has both forward- and reverse-mode ADs. Is the best AD mode stored for stan models in the database somehow, or is the default always used?

@trappmartin, what are your thoughts? Anything I've missed?

MansMeg commented 2 years ago

Hi!

This looks great!

I actually dont know the exakt backend used by Stan by heart, but it is the default used.

I think this looks great. I guess the next step(s) are: 1) Compute the (proportional) log density for these two models for a set of parameters (both in Stan and Turing). 2) Add the proportional log density values to the database 3) Add the julia code for the model 4) add how to run Turing with posteriord for these (two) models 5) add a julia test suite to check added julia models 6) add more models

I could do 1-3, but would most likely need help with 4 and 5.

trappmartin commented 2 years ago

Hi there, I have seen that there is now an extra repo for R and python. Would we need one for Julia then, as this repo doesn't contain any scripts to run inference on models?

If I understand correctly, a JuliaPosteriorDB repo would then handle the loading of libraries. Or it seems this is how it is done for PyMC.

MansMeg commented 2 years ago

Yes. I could try to get such a repo up.

sethaxen commented 1 year ago

@MansMeg I have created a repo PosteriorDB.jl that, similarly to posteriordb-r and posteriordb-python, provides functionality for working with posteriordb. If you haven't started on a Julia repo yet, would you like this to be it, and if so, would it make sense to move it to this org?

MansMeg commented 1 year ago

That sounds great! Im happy to look into this.

sethaxen commented 1 year ago

Great! IIRC, the process would be:

  1. I transfer ownership of the repo to a user who has repository creation rights in the stan-dev org
  2. That user then transfers ownership to the stan-dev org

Also, I would prefer I be given administrative rights on the repository after transfer so I can continue maintaining it.