hyunjimoon / SBC

https://hyunjimoon.github.io/SBC

Support InferenceData and Stanify #72

Open hyunjimoon opened 1 year ago

hyunjimoon commented 1 year ago

@jandraor's paper on applying the Bayesian workflow could be extended with SBC. Currently, system dynamics model translators are being developed in parallel (Python and R, in the linked branch of the library).

This could start from simple model case studies such as SIR and prey-predator, which have three and four parameters respectively. New SBC features such as 1. test quantities (likelihood function) and 2. dynamic rejection based on a gamma distribution can be incorporated. It would be interesting to find a discrete-valued parameter example in the future. Three checks and prior elicitation will be the minimum contents.
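The rank statistic at the heart of those calibration checks can be sketched as follows. This is a toy illustration, not the SBC library's API: the conjugate normal-normal model and all names here are hypothetical stand-ins for the SIR/prey-predator case studies.

```python
import numpy as np

def sbc_rank(theta_true, posterior_draws):
    """Rank of the prior-drawn 'true' parameter among posterior draws.

    If the sampler is correct, ranks collected over many simulated data
    sets are uniform on 0..len(posterior_draws).
    """
    return int(np.sum(posterior_draws < theta_true))

rng = np.random.default_rng(1)

# One SBC iteration for a toy conjugate model: prior N(0,1), sigma = 1.
theta = rng.normal()                      # draw parameter from the prior
y = rng.normal(theta, 1.0, size=20)       # simulate data given theta
n = y.size
post_mean = y.sum() / (n + 1)             # conjugate normal-normal posterior
post_sd = np.sqrt(1.0 / (n + 1))
draws = rng.normal(post_mean, post_sd, size=999)

r = sbc_rank(theta, draws)                # one rank; repeat over many sims
print(r)
```

Repeating this loop over many prior draws and testing the resulting ranks for uniformity is the basic SBC check the case studies would exercise.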

Analysis of the prey-predator model will mainly be documented here.

@Dashadower, may I ask for advice on whether migrating the above file to this repository might be possible? That is, do you recommend keeping it as a standalone library separate from pysd, or could it be combined into a Python version of the SBC package?

hyunjimoon commented 1 year ago

Stanify is its first prototype: https://github.com/Data4DM/stanify.

@martinmodrak, my main development language has moved to Python (where more simulation-based inference tools, such as BayesFlow, can be found), and I wonder whether there is some way I can keep using (and contributing to) the SBC library. As I may be one of the people who use SBC most frequently, keeping this feedback structure means a lot to me.

One possibility I can think of is to make our library's SBC data structure similar to the ArviZ InferenceData format. This could require serious refactoring, so I am being careful and simply asking for your opinion.

Possible benefits:

The ArviZ schema explains InferenceData's groups and their relations as below:

[Figure: ArviZ InferenceData schema diagram showing the groups and their relations]
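As a concrete illustration of what that mapping could look like, here is a minimal sketch using plain dictionaries to mirror InferenceData's group layout (a real implementation would build an actual InferenceData object via arviz). The group names follow the ArviZ schema; the variable names and shapes are hypothetical, with arrays shaped (chain, draw) plus parameter dimensions:

```python
import numpy as np

# Hypothetical mapping from SBC-style results to InferenceData-like groups.
rng = np.random.default_rng(0)
n_chains, n_draws = 4, 250

sbc_as_inferencedata = {
    "prior":         {"beta": rng.normal(size=(1, n_draws))},
    "posterior":     {"beta": rng.normal(size=(n_chains, n_draws))},
    "observed_data": {"y": rng.normal(size=10)},
    "sample_stats":  {"diverging": np.zeros((n_chains, n_draws), dtype=bool)},
}

for group, variables in sbc_as_inferencedata.items():
    for name, arr in variables.items():
        print(group, name, arr.shape)
```

Keeping the SBC results in this layout would make a later conversion to a real InferenceData object (and hence to NetCDF) mostly mechanical.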

If the above benefits sound reasonable, perhaps mapping

can be a first step.

tomfid commented 1 year ago

It appears that NetCDF has its own license, which looks like BSD or MIT (no restriction other than credit), which is good.

mike-lawrence commented 1 year ago

A while ago I wrote code in my experiments package aria (https://github.com/mike-lawrence/aria) that used netcdf (which is a subset of hdf5, FYI) to prep data stores for models (based on the expected parameter shapes/types) using something close to the InferenceData spec, and handled during-sampling parsing of the standard Stan output for writing to those stores. I possibly have a bug I was unable to find, but I found that with particular (usually very large) models the netcdf creation became unexpectedly slow. I've been using zarr in another project since and have been meaning to swap out netcdf for zarr in aria to see if it performs any better. I also thought I'd look at feather/arrow again to see if the feature I needed but they lacked (single-writer multiple-readers, in HDF5-speak) has been added since I last looked; if so, they should be more performant and have better cross-language support than zarr.


tomfid commented 1 year ago

We tested HDF5 for use as a native file format in Vensim. This was several years ago (5?) but at the time, it was way too slow. Possibly there's some performance tradeoff for the flexibility of the format. It doesn't seem like there should be one - after all, the number of floats written doesn't change, and structure shouldn't be terribly expensive, but ... ?

hyunjimoon commented 1 year ago

From a discussion with @OriolAbril: ArviZ has a project on connecting InferenceData with R, and we laid out the brief plan below. Oriol is happy to help on the NetCDF side in one branch of the SBC library.

  1. Start from Mike's https://github.com/hyunjimoon/SBC/issues/41#issuecomment-1312521139
  2. Focus on NetCDF (not zarr)
  3. Read NetCDF: multiple groups and arrays
  4. Get the R libraries' (RNetCDF, ncdf4) output
  5. Convert the R libraries' output to rvar (the SBC library supports rvar)
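The final conversion step essentially amounts to reshaping (chain, draw, ...) arrays read from NetCDF into the draws-first layout that posterior::rvar uses. Here is a sketch of that reshaping in Python (the real conversion would happen in R via RNetCDF/ncdf4 and the posterior package; shapes and names are illustrative):

```python
import numpy as np

# Flatten a (chain, draw, param_dim) array, as read from a NetCDF group,
# into a (total_draws, param_dim) layout with one row per draw.
rng = np.random.default_rng(0)
chains, draws, dim = 4, 100, 3
arr = rng.normal(size=(chains, draws, dim))   # as stored per InferenceData

rvar_like = arr.reshape(chains * draws, dim)  # draws-first, rvar-style
print(rvar_like.shape)
```

Row-major reshaping keeps draws from the same chain contiguous, so chain membership can still be recovered from the row index if needed.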

mike-lawrence commented 1 year ago

FYI, I had issues with the RNetCDF package (hit some sort of bottleneck with very large models, leading to very slow file initialization), so I have been looking at options for using zarr instead. The Rarr package seems to be the most complete R-based read/write interface for zarr, but it only supports doubles for writing and doesn't support group hierarchies; I don't think either limitation is particularly pertinent to saving data consistent with the InferenceData spec. That said, another limitation is that it can only write in column-wise chunks, which is fine for rapid sampling and/or waiting until sampling is complete before accessing the data, but for scenarios where you want during-sampling diagnostics (something I'm keen on), the column-wise chunking might be an issue.
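The during-sampling write pattern described above can be sketched with a memory-mapped array standing in for a zarr/NetCDF store. This is a toy illustration of chunked single-writer writes (not code from aria or Rarr); the writer flushes each completed chunk so a separate reader process could see it mid-run:

```python
import numpy as np
import tempfile, os

# Writer: append draws chunk-by-chunk to a preallocated on-disk store.
n_draws, n_params, chunk = 1000, 3, 100
path = os.path.join(tempfile.mkdtemp(), "draws.dat")

store = np.memmap(path, dtype=np.float64, mode="w+",
                  shape=(n_draws, n_params))
rng = np.random.default_rng(0)
for start in range(0, n_draws, chunk):
    store[start:start + chunk] = rng.normal(size=(chunk, n_params))
    store.flush()  # make the completed chunk visible to readers

# Reader: open the same file read-only, e.g. for during-sampling diagnostics.
reader = np.memmap(path, dtype=np.float64, mode="r",
                   shape=(n_draws, n_params))
print(reader.shape)
```

A real store would also need to record how many chunks are complete (zarr and NetCDF track array extents for you); the sketch only shows the single-writer, multiple-reader access pattern.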

An alternative could be to simply use reticulate to run canonical zarr-python code, giving one all features of the zarr spec.

hyunjimoon commented 11 months ago

Poster for the conference introducing stanify: Bridging Statistics and Dynamic Modeling with Vensim, Python, and Stan (ISDC23_stanify.pdf)