Open hyunjimoon opened 1 year ago
Stanify is its first prototype: https://github.com/Data4DM/stanify.
@martinmodrak my main development language has moved to Python (where more simulation-based inference tools such as BayesFlow are available), and I wonder whether there is some way I can keep using the SBC library (and contributing). As I may be one of the most frequent users of SBC, building this feedback structure means a lot to me.
One possibility I can think of is to make our SBC data structure similar to the arviz InferenceData format. This could require serious refactoring, so I am being careful and simply asking your opinion.
Possible benefits:
- with 23,846 daily downloads, their structure has been tested by many users
- their main focus is graphical diagnostics, which suggests the structure is effective for plotting
- easy coordinate extension could support #8, e.g. school from their example
- it supports our future use case of computing data-averaged posteriors to measure miscalibration
Arviz schema explains InferenceData's groups and relations as below:
If the above benefits sound reasonable, perhaps mapping our groups
(posterior, posterior_predictive, sample_stats, prior, observed_data)
to InferenceData's groups
(posterior, sample_stats, log_likelihood, posterior_predictive, observed_data, constant_data, prior, sample_stats_prior, prior_predictive, predictions, predictions_constant_data)
could be a first step.
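For concreteness, here is a minimal sketch of what such a mapping could look like, in plain Python. The field names on the SBC side (e.g. `prior_draws`, `simulated_data`) are hypothetical stand-ins for the library's actual output structure, not its real API:

```python
# Illustrative sketch only: mapping hypothetical SBC simulation output
# onto arviz InferenceData group names. The SBC-side keys below
# ("prior_draws", "posterior_draws", ...) are assumptions for illustration.

def map_sbc_to_inferencedata_groups(sbc_result):
    """Return a dict keyed by InferenceData group names."""
    return {
        "prior": sbc_result["prior_draws"],
        "posterior": sbc_result["posterior_draws"],
        "posterior_predictive": sbc_result.get("y_rep"),
        "observed_data": sbc_result["simulated_data"],
        "sample_stats": sbc_result.get("fit_diagnostics"),
    }

# toy input: one scalar parameter "mu", two simulations
example = {
    "prior_draws": {"mu": [0.1, -0.3]},
    "posterior_draws": {"mu": [[0.2, 0.0, -0.1], [0.1, -0.2, -0.4]]},
    "simulated_data": {"y": [1.2, 0.7]},
}
groups = map_sbc_to_inferencedata_groups(example)
print(sorted(k for k, v in groups.items() if v is not None))
```

Once data is organized this way, handing it to `arviz.from_dict` (or writing it as NetCDF) would be a small further step.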
It appears that NetCDF has its own license, which looks like BSD or MIT (no restriction other than credit), which is good.
A while ago I wrote code in my experiments package aria ( https://github.com/mike-lawrence/aria) that used NetCDF (which is a subset of HDF5, FYI) to prep data stores for models (based on the expected parameter shapes/types) using something close to the InferenceData spec, and handled during-sampling parsing of the standard Stan output for writing to those stores. It's possible there was a bug I was unable to find, but with particular (usually very large) models the NetCDF creation became unexpectedly slow. I've been using zarr in another project since and have been meaning to swap out NetCDF for zarr in aria to see if it performs any better. I also thought I'd look at feather/arrow again to see if the feature I needed but they lacked (single-writer multiple-readers, in HDF5-speak) has been added since I last looked; if so, they should be more performant and better cross-language supported than zarr.
-- Mike Lawrence, PhD, Co-founder & Research Scientist, Axem Neurotechnology
We tested HDF5 for use as a native file format in Vensim. This was several years ago (5?) but at the time, it was way too slow. Possibly there's some performance tradeoff for the flexibility of the format. It doesn't seem like there should be one - after all, the number of floats written doesn't change, and structure shouldn't be terribly expensive, but ... ?
From a discussion with @OriolAbril: ArviZ has a project on connecting InferenceData with R, and we laid out a brief plan as follows; Oriol is happy to help on the NetCDF side on one branch of the SBC library.
- NetCDF I/O from R (RNetCDF, ncdf4)
- SBC library output supports rvar
FYI, I had issues with the RNetCDF package (I hit some sort of bottleneck with very large models, leading to very slow file initialization), so I have been looking at options for using zarr instead. The Rarr package seems to be the most complete R-based read/write interface for zarr; it only supports doubles for writing and doesn't support group hierarchies, but I don't think either limitation is particularly pertinent to saving data consistent with the InferenceData spec. That said, another limitation is that it can only write in column-wise chunks. That is fine for rapid sampling and/or waiting until sampling is complete before accessing the data, but for scenarios where you want during-sampling diagnostics (something I'm keen on), the column-wise chunking might be an issue.
An alternative could be to simply use reticulate to run canonical zarr-python code, giving one all features of the zarr spec.
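To illustrate the chunking concern above, here is a toy pure-Python sketch (no zarr involved; the function and the numbers are invented for illustration) of why column-wise chunks delay during-sampling reads:

```python
# Toy model of chunk readability under two chunking layouts.
# With column-wise chunks, one chunk spans all draws of a parameter,
# so no chunk is complete (readable) until sampling finishes.
# With row-wise chunks, each completed draw is immediately readable.

def chunks_ready(n_draws_written, n_total_draws, n_params, chunking):
    """Count complete chunks given draws written so far."""
    if chunking == "column":   # one chunk per parameter, spanning all draws
        return n_params if n_draws_written == n_total_draws else 0
    else:                      # "row": one chunk per completed draw
        return n_draws_written

# halfway through sampling 100 draws of 5 parameters:
print(chunks_ready(50, 100, 5, "column"))  # 0 chunks readable yet
print(chunks_ready(50, 100, 5, "row"))     # 50 chunks readable
```

This is of course a simplification (real zarr chunks can tile both dimensions), but it captures why during-sampling diagnostics favor draw-major chunking.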
Poster for the conference that introduces stanify: Bridging Statistics and Dynamic Modeling with Vensim, Python, and Stan (ISDC23_stanify.pdf)
@jandraor's paper on applying the Bayesian workflow could be extended with SBC. Currently, system dynamics model translators are being developed in parallel (Python and R, in the linked branch of the library).
This could start from simple model case studies such as SIR and prey-predator, which have three and four parameters respectively. New SBC features such as (1) test quantities (likelihood function) and (2) dynamic rejection based on a gamma distribution can be incorporated. It would be interesting to find a discrete-valued parameter example in the future. The three checks and prior elicitation will be the minimum contents.
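As background for the calibration checks mentioned above, here is a minimal sketch of the core SBC rank statistic in plain Python. This is not the SBC package's API; the self-consistent normal example is invented purely for illustration:

```python
import random

# Core idea of SBC: draw theta from the prior, simulate data, fit, and
# record the rank of the prior draw among posterior draws. Under correct
# calibration the ranks are uniform on {0, ..., n_posterior_draws}.

def sbc_rank(theta_prior, posterior_draws):
    """Rank of the prior draw among posterior draws (0..len(draws))."""
    return sum(d < theta_prior for d in posterior_draws)

# Toy self-consistent setup: "posterior" draws come from the same
# distribution as the prior draw, so ranks should be roughly uniform.
random.seed(1)
ranks = []
for _ in range(1000):
    theta = random.gauss(0, 1)
    draws = [random.gauss(0, 1) for _ in range(9)]
    ranks.append(sbc_rank(theta, draws))

print(min(ranks), max(ranks))  # ranks fall in 0..9
```

In a real study the inner loop would be replaced by simulating data from theta and fitting the SIR or prey-predator model; departures from rank uniformity then indicate miscalibration.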
Analysis of prey-predator will be mainly documented here.
@Dashadower, may I ask for some advice on whether migrating the above file to this repository might be possible? I.e., do you recommend keeping it a standalone library separate from pysd, or could combining it into a Python version of the SBC package be considered?