stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference

Convert golden samples to arviz IData #225

Open · feynmanliang opened this issue 3 years ago

feynmanliang commented 3 years ago

ArviZ provides nice visualization tools for posterior samples, but getting the current golden samples data format into an `InferenceData` takes a fairly long dance. For example, this is what I am currently doing to convert the eight schools golden samples into a format ArviZ can visualize:

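```python
import numpy as np
import arviz as az

# `my_pdb` is a posteriordb PosteriorDatabase opened earlier
posterior = my_pdb.posterior("eight_schools-eight_schools_centered")
gs = posterior.reference_draws()  # one dict per chain: {"mu": [...], "theta[1]": [...], ...}
data = posterior.data

gs_dict = {}
num_chains = len(gs)
num_samples = len(gs[0][next(iter(gs[0]))])

for i, chain in enumerate(gs):
    for var in chain:
        if '[' not in var:
            # Scalar parameter, e.g. "mu" -> array of shape (chains, samples)
            if var not in gs_dict:
                gs_dict[var] = np.zeros((num_chains, num_samples))
            gs_dict[var][i, :] = np.array(chain[var])
        else:
            # 1-D indexed parameter, e.g. "theta[3]"; Stan indices are 1-based
            name = var.split('[')[0]
            idx = int(var.split('[')[1].split(']')[0]) - 1
            if name not in gs_dict:
                # Match on "name[" so that "theta" does not also count e.g. "theta_raw[1]"
                var_size = sum(1 for x in chain if x.startswith(name + '['))
                gs_dict[name] = np.zeros((num_chains, num_samples, var_size))
            gs_dict[name][i, :, idx] = np.array(chain[var])

gs_idata = az.convert_to_inference_data(
    gs_dict,
    coords={"school": np.arange(data.values()['J'])},
    dims={
        "theta": ["school"],
    }
)
```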
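A possible generalization of the loop above, written as a hypothetical helper `reference_draws_to_arrays` (not part of posteriordb or ArviZ). It parses each flattened name with a regular expression instead of prefix matching, so it handles scalars and indices of any rank such as `Sigma[1,2]`. It assumes, as the loop above does, that `reference_draws()` returns a list with one dict per chain mapping Stan-style, 1-based names to lists of draws.

```python
import re

import numpy as np

# Matches Stan-style flattened names such as "theta[3]" or "Sigma[1,2]".
_INDEXED = re.compile(r"^(?P<name>[^\[\]]+)\[(?P<idx>[0-9,\s]+)\]$")


def reference_draws_to_arrays(draws):
    """Stack per-chain draws into {name: array of shape (chain, draw, *dims)}.

    Hypothetical helper. Assumes `draws` is a list with one dict per chain,
    each mapping a flattened parameter name to a list of draws, with
    1-based indices as in Stan output.
    """
    num_chains = len(draws)
    num_draws = len(next(iter(draws[0].values())))

    # First pass: record the largest index seen in each dimension per variable.
    shapes = {}
    for key in draws[0]:
        match = _INDEXED.match(key)
        if match is None:
            shapes.setdefault(key, ())  # scalar parameter
            continue
        name = match.group("name")
        idx = tuple(int(i) for i in match.group("idx").split(","))
        old = shapes.get(name, (0,) * len(idx))
        shapes[name] = tuple(max(a, b) for a, b in zip(old, idx))

    # Second pass: allocate NaN-filled arrays and copy each draw vector in.
    arrays = {
        name: np.full((num_chains, num_draws) + shape, np.nan)
        for name, shape in shapes.items()
    }
    for c, chain in enumerate(draws):
        for key, values in chain.items():
            match = _INDEXED.match(key)
            if match is None:
                arrays[key][c, :] = values
            else:
                # Convert Stan's 1-based indices to 0-based NumPy indices.
                idx = tuple(int(i) - 1 for i in match.group("idx").split(","))
                arrays[match.group("name")][(c, slice(None)) + idx] = values
    return arrays
```

For the eight schools posterior, `reference_draws_to_arrays(gs)` should return `mu` and `tau` with shape `(chains, draws)` and `theta` with shape `(chains, draws, 8)`, which can then be passed to `az.convert_to_inference_data` with the same `coords` and `dims` as before. Entries that never appear in the draws stay `NaN`, so gaps are easy to spot.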

Is there an easier way to do this that I am missing? If not, would it be worthwhile to package something like this up as a library method (or could `.reference_draws()` return an `InferenceData` with the chain/draw/other dimensions set up)?

ahartikainen commented 3 years ago

I did this some time ago (it uses ArviZ's `from_dict`):

https://gist.github.com/ahartikainen/ca4ec935c78c56e2d352b8d34a286fd0

I'm not sure posteriordb will add this kind of functionality; ArviZ might be a better place for it.

feynmanliang commented 3 years ago

Thanks :) I'll link this issue over there.

MansMeg commented 3 years ago

Great, @ahartikainen. This structure is based on the structure used by the posterior R package, and I use it to read and write the JSON posteriors, so a converter like this would probably fit best in the ArviZ package. That said, I think it would be good for the Python posteriordb library to support reading in the gold standard draws.