stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference

Filtering API for python #132

Open · eerolinna opened this issue 4 years ago

eerolinna commented 4 years ago

The R filtering API looks like this:

```r
pos <- filter_posteriors(my_pdb, data_name == "eight_schools")
```

The exact same API cannot be implemented in Python*. I propose this instead:

```python
pos = my_pdb.filter_posteriors(lambda posterior: posterior.data.name == "eight_schools")
```

In other words, filter_posteriors takes a predicate: a function that accepts a posterior object and returns a bool.
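A minimal sketch of how such a method could work, assuming a hypothetical `PosteriorDatabase` class and small `Posterior`/`Data` dataclasses standing in for the real posteriordb objects. Only the `data.name` attribute used in the proposal is modelled, and the entries are illustrative. The method simply calls the predicate on every posterior and keeps those for which it returns `True`.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical stand-ins for posteriordb objects; only posterior.data.name is modelled.
@dataclass
class Data:
    name: str


@dataclass
class Posterior:
    name: str
    data: Data


class PosteriorDatabase:
    """Hypothetical minimal database holding an in-memory list of posteriors."""

    def __init__(self, posteriors: List[Posterior]):
        self._posteriors = posteriors

    def filter_posteriors(self, predicate: Callable[[Posterior], bool]) -> List[Posterior]:
        # Keep every posterior for which the predicate returns True.
        return [p for p in self._posteriors if predicate(p)]


# Illustrative entries only.
my_pdb = PosteriorDatabase([
    Posterior("eight_schools-eight_schools_centered", Data("eight_schools")),
    Posterior("eight_schools-eight_schools_noncentered", Data("eight_schools")),
    Posterior("mesquite-logmesquite", Data("mesquite")),
])
pos = my_pdb.filter_posteriors(lambda posterior: posterior.data.name == "eight_schools")
print([p.name for p in pos])
```

Because the predicate is an ordinary callable, the method does not need to know anything about which attributes the caller inspects.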

The function doesn't have to be given inline; the following is also valid:

```python
def filter_function(posterior):
    return posterior.data.name == "eight_schools"

pos = my_pdb.filter_posteriors(filter_function)
```

The return value of filter_posteriors is a list of posterior objects.

I also propose adding filter_models and filter_data, which work like filter_posteriors but act on models or data instead.
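A sketch of how filter_models and filter_data could share one generic helper, assuming hypothetical `Model` and `Data` classes in which `Model.information` is a dict holding a `"keywords"` list (the shape used in the keyword query that follows). The `_filter` helper and the class layout are illustrative, not posteriordb's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


# Hypothetical stand-ins; only the attributes used in the proposal are modelled.
@dataclass
class Model:
    name: str
    information: Dict[str, list]


@dataclass
class Data:
    name: str


def _filter(items: List[T], predicate: Callable[[T], bool]) -> List[T]:
    # Shared helper: the filter_* methods differ only in which collection they scan.
    return [item for item in items if predicate(item)]


class PosteriorDatabase:
    """Hypothetical database holding in-memory lists of models and data."""

    def __init__(self, models: List[Model], data: List[Data]):
        self._models = models
        self._data = data

    def filter_models(self, predicate: Callable[[Model], bool]) -> List[Model]:
        return _filter(self._models, predicate)

    def filter_data(self, predicate: Callable[[Data], bool]) -> List[Data]:
        return _filter(self._data, predicate)


# Illustrative entries only.
my_pdb = PosteriorDatabase(
    models=[
        Model("eight_schools_centered", {"keywords": ["bda3_example"]}),
        Model("logmesquite", {"keywords": []}),
    ],
    data=[Data("eight_schools"), Data("mesquite")],
)
models = my_pdb.filter_models(lambda model: "bda3_example" in model.information["keywords"])
datasets = my_pdb.filter_data(lambda data: data.name == "mesquite")
```

Keeping one helper means all three methods share the same semantics: same argument type (a predicate) and same return type (a list).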

The following query finds the posteriors whose model has the keyword bda3_example:

```python
filtered_models = my_pdb.filter_models(lambda model: "bda3_example" in model.information["keywords"])
filtered_posteriors = my_pdb.filter_posteriors(lambda posterior: posterior.model in filtered_models)
```
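The two-step query can be run end to end against hypothetical stand-in classes, as sketched below (the class layout and entries are assumptions for illustration). Note that `posterior.model in filtered_models` compares models with `==`, which for these dataclasses means field-by-field equality. A possible variant, not part of the proposal, matches on model names through a set instead. It does not depend on how model objects define equality, and each membership test is a hash lookup rather than a scan of the list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins for posteriordb objects.
@dataclass
class Model:
    name: str
    information: Dict[str, list]


@dataclass
class Posterior:
    name: str
    model: Model


class PosteriorDatabase:
    """Hypothetical database holding in-memory lists of models and posteriors."""

    def __init__(self, models: List[Model], posteriors: List[Posterior]):
        self._models = models
        self._posteriors = posteriors

    def filter_models(self, predicate: Callable[[Model], bool]) -> List[Model]:
        return [m for m in self._models if predicate(m)]

    def filter_posteriors(self, predicate: Callable[[Posterior], bool]) -> List[Posterior]:
        return [p for p in self._posteriors if predicate(p)]


# Illustrative entries only.
centered = Model("eight_schools_centered", {"keywords": ["bda3_example"]})
logmesquite = Model("logmesquite", {"keywords": []})
my_pdb = PosteriorDatabase(
    models=[centered, logmesquite],
    posteriors=[
        Posterior("eight_schools-eight_schools_centered", centered),
        Posterior("mesquite-logmesquite", logmesquite),
    ],
)

# The query as proposed: membership test relies on Model.__eq__.
filtered_models = my_pdb.filter_models(lambda model: "bda3_example" in model.information["keywords"])
filtered_posteriors = my_pdb.filter_posteriors(lambda posterior: posterior.model in filtered_models)

# Variant: compare by model name via a set (hash lookup, no reliance on __eq__).
model_names = {model.name for model in filtered_models}
filtered_by_name = my_pdb.filter_posteriors(lambda posterior: posterior.model.name in model_names)
```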

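[*] This is because Python doesn't support unevaluated expressions like data_name == "eight_schools": Python evaluates every argument before the function is called, so filter_posteriors would receive only the resulting bool (or the call would fail if no variable data_name exists), not an expression it could evaluate for each posterior. R can capture the unevaluated argument and evaluate it later.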
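The footnote's point can be demonstrated with a hypothetical function whose signature mimics the R call (the name `r_style_filter` is made up for this sketch). Python evaluates the argument first, so the function never sees the expression itself. Passing a lambda, as the proposed API does, hands over a callable that can be evaluated later for each posterior.

```python
def r_style_filter(condition):
    """Hypothetical R-style signature: simply returns whatever it receives."""
    return condition


# 1. No variable named data_name exists yet, so Python raises NameError while
#    evaluating the argument, before r_style_filter's body ever runs.
try:
    r_style_filter(data_name == "eight_schools")
except NameError as err:
    error_message = str(err)

# 2. Even if such a variable exists, the function receives only the evaluated bool,
#    not an expression it could re-evaluate against each posterior.
data_name = "mesquite"
eager_result = r_style_filter(data_name == "eight_schools")

# 3. A lambda delays evaluation: the function receives a callable it can apply
#    to a different value on every call.
deferred = r_style_filter(lambda data_name: data_name == "eight_schools")
print(error_message)
print(eager_result, deferred("eight_schools"), deferred("mesquite"))
```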