stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference

Creating posteriordb object using options from a config file #147

Closed: eerolinna closed this issue 3 years ago

eerolinna commented 4 years ago

It could be useful to have a function pdb_from_config that searches the current working directory and its parent directories for a config file and uses the options in it to create a posterior database object. The config file could be named, for example, .posteriordbrc or posteriordb.yml.
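Here is a minimal Python sketch of the proposed lookup. It assumes the search stops at the filesystem root and that the nearest directory wins. The two candidate names are the ones suggested above. `find_config` is a hypothetical helper and is not part of posteriordb.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate names from the proposal; the final name was still undecided.
DEFAULT_CONFIG_NAMES = (".posteriordbrc", "posteriordb.yml")


def find_config(start: Path, names: Iterable[str] = DEFAULT_CONFIG_NAMES) -> Optional[Path]:
    """Return the first config file found in `start` or any of its parents.

    The nearest directory wins. Within one directory, `names` are tried in
    order. Returns None if the filesystem root is reached without a match.
    """
    names = tuple(names)  # allow any iterable, since it is scanned once per directory
    directory = start.resolve()
    for candidate_dir in (directory, *directory.parents):
        for name in names:
            candidate = candidate_dir / name
            if candidate.is_file():
                return candidate
    return None
```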

This functionality could also be added to pdb_default instead of making a new function. To me this feels like the best solution, but I will keep using the name pdb_from_config in this issue to make it easier to distinguish between the current behaviour of pdb_default and the proposed one.

For a local posterior database, the config file could look something like this:

type: "local"
path: "path_to_pdb"

or like this for a remote database:

type: "remote"
url: "url_to_remote"
branch: "development"
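The two example configs imply a small schema. `type` selects the database kind, `local` needs `path`, and `remote` needs `url` with an optional `branch`. The sketch below parses and validates that shape. It assumes `branch` is optional. It also handles only flat `key: "value"` lines, so it runs with the standard library alone. A real implementation would more likely use a YAML parser such as PyYAML's `yaml.safe_load`. `parse_flat_config` and `validate_config` are hypothetical names.

```python
from typing import Dict

# Schema inferred from the two example configs; treating `branch` as optional is an assumption.
REQUIRED_KEYS = {"local": {"path"}, "remote": {"url"}}
OPTIONAL_KEYS = {"local": set(), "remote": {"branch"}}


def parse_flat_config(text: str) -> Dict[str, str]:
    """Parse flat `key: "value"` lines (a tiny YAML subset, not full YAML)."""
    config: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first colon only, so values like URLs keep their colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key: value', got {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop surrounding quotes
        config[key.strip()] = value
    return config


def validate_config(config: Dict[str, str]) -> Dict[str, str]:
    """Check that `type` is known and that exactly the expected keys are present."""
    db_type = config.get("type")
    if db_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown or missing type: {db_type!r}")
    keys = set(config) - {"type"}
    missing = REQUIRED_KEYS[db_type] - keys
    unknown = keys - REQUIRED_KEYS[db_type] - OPTIONAL_KEYS[db_type]
    if missing:
        raise ValueError(f"{db_type} config is missing {sorted(missing)}")
    if unknown:
        raise ValueError(f"{db_type} config has unknown keys {sorted(unknown)}")
    return config
```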

Then, in code, we would call pdb_from_config instead of writing pdb_local or pdb_github. pdb_local and pdb_github should not be removed; this would just be an alternative to using them.
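Here is a sketch of the dispatch step. It assumes the config has already been parsed into a mapping. `LocalPDB` and `GithubPDB` are hypothetical stand-ins for the objects that `pdb_local` and `pdb_github` return. The real constructors and their arguments are not shown in this thread. A missing `branch` is kept as `None`, on the assumption that the caller then uses the repository's default branch.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union


@dataclass(frozen=True)
class LocalPDB:
    """Hypothetical stand-in for what pdb_local would return."""
    path: str


@dataclass(frozen=True)
class GithubPDB:
    """Hypothetical stand-in for what pdb_github would return."""
    url: str
    branch: Optional[str] = None  # None: assume the repository's default branch


def pdb_from_config(config: Mapping[str, str]) -> Union[LocalPDB, GithubPDB]:
    """Build a database handle from an already-parsed config mapping."""
    db_type = config["type"]
    if db_type == "local":
        return LocalPDB(path=config["path"])
    if db_type == "remote":
        return GithubPDB(url=config["url"], branch=config.get("branch"))
    raise ValueError(f"unsupported database type: {db_type!r}")
```

Keeping pdb_local and pdb_github as separate entry points means this function only routes to them and adds no new way of constructing a database.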

One situation where this could be useful is a function run_stan_sampling(posterior_name) that is implemented not in this library but by a user or by another library. This function is not given a pdb object, so it has to call pdb_default. That works well for the default posterior database, but it makes it impossible to use anything else, such as a local posterior database. We could modify run_stan_sampling to take pdb as an argument, but then every caller of run_stan_sampling would also have to change to accept a pdb argument. With pdb_from_config, however, no changes would be needed, and the user could easily select whichever posterior database they want.

MansMeg commented 4 years ago

I think this sounds like a good idea. I could implement it in R if you create a first draft in Python. In that case, I think the config should be a yml file.

Just to make sure I understand: the idea is to have a local yml file somewhere that you point to, and that file points to the database you want to use? So instead of pdb_local() you would use pdb_from_config("path_to_config")? What would be the use cases for this?

eerolinna commented 4 years ago

The idea is that you have a file .posteriordbrc (or whatever name we want) in the root of your project. You call just pdb_default(). If it finds a config file in the current directory or some parent directory it will use that, otherwise it will use the global defaults. The main use case I have in mind is what I wrote in the first post.
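The fallback behavior described here can be sketched as follows. In this sketch, `pdb_default` only returns the resolved settings. `GLOBAL_DEFAULT` is a placeholder for whatever the global default database is. `run_stan_sampling` is the hypothetical third-party function from the first post. None of these reflect the real posteriordb API. The point is that `run_stan_sampling` takes no pdb argument, yet the nearest config file decides which database it uses.

```python
from pathlib import Path
from typing import Dict, Optional

CONFIG_NAME = ".posteriordbrc"  # one of the names floated in this thread
# Placeholder: the real global default database is not specified here.
GLOBAL_DEFAULT = {"type": "remote", "url": "<default posteriordb repository>"}


def _find_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for CONFIG_NAME."""
    directory = start.resolve()
    for candidate_dir in (directory, *directory.parents):
        candidate = candidate_dir / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


def _read_flat_config(path: Path) -> Dict[str, str]:
    """Minimal `key: "value"` reader; a real implementation would use a YAML parser."""
    config = {}
    for line in path.read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and not key.strip().startswith("#"):
            config[key.strip()] = value.strip().strip('"')
    return config


def pdb_default(start: Optional[Path] = None) -> Dict[str, str]:
    """Use the nearest project config if one exists, else the global default."""
    config_path = _find_config(start or Path.cwd())
    if config_path is None:
        return dict(GLOBAL_DEFAULT)
    return _read_flat_config(config_path)


def run_stan_sampling(posterior_name: str) -> str:
    """Hypothetical third-party function: it never receives a pdb argument."""
    pdb = pdb_default()
    location = pdb.get("path") or pdb.get("url")
    return f"sampling {posterior_name} from {pdb['type']} database at {location}"
```

Lookup starts from the current working directory, so the same call picks up a different database depending on the project it runs in. Deleting the config file restores the global default.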

eerolinna commented 4 years ago

To clarify on that use case, assume that run_stan_sampling(posterior_name) is a library function from a third party package, call it posteriordb-stan-inference-methods or something. It internally calls pdb_default to construct a pdb object.

I have a project that uses a custom version of the posterior database. I have added some posteriors and I'm planning to make a PR, but I haven't done it yet. Somewhere in the project I have code that calls run_stan_sampling("my_new_posterior_name"). I have a .posteriordb_config.yml file in the root of my project that contains:

type: "local"
path: "path_to_my_custom_pdb"

This makes the pdb_default call inside run_stan_sampling use my custom posterior database.

Then, once I have made the PR and my_new_posterior_name has been added, I want to switch back to the main posteriordb. I just change the config file (or delete it) without editing any of the code.

eerolinna commented 4 years ago

Eventually this functionality would be used by bayesbench. You would have a file that specifies the jobs (which inference methods to run on which posteriors) and separately a posteriordb config file that specifies how to locate the posterior database that contains the posteriors used in the job config file.

I'm not going to implement this for a while, and I don't think there's any hurry to implement it in R for now either.

MansMeg commented 4 years ago

Sure, although I think it would be very nice to have so I would probably implement it in R to test it.

eerolinna commented 4 years ago

Sounds good!

MansMeg commented 3 years ago

I have now implemented this. If you store a file called .pdb_config.yml, it is detected and used by default. See pdb_config_sample.yml for a template (currently in the development branch).