Run the following script after navigating to the src folder.
python main.py --config ../configs/basic_experiment.yaml
Usage:
Create a yaml file to configure the experiment
Use example.yaml as a reference for what is needed.
Keep the path section as is.
experiment is a mandatory key
name - name of the file. Note that main.py will not copy the name of the yaml file to use as the experiment folder name
description - optional experiment description (legacy feature for summarize.py, which is not working at the moment)
parameters is a mandatory key
training population requires
All 3 require keys x_population, y_population
x_population requires:
y_population requires:
true_value - optional, if set to null or excluded, will sample the gold_population 100k times to estimate the true parameter (currently only supports estimate type: mean)
n_its - number of experiment iterations
test_size - optional, test split size, default - 0.2
use_active_inference - I have no idea how that got there or what it does, probably can delete
confidence_level - optional, intended confidence level (not the alpha value!), default - 0.95
cut_interval - optional, if True, will cut off all negative values of confidence interval, default - False
ind_var - independent variables that will be altered throughout experiment
Example usage:
ind_var:
  name:
    - mean
    - std
  vals:
    - mean: 0
      std: 4
    - mean: 2
      std: 4
    - mean: -2
      std: 4
    - mean: 4
      std: 4
    - mean: -4
      std: 4
    - mean: 0
      std: 5
    - mean: 0
      std: 6
  paths:
    mean:
      - experiment.parameters.gold_population.x_population.mean
      - experiment.parameters.unlabelled_population.x_population.mean
      - experiment.parameters.gold_population.y_population.mean
      - experiment.parameters.unlabelled_population.y_population.mean
    std:
      - experiment.parameters.gold_population.x_population.std
      - experiment.parameters.unlabelled_population.x_population.std
      - experiment.parameters.gold_population.y_population.std
      - experiment.parameters.unlabelled_population.y_population.std
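To make the ind_var example more concrete, here is a rough sketch of how each entry in vals could be written into the config at every dotted path listed under paths. This is only an illustration of the idea, not the code in main.py: the helper names set_by_path and expand_ind_var are made up, and the assumption is that each path is a dot-separated chain of dictionary keys.

```python
import copy


def set_by_path(config, dotted_path, value):
    """Set config[a][b][c] = value for a dotted path 'a.b.c' (hypothetical helper)."""
    keys = dotted_path.split(".")
    node = config
    for key in keys[:-1]:
        # Create intermediate dictionaries if they are missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def expand_ind_var(base_config, ind_var):
    """Yield one modified copy of base_config per entry in ind_var['vals'] (hypothetical helper)."""
    for setting in ind_var["vals"]:
        config = copy.deepcopy(base_config)  # never mutate the base config
        for name in ind_var["name"]:
            for path in ind_var["paths"][name]:
                set_by_path(config, path, setting[name])
        yield config


# A shortened version of the ind_var block above, written as a Python dict.
ind_var = {
    "name": ["mean", "std"],
    "vals": [{"mean": 0, "std": 4}, {"mean": 2, "std": 4}],
    "paths": {
        "mean": ["experiment.parameters.gold_population.x_population.mean"],
        "std": ["experiment.parameters.gold_population.x_population.std"],
    },
}
base = {
    "experiment": {
        "parameters": {"gold_population": {"x_population": {"mean": 0, "std": 1}}}
    }
}

configs = list(expand_ind_var(base, ind_var))
second_x = configs[1]["experiment"]["parameters"]["gold_population"]["x_population"]
print(second_x)
```

Each element of configs would then correspond to one setting of the independent variables, with the base config left untouched.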
model - dictionary of settings of model to be trained
model_bias - if True, will calculate model bias
estimate - type of estimate being computed, currently only supports mean
methods - list of methods of constructing confidence intervals that will be tested
metrics - list of metrics to be computed (keep widths, coverages almost always)
distances - optional, distance between distributions metric that will be computed, only used for covariate shift experiments
plot_distributions - optional, if True, will plot the X distributions to be tested
clipping - optional, if True, will remove all unlabelled points that are outside of the training distribution
remove_gold - optional, will also remove gold values outside of training distribution
varying_true_value - optional, if True, will recompute true_value every new independent variable
train_once - optional, if True, trains a model once per independent variable instead of once per experiment iteration
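A few of the parameters above (true_value, confidence_level, cut_interval) are easier to follow with a small example. The sketch below is not the repository's implementation: the function names are hypothetical, it uses a plain normal-approximation interval for the mean rather than any of the configured methods, and it assumes that "cutting" the interval means clipping negative bounds to zero.

```python
import random
from statistics import NormalDist, fmean, stdev


def estimate_true_mean(draw, n_samples=100_000):
    """Approximate the true mean by averaging many draws (hypothetical helper)."""
    return fmean(draw() for _ in range(n_samples))


def mean_confidence_interval(sample, confidence_level=0.95, cut_interval=False):
    """Normal-approximation interval for the mean (illustration only)."""
    # confidence_level is NOT alpha: a level of 0.95 corresponds to alpha = 0.05.
    alpha = 1.0 - confidence_level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    centre = fmean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    lower, upper = centre - half_width, centre + half_width
    if cut_interval:
        # Assumed meaning of cut_interval: clip negative bounds to zero.
        lower, upper = max(lower, 0.0), max(upper, 0.0)
    return lower, upper


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Stand-in for sampling the gold population when true_value is null.
true_value = estimate_true_mean(lambda: rng.gauss(2.0, 4.0))

sample = [rng.gauss(2.0, 4.0) for _ in range(500)]
ci_95 = mean_confidence_interval(sample, confidence_level=0.95)
ci_99 = mean_confidence_interval(sample, confidence_level=0.99)

# With cut_interval=True, the negative lower bound is replaced by 0.
ci_cut = mean_confidence_interval([-1.0, 0.0, 1.0, 2.0], cut_interval=True)
print(true_value, ci_95, ci_99, ci_cut)
```

Note that a higher confidence_level gives a wider interval, which is the opposite direction from increasing alpha.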
There are too many plot types to specify here, so use example.yaml as a reference. main.py first runs the experiment and creates results.csv and a pd_dataframe, which is then sent to the plotting functions. Each plot under plotting[plots] creates a new plot with its own config, which is relatively straightforward. The only important thing to note is that x is the key of the pd_dataframe that you want to use as the plotting x variable. In general you do not have to worry about duplicates, because plotting.py calls col.uniques().
If you have a bunch of data and just want to rerun the plotting, run plot_only.py; there are two options