lyft / clutch

Extensible platform for infrastructure management
https://clutch.sh
Apache License 2.0

experimentation: enforce uniqueness of experiment configurations #486

Open Augustyniak opened 4 years ago

Augustyniak commented 4 years ago

Description

The experimentation package currently uses two tables internally: experiment_run and experiment_config.

CREATE TABLE experiment_config (
    id BIGINT,
    details JSONB  -- opaque, package-defined configuration
);

CREATE TABLE experiment_run (
    id BIGINT,
    experiment_config_id BIGINT,  -- refers to experiment_config.id
    execution_time TSTZRANGE,
    cancelation_time TIMESTAMP WITH TIME ZONE,
    creation_time TIMESTAMP WITH TIME ZONE,
    termination_reason VARCHAR(32)
);

There is a many-to-one relationship between runs and configs: many runs can reference the same config.

Because an experiment config is almost a black box from the perspective of the experimentation framework, the framework currently does NOT enforce uniqueness of experiment configurations.

There are two ways to fix this so that people are forced to reuse existing configurations instead of creating exact copies of them:

  1. Add a name column to the experiment_config table, plus a database index that enforces uniqueness of the configuration name.
  2. Enforce uniqueness of the details column of the experiment_config table with a database index.

Approach 2) is simpler to implement, BUT it may not be enough for us. Suppose a package that relies on the experimentation package wants to store configurations whose uniqueness depends on the values of only some of the configuration's fields, rather than on the entire value of the details column. In that case approach 2) is not what we want. Approach 1) would let the package that stores data in the experiment_config table define which value is used to compare configurations.
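A minimal sketch of approach 1), under loud assumptions: it uses Python's built-in sqlite3 as a stand-in for the real PostgreSQL database, stores details as JSON text instead of JSONB, and the index name and the `open_db` / `get_or_create_config` helpers are hypothetical, not part of Clutch. It only illustrates how a unique index on a name column lets callers reuse an existing configuration instead of inserting a copy.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the experiment_config table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment_config ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"      # hypothetical column from approach 1)
        " details TEXT NOT NULL)"   # JSON text here; JSONB in the real schema
    )
    # The unique index is what actually enforces uniqueness of the name.
    conn.execute(
        "CREATE UNIQUE INDEX experiment_config_name_idx"
        " ON experiment_config (name)"
    )
    return conn


def get_or_create_config(conn: sqlite3.Connection, name: str, details: dict) -> int:
    """Return the id of the config called `name`, inserting it if absent.

    If the name already exists, the stored row is reused and the new
    `details` value is ignored.
    """
    try:
        cur = conn.execute(
            "INSERT INTO experiment_config (name, details) VALUES (?, ?)",
            (name, json.dumps(details, sort_keys=True)),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The unique index rejected the insert: look up the existing row.
        row = conn.execute(
            "SELECT id FROM experiment_config WHERE name = ?", (name,)
        ).fetchone()
        return row[0]
```

The calling package decides what goes into name, so it can derive it from whichever fields of its configuration should define uniqueness.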

Complexity: S

Augustyniak commented 4 years ago

At Lyft, an example experiment configuration looks something like this:

    variable_name varchar(100)
    experiment_name varchar(100)

This whole configuration is stored in the details column of the experiment_config table.

Due to some internal assumptions of Lyft systems, we don't want to allow two experiment configurations with the same variable_name value. Referring back to the original message in this issue, uniqueness of the details column is not what we want, since it would still allow a Clutch user to add the following two entries to the database:

    variable_name = "1"
    experiment_name = "2"

and

    variable_name = "1"
    experiment_name = "3"

Both entries have the same variable_name value, which, as mentioned above, is something we want to be able to prevent.
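The difference can be reproduced with a small sketch. It assumes SQLite (via Python's sqlite3) in place of PostgreSQL, JSON text in place of JSONB, and a hypothetical name column that is filled with the variable_name value. The `try_insert_both` helper and the index name are made up for illustration.

```python
import json
import sqlite3

# The two example entries from this comment.
FIRST = {"variable_name": "1", "experiment_name": "2"}
SECOND = {"variable_name": "1", "experiment_name": "3"}


def try_insert_both(unique_on: str) -> int:
    """Insert both entries with a unique index on `unique_on`.

    Returns how many of the two rows the database accepted.
    """
    if unique_on not in ("name", "details"):
        raise ValueError("unique_on must be 'name' or 'details'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment_config "
        "(id INTEGER PRIMARY KEY, name TEXT, details TEXT)"
    )
    conn.execute(
        f"CREATE UNIQUE INDEX cfg_unique_idx ON experiment_config ({unique_on})"
    )
    stored = 0
    for cfg in (FIRST, SECOND):
        try:
            conn.execute(
                "INSERT INTO experiment_config (name, details) VALUES (?, ?)",
                # Assumption: the package uses variable_name as the name.
                (cfg["variable_name"], json.dumps(cfg, sort_keys=True)),
            )
            stored += 1
        except sqlite3.IntegrityError:
            pass  # rejected by the unique index
    return stored


print(try_insert_both("details"))  # 2: the details values differ, so both are accepted
print(try_insert_both("name"))     # 1: the second entry reuses variable_name "1"
```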