protocol / prodeng

Issues, discussions and documentation from the production engineering team

Thunderdome: Self Service Experiments #19

Open iand opened 1 year ago

iand commented 1 year ago

What Is it?

Enable simple creation and execution of experiments

Deliverables

Why Are We Doing It?

We want to allow anyone to define a Thunderdome experiment and run it themselves. They shouldn't need to be experts in AWS and Terraform. We want them to be able to define the parameters of an experiment, execute it and then get access to the results in Grafana automatically.

Notes

The work to integrate Kubo release-candidate experiments needs only a limited API for creating one-shot experiments. This project is for a fuller API that manages the whole lifecycle of an experiment.

It should be possible to say "test these N Docker images in an experiment called 'foo'" and start seeing results in Grafana in under 5 minutes, with a single CLI call or an edit to a single file. Limit this to AWS at first.

At first we accept a constrained selection of resources (probably Fargate, so 4 vCPU and 30 GB RAM), but we should eventually support a wide range of machine sizes.

Project overview is on Notion

Tasks

Now being tracked as part of probe lab: https://www.notion.so/pl-strflt/Thunderdome-Self-Service-Experiments-85dd1389e7bb4bf6a36d638b45d29d20

JesseXie commented 1 year ago

@iand https://github.com/ipfs-shipyard/thunderdome/issues/26 is listed in both https://github.com/protocol/prodeng/issues/19 and https://github.com/protocol/prodeng/issues/18. Is that on purpose?

iand commented 1 year ago

> @iand the ipfs-shipyard/thunderdome#26 in #19 and #18, it is on purpose?

Nope. Still reorganising epics as we go.