Discussion: make simulation tests configurable, e.g. via `yaml` file

lmatz commented 1 year ago

We have two deterministic simulation tests, the recovery test and the scale test, in which we inject special behaviors:

- node-killing
- reschedule

Currently, node-killing can be configured via the command line (though the settings in use do not vary much), while reschedule seems to be hard-coded.

We can mimic how Chaos Mesh specifies all kinds of OS-level faults in order to specify these RisingWave-level behaviors, e.g. via a YAML file, as in https://chaos-mesh.org/docs/simulate-pod-chaos-on-kubernetes/#pod-failure-example.

I imagine we can still specify particular sequences of behaviors explicitly, or have a predefined chaos generator that randomly generates instructions.
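To make the two options concrete, here is a minimal sketch of what such a plan could look like once a config file has been parsed. Every name in it (`Fault`, `Step`, `Plan`, `expand`) is hypothetical and does not exist in RisingWave; the delay range and fragment-id range are placeholder values. The random variant uses a small seeded SplitMix64 generator so that the same seed always expands to the same instructions, which keeps the simulation deterministic.

```rust
// Hypothetical sketch: none of these types exist in RisingWave today.
use std::time::Duration;

/// A special behavior the simulation could inject.
#[derive(Debug, Clone, PartialEq)]
enum Fault {
    KillNode { node: String },
    Reschedule { fragment_id: u32 },
}

/// One instruction: wait `after` since the previous step, then inject `fault`.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    after: Duration,
    fault: Fault,
}

/// Either an explicit sequence, or a seeded generator of random steps.
enum Plan {
    Sequence(Vec<Step>),
    Random { seed: u64, steps: usize, nodes: Vec<String> },
}

/// SplitMix64: a tiny deterministic PRNG, used here to avoid external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

impl Plan {
    /// Turn the plan into a concrete list of steps.
    /// The same seed always yields the same steps.
    fn expand(&self) -> Vec<Step> {
        match self {
            Plan::Sequence(steps) => steps.clone(),
            Plan::Random { seed, steps, nodes } => {
                let mut rng = SplitMix64(*seed);
                (0..*steps)
                    .map(|_| {
                        // Placeholder range: 1 to 10 seconds between faults.
                        let after = Duration::from_secs(1 + rng.next_u64() % 10);
                        let fault = if !nodes.is_empty() && rng.next_u64() % 2 == 0 {
                            let i = (rng.next_u64() % nodes.len() as u64) as usize;
                            Fault::KillNode { node: nodes[i].clone() }
                        } else {
                            // Placeholder range: fragment ids 0 to 7.
                            Fault::Reschedule { fragment_id: (rng.next_u64() % 8) as u32 }
                        };
                        Step { after, fault }
                    })
                    .collect()
            }
        }
    }
}
```

With this shape, `Plan::Random { seed: 42, steps: 5, nodes }.expand()` returns the same five steps on every run, and `Plan::Sequence(steps).expand()` returns the listed steps unchanged, so a failing run could be reproduced from the config and seed alone.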

The benefits:

- avoid hard-coded behaviors; more organized
- enable different configurations more easily, e.g. run different settings during the per-PR test, the main-cron daily test, and the pre-release testing
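As an illustration of the second benefit, the sketch below maps a pipeline name to a chaos profile. The `Pipeline` and `ChaosProfile` types, the `profile_for` function, the pipeline name strings, and all numbers are assumptions made up for this example; in practice the profiles would presumably live in separate config files rather than in code.

```rust
// Hypothetical sketch: names and numbers are illustrative placeholders only.
use std::str::FromStr;

/// The three test stages mentioned in the discussion.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pipeline {
    PerPr,
    MainCron,
    PreRelease,
}

impl FromStr for Pipeline {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "per-pr" => Ok(Pipeline::PerPr),
            "main-cron" => Ok(Pipeline::MainCron),
            "pre-release" => Ok(Pipeline::PreRelease),
            other => Err(format!("unknown pipeline: {other}")),
        }
    }
}

/// Which behaviors to inject, and how many random steps to generate.
#[derive(Debug, Clone, PartialEq)]
struct ChaosProfile {
    kill_nodes: bool,
    reschedule: bool,
    random_steps: usize,
}

/// Cheaper settings for frequent runs, heavier ones for rare runs.
fn profile_for(pipeline: Pipeline) -> ChaosProfile {
    match pipeline {
        Pipeline::PerPr => ChaosProfile { kill_nodes: true, reschedule: false, random_steps: 5 },
        Pipeline::MainCron => ChaosProfile { kill_nodes: true, reschedule: true, random_steps: 50 },
        Pipeline::PreRelease => ChaosProfile { kill_nodes: true, reschedule: true, random_steps: 500 },
    }
}
```

A CI job could then call something like `profile_for("main-cron".parse()?)` and feed the result to the test runner, so changing what a stage exercises becomes a config change rather than a code change.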

We could probably mimic node-killing in Chaos Mesh with a process crash (but not reschedule, because that requires instructions from RW). However, Chaos Mesh runs things in the real world, while simulation tests run in the simulated world.

So Chaos Mesh alone is not enough.

This occurred to me while thinking about #6369.

lmatz commented 1 year ago

If enough people agree with this approach, I will investigate what syntax we can learn from chaos mesh and draft a specification.

Not necessarily YAML, though.

liurenjie1024 commented 1 year ago

LGTM

wangrunji0408 commented 1 year ago

LGTM. I'm trying to merge the recovery test and scale test into one crate and then provide a unified configuration.

lmatz commented 1 year ago

Related: #6485

fuyufjh commented 1 year ago

Hey, any updates?

wangrunji0408 commented 1 year ago

No update yet 🥵

risingwavelabs/risingwave #6375