risingwavelabs / risingwave


Discussion: make simulation tests configurable, e.g. via `yaml` file #6375

Open lmatz opened 1 year ago

lmatz commented 1 year ago

We have two deterministic simulation tests, the recovery test and the scale test, into which we inject special behaviors:

  1. node-killing
  2. reschedule

Currently:

  1. Node-killing can be configured via the command line (although the settings in use do not vary much).
  2. Reschedule seems to be hard-coded.

We could follow the way Chaos Mesh specifies its various (OS-level) faults to specify these special behaviors (which come from RisingWave itself), e.g. via a YAML file, as in https://chaos-mesh.org/docs/simulate-pod-chaos-on-kubernetes/#pod-failure-example.

I imagine we can still specify some particular sequences of certain behaviors, or have some pre-determined chaos generator to randomly generate instructions.
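To make the two options above concrete, here is a minimal sketch in Python (chosen only for brevity; the simulation tests themselves are not written in it). Everything in it is an assumption made for illustration: the `Step` record, the action names `kill_node` and `reschedule`, the node labels, and the `max_gap_secs` parameter are all hypothetical and not part of any existing RisingWave or Chaos Mesh format. It shows a hand-written sequence next to a seeded generator that always produces the same plan for the same seed, which is what a deterministic simulation would need.

```python
import random
from dataclasses import dataclass

# Hypothetical action names mirroring the two behaviors discussed in this issue.
ACTIONS = ("kill_node", "reschedule")


@dataclass(frozen=True)
class Step:
    at_secs: int  # simulated time at which the action fires
    action: str   # one of ACTIONS
    target: str   # hypothetical node label, e.g. "compute-1"


def explicit_plan():
    """A hand-written sequence, as it might be parsed from a config file."""
    return [
        Step(at_secs=10, action="kill_node", target="compute-1"),
        Step(at_secs=25, action="reschedule", target="compute-2"),
    ]


def generated_plan(seed, steps, nodes, max_gap_secs=30):
    """Randomly generate a plan; the same seed always yields the same plan."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    now = 0
    plan = []
    for _ in range(steps):
        now += rng.randint(1, max_gap_secs)  # gaps are >= 1, so times strictly increase
        plan.append(Step(now, rng.choice(ACTIONS), rng.choice(nodes)))
    return plan
```

Calling `generated_plan(42, 5, ["compute-1", "compute-2"])` twice returns two equal lists, so a failing run could be reproduced from its seed alone.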

The benefits:

  1. Avoids hard-coded behaviors and keeps them better organized
  2. Makes it easier to run different configurations, e.g. different settings for the per-PR test, the main-cron daily test, and the pre-release testing
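The second benefit could look roughly like the sketch below, again in Python purely for illustration. The profile names follow the three pipelines mentioned above; the setting keys (`kill_rate`, `reschedules`, `duration_secs`) and every number are hypothetical placeholders, not values taken from the existing tests.

```python
# Hypothetical per-pipeline settings; every key and number here is a placeholder.
PROFILES = {
    "per-pr": {"kill_rate": 0.0, "reschedules": 1, "duration_secs": 60},
    "main-cron": {"kill_rate": 0.5, "reschedules": 5, "duration_secs": 600},
    "pre-release": {"kill_rate": 1.0, "reschedules": 20, "duration_secs": 3600},
}

DEFAULT_PROFILE = "per-pr"


def load_profile(name=None, overrides=None):
    """Return a copy of the named profile with optional overrides applied."""
    name = name or DEFAULT_PROFILE
    if name not in PROFILES:
        raise KeyError(f"unknown profile {name!r}; expected one of {sorted(PROFILES)}")
    settings = dict(PROFILES[name])  # copy, so PROFILES itself is never mutated
    settings.update(overrides or {})
    return settings
```

A CI job would then only pass a profile name, for example `load_profile("main-cron")`, while a developer could still override a single value locally with `load_profile("per-pr", {"reschedules": 3})`.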

Although we could probably mimic node-killing in Chaos Mesh with a process crash (but not reschedule, because that requires instructions from RW), Chaos Mesh runs things in the real world, while simulation tests run in a simulated world.

So Chaos Mesh alone is not enough.

This occurred to me while thinking about #6369.

lmatz commented 1 year ago

If enough people agree with this approach, I will investigate what syntax we can borrow from Chaos Mesh and draft a specification.

It does not necessarily have to be YAML, though.

liurenjie1024 commented 1 year ago

LGTM

wangrunji0408 commented 1 year ago

LGTM. I'm trying to merge the recovery test and scale test into one crate and then provide a unified configuration.

lmatz commented 1 year ago

Related: #6485

fuyufjh commented 1 year ago

Hey, any updates?

wangrunji0408 commented 1 year ago

No update yet 🥵