solo-io / gloo

The Feature-rich, Kubernetes-native, Next-Generation API Gateway Built on Envoy
https://docs.solo.io/
Apache License 2.0

Chaos Testing #9638

Open davidjumani opened 2 months ago

davidjumani commented 2 months ago

Introduce chaos testing as a way to test the stability and resiliency of Edge.

For example, Edge has an external dependency on the Kubernetes control plane. In larger environments, we need to handle the "natural" chaos in the Kubernetes control plane (e.g. apiserver unavailability or load). Similarly, we may see periodic node pressure, Pod churn, etc. that cause the Pods of our internal components (e.g. redis) to be recreated frequently. In both scenarios, we need to ensure our system can handle disruptions without compromising data plane integrity or risking costly outages for our customers.

This can follow a pattern similar to the one used by platform.

davidjumani commented 2 months ago

Testing for kube apiserver unavailability was added in https://github.com/solo-io/gloo/pull/9563