kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0
3.65k stars 334 forks source link

Introduce `HealthyPanicThreshold` for MeshCircuitBreaker policy #11704

Open Icarus9913 opened 3 weeks ago

Icarus9913 commented 3 weeks ago

Description

I noticed that there's an implementation difference between our MeshCircuitBreaker and Istio's (https://github.com/istio/istio/blob/7b2d8b9dacb7989d75b3b3d860988cadb9f5a848/pilot/pkg/networking/core/cluster_traffic_policy.go#L416-L423), maybe we could add one flag to disable panic threshold and it depends on the user's choice.

lahabana commented 3 weeks ago

Would that be something like:

https://kuma.io/docs/dev/policies/meshcircuitbreaker/

      outlierDetection:
        panicDisabled: true
        interval: 5s
        baseEjectionTime: 30s
        maxEjectionPercent: 20
        splitExternalAndLocalErrors: true

?

Also from looking at the docs I see no mention of panic, we should talk about it (https://github.com/kumahq/kuma-website/issues/1953).

Is there any possible workaround to set configuration in a way that will avoid entering panic?

Icarus9913 commented 2 weeks ago

Is there any possible workaround to set configuration in a way that will avoid entering panic?

During load balancing, Envoy will generally only consider available (healthy or degraded) hosts in an upstream cluster. However, if the percentage of available hosts in the cluster becomes too low, Envoy will disregard health status and balance either amongst all hosts or no hosts.

If you want to avoid entering panic mode, which means the upstream clusters keep in the healthy state and do not turn into unhealth. I guess maybe we just need to disable the Outlier detection ? And the upstream cluster would keep in health even they returned lots of failures.

Icarus9913 commented 1 week ago

For now, I can just only use the MeshProxyPatch to disable the enovy-panic-threshold

apiVersion: kuma.io/v1alpha1
kind: MeshProxyPatch
metadata:
  name: circuit-breaker
  namespace: kuma-system
spec:
  default:
    appendModifications:
    - cluster:
        jsonPatches:
        - op: add
          path: /common_lb_config
          value:
            healthy_panic_threshold: {}
        match:
          name: httpbin-1_kuma-demo_svc_8000
        operation: Patch
  targetRef:
    kind: Mesh
jakubdyszkiewicz commented 2 days ago

Do we add envoy panic threshold by default even when you don't use MeshCircuitBreaker? @Icarus9913

lobkovilya commented 2 days ago

I think we have it in MeshHealthCheck policy https://github.com/kumahq/kuma/blob/4d9ec2b5f795006a3e271fbf9a5113ebf0706f46/pkg/plugins/policies/meshhealthcheck/api/v1alpha1/meshhealthcheck.go#L61

Icarus9913 commented 2 days ago

@jakubdyszkiewicz I believe it's enabled by default in LoadBalancing of Envoy. And I was wondering if we could implement something like what Ilya put above.

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/panic_threshold

Icarus9913 commented 18 hours ago

Maybe it's more proper to add a property HealthyPanicThresholdPercent *intstr.IntOrString like we did for the MeshHealthCheck policy rather than just use a bool type to enable/disable it, which lifts the restriction for the user to edit this property and write some explanation for the panic threshold. (If the user doesn't set it, the envoy would treat it with the default value 50%)


We should also check the Istio's CircuitBreaker policy properties to check the differences and its source codes implementations.