kumahq / kuma-website

🐻 The official website for Kuma, the control plane for modern service connectivity.
https://kuma.io
Apache License 2.0

Document flagger with GatewayAPI #1946

Open: slonka opened this issue 4 months ago

slonka commented 4 months ago

Look here for explanation

What happened?

I think there are three problems with the Flagger Kuma progressive delivery tutorial:

  1. The command at https://github.com/fluxcd/flagger/blob/133fdecf56b2983f69cf06cbdeb372f988d89343/docs/gitbook/tutorials/kuma-progressive-delivery.md?plain=1#L34 does not work. It should be kubectl label namespace test kuma.io/sidecar-injection=enabled; otherwise the pods do not get sidecars.
  2. The default TrafficPermission is missing:
apiVersion: kuma.io/v1alpha1
kind: TrafficPermission
mesh: default
metadata:
  name: allow-all-traffic
spec:
  sources:
    - match:
        kuma.io/service: '*'
  destinations:
    - match:
        kuma.io/service: '*'

That is why the load tester cannot reach the podinfo service.

  3. The Flagger integration should use MeshHTTPRoute.

FYI @aryan9600

aryan9600 commented 3 months ago

Problems 1 and 2 were addressed and released in https://github.com/fluxcd/flagger/releases/tag/v1.38.0

Do you think we should migrate the integration to MeshHTTPRoute, or just instruct users to use Gateway API instead?

slonka commented 3 months ago

I think we agreed on the triage that it should use MeshHTTPRoute. Let me put it back to needs-information and we'll get back to you on this on Monday.

slonka commented 3 months ago

Triage: let's deprecate the existing integration and use Gateway API instead.

slonka commented 3 months ago

Triage: let's figure out what works and what doesn't with Gateway API.

slonka commented 1 month ago

Everything seems to work correctly; it probably just needs to be documented.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Gateway API HTTPRoute host names
    hosts:
      - www.example.com
    # Gateway that the generated HTTPRoute attaches to (the "kuma" Gateway defined below)
    gatewayRefs:
      - name: kuma
        namespace: default
  analysis:
    # schedule interval (default 60s)
    interval: 15s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: error-rate
      # max error rate (5xx responses)
      # percentage (0-100)
      templateRef:
        name: error-rate
        namespace: flagger-system
      thresholdRange:
        max: 1
      interval: 10s
    - name: latency
      templateRef:
        name: latency
        namespace: flagger-system
      # milliseconds (Envoy reports upstream_rq_time in ms)
      thresholdRange:
        max: 1000
      interval: 10s
    # testing (optional)
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 2m -q 10 -c 2 -host www.example.com http://kuma.default/"
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kuma
  namespace: default
spec:
  gatewayClassName: kuma
  listeners:
  - allowedRoutes:
      namespaces:
        from: All
    name: proxy
    hostname: "*.example.com"
    port: 80
    protocol: HTTP
---
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus-server.mesh-observability:80
  query: |
    histogram_quantile(0.50,
      sum(
        rate(
          envoy_cluster_upstream_rq_time_bucket{
            app=~"{{ target }}",
            k8s_kuma_io_namespace=~"{{ namespace }}"
          }[{{ interval }}]
        )
      ) by (le)
    )
---
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus-server.mesh-observability:80
  query: |
    100 - sum(
        rate(
            envoy_cluster_upstream_rq{
              k8s_kuma_io_namespace="{{ namespace }}",
              app=~"{{ target }}",
              envoy_response_code!~"5.*"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            envoy_cluster_upstream_rq{
              k8s_kuma_io_namespace="{{ namespace }}",
              app=~"{{ target }}"
            }[{{ interval }}]
        )
    )
    * 100
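Before turning this into a guide, it may help to check what the two MetricTemplate queries are expected to return, since the Canary compares them against thresholdRange limits of 1 (percent of 5xx responses) and 1000 (Envoy's upstream_rq_time histogram is in milliseconds). Below is a minimal Python sketch on hypothetical numbers. error_rate_percent turns non-5xx and total request rates into an error percentage; note that non-5xx divided by total on its own is a success fraction between 0 and 1, not an error percentage. histogram_quantile is a simplified re-implementation of Prometheus' linear interpolation inside cumulative le buckets, not Prometheus code. The function names, bucket bounds, and counts are illustrative assumptions.

```python
import math


def error_rate_percent(non_5xx_rate: float, total_rate: float) -> float:
    """5xx responses as a percentage of all requests.

    Same shape as the PromQL expression 100 - sum(non_5xx) / sum(total) * 100.
    """
    if total_rate == 0:
        return math.nan  # no traffic; PromQL's 0/0 is NaN as well
    return 100 - non_5xx_rate / total_rate * 100


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Simplified version of Prometheus' histogram_quantile.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper bound,
    ending with +Inf, like the le series of a *_bucket metric. Assumes positive
    bucket bounds, monotonic cumulative counts, and 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in milliseconds: (le, cumulative count).
LATENCY_BUCKETS = [(10, 20), (25, 60), (50, 90), (100, 98), (math.inf, 100)]

if __name__ == "__main__":
    print(error_rate_percent(199.0, 200.0))          # 0.5, passes max: 1
    print(histogram_quantile(0.5, LATENCY_BUCKETS))  # 21.25, passes max: 1000
    print(histogram_quantile(0.99, LATENCY_BUCKETS))  # 100 (falls in the +Inf bucket)
```

The +Inf case is worth keeping in mind when picking the quantile: if most slow requests exceed the highest finite bucket, the reported latency is capped at that bucket's bound rather than growing.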
jakubdyszkiewicz commented 3 weeks ago

Triage: document as a guide
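For that guide, the timing implied by the Canary analysis settings in the manifest above can be worked out up front. Flagger's documentation gives the minimum time to validate and promote a canary as interval * (maxWeight / stepWeight), and the time to roll back when checks keep failing as interval * threshold. The Python sketch below only does that arithmetic for interval 15s, threshold 5, maxWeight 50, stepWeight 10. The Analysis class and helper names are hypothetical; it assumes maxWeight is a multiple of stepWeight and ignores webhook and metric query latency, so the numbers are lower bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical model of the Canary analysis fields used here."""
    interval_s: int   # analysis.interval, in seconds
    threshold: int    # failed checks before rollback
    max_weight: int   # analysis.maxWeight, percent
    step_weight: int  # analysis.stepWeight, percent

    def __post_init__(self) -> None:
        # Simplifying assumption: maxWeight is an exact multiple of stepWeight.
        if self.step_weight <= 0 or self.max_weight % self.step_weight != 0:
            raise ValueError("expected maxWeight to be a multiple of stepWeight")


def weight_steps(a: Analysis) -> list[int]:
    """Canary weights stepped through, stepWeight at a time, up to maxWeight."""
    return list(range(a.step_weight, a.max_weight + 1, a.step_weight))


def min_promotion_seconds(a: Analysis) -> int:
    """interval * (maxWeight / stepWeight)."""
    return a.interval_s * (a.max_weight // a.step_weight)


def rollback_seconds(a: Analysis) -> int:
    """interval * threshold."""
    return a.interval_s * a.threshold


# Values from the podinfo Canary above (interval: 15s).
podinfo = Analysis(interval_s=15, threshold=5, max_weight=50, step_weight=10)

if __name__ == "__main__":
    print(weight_steps(podinfo))           # [10, 20, 30, 40, 50]
    print(min_promotion_seconds(podinfo))  # 75
    print(rollback_seconds(podinfo))       # 75
```

With these values both promotion and rollback take at least 75 seconds; raising interval or lowering stepWeight lengthens the rollout proportionally.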