openziti / helm-charts

various helm charts for openziti-test-kitchen projects
https://openziti.io/helm-charts/
Apache License 2.0

add health checks to controller and router charts #107

Open qrkourier opened 1 year ago

qrkourier commented 1 year ago

Kubernetes runs each type of probe that the pod spec defines. A successful probe means:

  1. startup: the app has finished starting up, so the kubelet can begin running the readiness and liveness probes
  2. readiness: the app is ready to receive incoming requests, so the pod is added to the Service endpoints and the LB begins forwarding to it
  3. liveness: the app is still healthy, so the kubelet should not restart the container

Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
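
As a rough illustration of how the three probe types fit together in a container spec, here is a minimal sketch. The container name, image, port, and /healthz path are hypothetical placeholders and are not taken from either chart; only the probe field names are standard Kubernetes.

```yaml
# Hypothetical container spec fragment; name, image, port, and path are placeholders.
containers:
  - name: app
    image: example/app:latest
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 30   # allows up to 5s * 30 = 150s for startup
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10      # failing removes the pod from Service endpoints
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3    # three consecutive failures restart the container
```

The readiness and liveness probes are not run until the startup probe has succeeded, which is why a generous startup failureThreshold does not delay restarts of an app that hangs later.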

qrkourier commented 8 months ago

The router config supports a health check section (link to reference) that we currently use to verify the health of a router. For example, ctrlPingCheck verifies that the router is connected to the controller.

ctrlPingCheck doesn't specifically show that the router is "ready to receive traffic", because that could mean many things: the router could have some healthy links while others are degraded or down. It is only a general check to ensure the router receives updates from the controller.

The most recent addition to the health checks is linkCheck, which verifies that the router has a link, or even a specific link, to another router, indicating it's ready to transport traffic.

To add the health checks to the edge router (ER), add something like this to the ER config:

healthChecks:
  ctrlPingCheck:
    interval: 30s
    timeout: 15s
    initialDelay: 15s
  linkCheck:
    minLinks: 1
    interval: 5s
    initialDelay: 5s

Then add a web section to the ER config like this:

web:
  - name: health-check
    bindPoints:
      - interface: 0.0.0.0:8081
        address: 0.0.0.0:8081
    apis:
      - binding: health-checks

With both sections in place, you can GET https://localhost:8081/health-checks
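
For the router chart, the probes could point at that endpoint. The following is only a sketch, not the chart's actual template: it assumes the web listener on port 8081 from the config above, and it assumes the endpoint answers with a non-2xx/3xx status when a check is unhealthy, which should be verified before relying on it. If the status code is always 200, an exec probe that inspects the response body would be needed instead.

```yaml
# Sketch for the router container; timing values are illustrative assumptions.
readinessProbe:
  httpGet:
    path: /health-checks
    port: 8081
    scheme: HTTPS          # the kubelet does not verify the certificate for HTTPS probes
  initialDelaySeconds: 15
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health-checks
    port: 8081
    scheme: HTTPS
  initialDelaySeconds: 15
  periodSeconds: 30
  failureThreshold: 3
```

One design question this leaves open: the endpoint aggregates every configured check, so with linkCheck enabled a router with no links would also fail liveness and be restarted. It may be preferable to use the aggregate result for readiness only.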

The response looks something like this:

{
    "data": {
        "checks": [
            {
                "details": null,
                "healthy": true,
                "id": "controllerPing",
                "lastCheckDuration": "4.344µs",
                "lastCheckTime": "2024-02-13T18:40:01Z"
            },
            {
                "details": [
                    {
                        "linkId": "lndLOpwd7yOcSXtCcPwWf",
                        "destRouterId": "j.LOxzd9A",
                        "latency": 3271785.96875,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:34.199.168.165:61031"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:34.199.168.165:65235"
                            }
                        }
                    },
                    {
                        "linkId": "3W72EY2a0inbyCHIdYk6Gd",
                        "destRouterId": "f9fs.nvej",
                        "latency": 84934025.1015625,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:35.181.192.76:42764"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:35.181.192.76:42750"
                            }
                        }
                    },
                    {
                        "linkId": "1yp3sDwqj6CHui4Zmt89wB",
                        "destRouterId": "PKud5nLtj",
                        "latency": 188746151.2265625,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:58616",
                                "remoteAddr": "tcp:52.66.46.9:443"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:52392",
                                "remoteAddr": "tcp:52.66.46.9:443"
                            }
                        }
                    },
                    {
                        "linkId": "5wjStaisGHkT5Xu0fQrfEq",
                        "destRouterId": "bnq85xLt3",
                        "latency": 201215039.2109375,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:18.61.94.28:48878"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:18.61.94.28:48864"
                            }
                        }
                    },
                    {
                        "linkId": "4e0RACw8dsguV43BrAsQfc",
                        "destRouterId": "u6Q1QSPulm",
                        "latency": 3872621.265625,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:132.145.157.243:48868"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:132.145.157.243:48864"
                            }
                        }
                    },
                    {
                        "linkId": "6AmAWcfB50a2OzFvlsr1vn",
                        "destRouterId": "7fTQPzdt7d",
                        "latency": 3430749.8671875,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:3.217.193.94:50371"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:3.217.193.94:25962"
                            }
                        }
                    },
                    {
                        "linkId": "1CnhJJ73e1AiKjoVoRy3tt",
                        "destRouterId": "oWeCqGOcJ",
                        "latency": 2906363.9140625,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:52.54.127.95:56812"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:52.54.127.95:56796"
                            }
                        }
                    },
                    {
                        "linkId": "3vfjSpoNfoYbyqIBbT7ZKx",
                        "destRouterId": "R7nKHgLtj",
                        "latency": 66181343.484375,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:45646",
                                "remoteAddr": "tcp:44.225.183.166:443"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:45640",
                                "remoteAddr": "tcp:44.225.183.166:443"
                            }
                        }
                    },
                    {
                        "linkId": "2MaMaVsqFNtvhRejCpyR7y",
                        "destRouterId": "s3FjWqdlS",
                        "latency": 70628346.7578125,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:54.77.98.202:57722"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:443",
                                "remoteAddr": "tcp:54.77.98.202:57720"
                            }
                        }
                    },
                    {
                        "linkId": "2EVj6GaGBr1KEFXSeypc3i",
                        "destRouterId": "joI2Wqdlb",
                        "latency": 187742628.4765625,
                        "addresses": {
                            "ack": {
                                "localAddr": "tcp:10.19.116.60:38150",
                                "remoteAddr": "tcp:15.207.241.220:443"
                            },
                            "payload": {
                                "localAddr": "tcp:10.19.116.60:38136",
                                "remoteAddr": "tcp:15.207.241.220:443"
                            }
                        }
                    }
                ],
                "healthy": true,
                "id": "link.health",
                "lastCheckDuration": "120.997µs",
                "lastCheckTime": "2024-02-13T18:40:06Z"
            }
        ],
        "healthy": true
    },
    "meta": {}
}

qrkourier commented 8 months ago

Link to controller health-checks reference: https://openziti.io/docs/reference/configuration/controller#healthchecks
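
A controller equivalent might look like the fragment below. This is a sketch under assumptions: boltCheck is the check name as I understand it from the controller reference linked above, the timing values are illustrative rather than chart defaults, and the web listener simply mirrors the router example, so the port and listener name are placeholders. Confirm the exact keys against the reference before using them.

```yaml
# Illustrative controller config fragment; verify keys against the controller reference.
healthChecks:
  boltCheck:
    interval: 30s
    timeout: 20s
    initialDelay: 30s

web:
  - name: health-check
    bindPoints:
      - interface: 0.0.0.0:8081
        address: 0.0.0.0:8081
    apis:
      - binding: health-checks
```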