kubernetes / ingress-gce

Ingress controller for Google Cloud
Apache License 2.0

Ingress Healthcheck Configuration #42

Closed bowei closed 2 years ago

bowei commented 6 years ago

From @freehan on May 15, 2017 21:25

On GCE, the ingress controller sets up a default healthcheck for backends. The healthcheck points to the nodeport of the backend services on every node. Currently, there is no way to describe detailed healthcheck configuration in an Ingress. On the other hand, each application may want to handle healthchecks differently. To work around this limitation, on Ingress creation the ingress controller scans all backend pods, picks the first ReadinessProbe it encounters, and configures the healthcheck accordingly. However, the healthcheck is not updated if the ReadinessProbe is updated later. (Refer: https://github.com/kubernetes/ingress/issues/582)
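
To make the behavior above concrete, here is a minimal sketch of a pod template whose ReadinessProbe the controller would be expected to pick up at Ingress creation time. All names, the image, the port, and the path are hypothetical placeholders, not values taken from this issue.

```yaml
# Hypothetical Deployment; names, image, port, and path are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: web
          image: example.com/example-app:1.0
          ports:
            - containerPort: 8080
          # Per the description above, the first ReadinessProbe found when the
          # Ingress is created is used to configure the healthcheck; later
          # edits to this probe are not propagated.
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
```

With a template like this, the resulting healthcheck would presumably probe /healthz instead of the default path, and it would keep doing so even if the probe were later changed.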

I see 3 options going forward with healthchecks:

1) Expand the Ingress or Service spec to include more healthcheck configuration. It should cover the capabilities provided by the major cloud providers (GCP, AWS, ...).

2) Keep using the readiness probe for healthcheck configuration.
   a) Keep today's behavior and communicate the expectations clearly. However, this still breaks the abstraction and declarative nature of k8s.
   b) Let the ingress controller watch the backend pods for any ReadinessProbe updates. This seems expensive and complicated.

3) Only set up a default healthcheck for ingresses. The ingress controller periodically ensures that the healthcheck exists, but does not care about its detailed configuration. Users can configure it directly through the cloud provider.

I am in favor of option 3). There are always more bells and whistles on different cloud providers. The higher the layer we go to, the more features we can utilize. For L7 LB, there is no clean, simple way to describe every intention. The same is true for health checks. To ensure a smooth experience, k8s still sets up the basics. For advanced use cases, users will have to configure it through the cloud provider.

Thoughts? @kubernetes/sig-network-misc

Copied from original issue: kubernetes/ingress-nginx#720

naseemkullah commented 4 years ago

Hi everyone, please take a look at #1029 which implements the healthchecking overrides on the backendconfig for a service.

Thanks @bowei! But are GCE Ingress health check overrides available today? I'm not seeing any related docs in https://cloud.google.com/kubernetes-engine/docs/concepts/backendconfig

How can we keep track of whether it is available in a given version of GKE (e.g. the stable channel)?

dustinmoris commented 4 years ago

Can someone post here what the recommended fix is? What does a BackendConfig with a health check look like? What is the min Kubernetes version where this feature is supported?

bowei commented 4 years ago

the docs will be updated very soon -- @spencerhance

jbg commented 4 years ago

An example BackendConfig using this feature:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backend-config
spec:
  healthCheck:
    checkIntervalSec: 20
    timeoutSec: 1
    healthyThreshold: 1
    unhealthyThreshold: 3
    type: TCP
    # defaults to serving port
    # port:
    # only for HTTP/HTTPS type
    # requestPath:

ref: https://godoc.org/k8s.io/ingress-gce/pkg/apis/backendconfig/v1#HealthCheckConfig

It doesn't appear to work though (health check is still created as type HTTP and path "/" regardless of what I configure in the BackendConfig). Is there any GKE release that supports this yet?
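
For context, a BackendConfig only has a chance of taking effect once a Service references it. Below is a minimal sketch of that wiring, assuming a hypothetical Service named my-service with a placeholder selector and ports, and using the beta.cloud.google.com/backend-config annotation that appears elsewhere in this thread.

```yaml
# Hypothetical Service; name, selector, and port numbers are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Points every port of this Service at the BackendConfig shown above.
    beta.cloud.google.com/backend-config: '{"default": "my-backend-config"}'
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
```

Whether the controller then honors the healthCheck section depends on the GKE release, which is what the rest of this thread is about.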

mofirouz commented 4 years ago

I've tried the following setup with no success sadly:

My app (nakama) exposes two ports - 7110 for gRPC (HTTP/2) and 7111 for gRPC-gateway (HTTP 1.1 with / used for LB healthchecks).

This is running on a GKE instance version 1.16.8-gke.10.

kind: Service
apiVersion: v1
metadata:
  name: nakama3
  namespace: heroiclabs
  labels:
    project: heroiclabs
  annotations:
    cloud.google.com/app-protocols: '{"nakama-api":"HTTP","nakama-grpc-api":"HTTP2"}'
    beta.cloud.google.com/backend-config: >-
      {"ports":{"nakama-api":"backendconfig","nakama-grpc-api":"backendconfig"}, "default": "backendconfig"}
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
    - name: nakama-grpc-api
      protocol: TCP
      port: 7110
      targetPort: 7110
    - name: nakama-api
      protocol: TCP
      port: 7111
      targetPort: 7111
  selector:
    app: nakama
  type: NodePort


kind: BackendConfig
apiVersion: cloud.google.com/v1
metadata:
  labels:
    project: heroiclabs
  name: backendconfig
  namespace: heroiclabs
spec:
  connectionDraining:
    drainingTimeoutSec: 5
  logging:
    sampleRate: 0.0
  timeoutSec: 86400
  healthCheck:
    port: 7111
    checkIntervalSec: 10


kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: gundam3
  namespace: heroiclabs
  labels:
    project: heroiclabs
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: heroiclabs
    kubernetes.io/ingress.allow-http: 'false'
spec:
  backend:
    serviceName: gundam3
    servicePort: 7110

The result is the following:

[Screenshots: lb1, lb2]

Apologies for tagging you individually guys, but @bowei or @rramkumar1 can you shed some light on what might be going wrong here?

bowei commented 4 years ago

cc: @spencerhance

spencerhance commented 4 years ago

This feature has not rolled out yet, but it will be available in the next release (1.17.6-gke.7) next week, with the exception of port configuration. That will require a bug fix that should roll out a few weeks after. Additionally, this feature won't be available in 1.16 clusters until about a month after it has been released in 1.17.

mofirouz commented 4 years ago

Thanks @spencerhance and @bowei for the prompt response - I'll keep a look out for that.

ahmetgeymen commented 4 years ago

readinessProbe:
  httpGet:
    path: /actuator/health
    port: 9090
  initialDelaySeconds: 5

I can keep the ingress healthy for around 10 minutes by manually setting a custom health check (using the node port that corresponds to the readinessProbe port) on the backend service config that was created automatically for the load balancer along with the ingress. After about 10 minutes, the health check goes back to the default node port and I get 502 again on the external IP.

(Using screenshot of @mofirouz, but the same config field I mentioned above.)



Unfortunately, they warn against updating the load balancer config manually. But without doing anything manually, I cannot reach my service through the load balancer's assigned IP. By the way, I haven't tried changing the actuator's health endpoint to / yet.


As mentioned here, I tried using exactly the same port for the readinessProbe as the container port:

ports:
- containerPort: 8080
  name: actuator
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: actuator

After creating the Ingress again, the load balancer's health check looks for the new path instead of /. It is important for the readinessProbe to use the same port as the container port to keep the ingress healthy. Maybe the upcoming ingress healthcheck feature will take an additional port config into account.

bowei commented 4 years ago

@ahmetgeymen -- hopefully after the healthcheck feature is available, the need to edit the healthcheck settings manually will go away. Let us know if anything remains that makes custom configuration necessary.

Gatsby-Lee commented 4 years ago

This seems available on beta. I am on 1.17.6-gke.11.

I didn't need any change in the Ingress.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: config-default
spec:
  healthCheck:
    checkIntervalSec: 10
    timeoutSec: 3
    requestPath: /healthz
---
apiVersion: v1
kind: Service
metadata:
  name: test-healthz
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    beta.cloud.google.com/backend-config: '{"default": "config-default"}'
spec:
  ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: test-healthz
  type: NodePort

gnarea commented 4 years ago

I'm using v1.17.6-gke.7 but can't get this to work with gRPC. I basically want to use TCP (not HTTP2) health checks because those HTTP2 health checks don't work at all with gRPC.

Here's the resources I have:

$ kubectl get svc gw-test-relaynet-internet-gateway-cogrpc -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    beta.cloud.google.com/backend-config: '{"ports":{"grpc":"cogrpc"}, "default":
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/neg-status: '{"network_endpoint_groups":{"8081":"k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572"},"zones":["europe-west2-a"]}'
    meta.helm.sh/release-name: gw-test
    meta.helm.sh/release-namespace: default
    service.alpha.kubernetes.io/app-protocols: '{"grpc":"HTTP2"}'
  creationTimestamp: "2020-06-29T14:21:27Z"
  labels:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    app.kubernetes.io/version: 1.3.9
    helm.sh/chart: relaynet-internet-gateway-0.1.0
  name: gw-test-relaynet-internet-gateway-cogrpc
  namespace: default
  resourceVersion: "606508"
  selfLink: /api/v1/namespaces/default/services/gw-test-relaynet-internet-gateway-cogrpc
  uid: 9e28d795-f1ee-49ab-a307-1424b016d46a
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: grpc
    nodePort: 32112
    port: 8081
    protocol: TCP
    targetPort: grpc
  selector:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
  sessionAffinity: None
  type: NodePort

$ kubectl get backendconfig cogrpc -o yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  creationTimestamp: "2020-06-30T16:50:36Z"
  generation: 2
  labels:
    project: public-gw
  name: cogrpc
  namespace: default
  resourceVersion: "608964"
  selfLink: /apis/cloud.google.com/v1/namespaces/default/backendconfigs/cogrpc
  uid: 9387cc5d-6710-4c7e-99ed-dc78f124da5f
spec:
  healthCheck:
    checkIntervalSec: 20
    healthyThreshold: 1
    port: 8081
    timeoutSec: 1
    type: TCP
    unhealthyThreshold: 3

The old HTTP2 healthcheck is still used. In fact, I can't see the healthcheck that should've been created by the BackendConfig above (which, according to kubectl, is in the right cluster and namespace) -- I can see it in k8s but not GCP (I'm sure the project label is set to the right value).

Any idea what I'm doing wrong?

Gatsby-Lee commented 4 years ago

@gnarea I don't know much about grpc maybe you can get some idea from here - https://cloud.google.com/compute/docs/reference/rest/v1/healthChecks

gnarea commented 4 years ago

Thanks @Gatsby-Lee! I think that link helps with the structure of the data of the .spec, but I don't think that's the problem here. According to kubectl describe backendconfig cogrpc, the values I specified seem to have been accepted as valid. The problem is that the LB isn't using my custom health check -- And in fact, I can't find that healthcheck on GCP, but I can still see my BackendConfig resource in the cluster, which I guess is why the LB can't use it.

I also gave up on the TCP probe because I couldn't get it to work and, even if I eventually did, it'd be far too unreliable for a health check. Instead, per the suggestion of a GCP support agent, I've now created a new container in the pod, which is an HTTP app with a single endpoint that in turn pings my gRPC service using the gRPC health check protocol. This approach is basically Option 3 in https://kubernetes.io/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/ (except that I'm doing HTTP probes instead of exec probes to make the GCP LB healthchecks work)

To sum up, I want to configure the gRPC backend in the LB so that the health check points to the HTTP proxy containers while the actual backend uses the gRPC containers. This is the kind of thing you'd be able to do with the fix in this issue, right? If so, how can I configure the BackendConfig, Service and potentially Ingress to achieve this?

Here's my current service, backendconfig and deployment in case that's useful:

$ kubectl get svc gw-test-relaynet-internet-gateway-cogrpc -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/neg-status: '{"network_endpoint_groups":{"8081":"k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572"},"zones":["europe-west2-a"]}'
    meta.helm.sh/release-name: gw-test
    meta.helm.sh/release-namespace: default
    service.alpha.kubernetes.io/app-protocols: '{"grpc":"HTTP2"}'
  creationTimestamp: "2020-06-29T14:21:27Z"
  labels:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    app.kubernetes.io/version: 1.3.10
    helm.sh/chart: relaynet-internet-gateway-0.1.0
  name: gw-test-relaynet-internet-gateway-cogrpc
  namespace: default
  resourceVersion: "989264"
  selfLink: /api/v1/namespaces/default/services/gw-test-relaynet-internet-gateway-cogrpc
  uid: 9e28d795-f1ee-49ab-a307-1424b016d46a
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: grpc
    nodePort: 32112
    port: 8081
    protocol: TCP
    targetPort: grpc
  - name: health-check
    nodePort: 32758
    port: 8082
    protocol: TCP
    targetPort: health-check
  selector:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

$ kubectl get backendconfig cogrpc -o yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  creationTimestamp: "2020-07-01T10:41:21Z"
  generation: 1
  labels:
    project: public-gw
  name: cogrpc
  namespace: default
  resourceVersion: "1026811"
  selfLink: /apis/cloud.google.com/v1/namespaces/default/backendconfigs/cogrpc
  uid: 57c6219f-91ee-4e45-9576-a817c958dc3c
spec:
  healthCheck:
    checkIntervalSec: 20
    healthyThreshold: 1
    port: 8082
    timeoutSec: 1
    type: HTTP
    unhealthyThreshold: 3

$ kubectl get deploy gw-test-relaynet-internet-gateway-cogrpc -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "15"
    meta.helm.sh/release-name: gw-test
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2020-06-29T14:21:27Z"
  generation: 15
  labels:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    app.kubernetes.io/version: 1.3.10
    helm.sh/chart: relaynet-internet-gateway-0.1.0
  name: gw-test-relaynet-internet-gateway-cogrpc
  namespace: default
  resourceVersion: "1012358"
  selfLink: /apis/apps/v1/namespaces/default/deployments/gw-test-relaynet-internet-gateway-cogrpc
  uid: f8c67784-e2ed-4f56-89cc-7763f1adf059
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: gw-test
      app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: gw-test
        app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    spec:
      containers:
      - command:
        - node
        - build/main/bin/cogrpc-server.js
        env:
        - name: COGRPC_ADDRESS
          value: https://cogrpc-test.relaycorp.tech
        envFrom:
        - configMapRef:
            name: gw-test-relaynet-internet-gateway
        - configMapRef:
            name: gw-test-relaynet-internet-gateway-cogrpc
        - secretRef:
            name: gw-test-relaynet-internet-gateway
        image: quay.io/relaycorp/relaynet-internet-gateway:v1.3.10
        imagePullPolicy: IfNotPresent
        name: cogrpc
        ports:
        - containerPort: 8080
          name: grpc
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - command:
        - /bin/grpc_health_proxy
        - -http-listen-addr
        - -grpcaddr
        - -service-name
        - CargoRelay
        - -v
        - "10"
        image: salrashid123/grpc_health_proxy:1.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: health-check
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: cogrpc-health-check
        ports:
        - containerPort: 8081
          name: health-check
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: health-check
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: gw-test-relaynet-internet-gateway-cogrpc
      serviceAccountName: gw-test-relaynet-internet-gateway-cogrpc
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-07-01T09:48:12Z"
    lastUpdateTime: "2020-07-01T09:48:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-07-01T09:46:21Z"
    lastUpdateTime: "2020-07-01T10:04:47Z"
    message: ReplicaSet "gw-test-relaynet-internet-gateway-cogrpc-5c4f6d9cf6" has
      successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 15
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

All pods are running properly as you can see in the deployment status.

And as you'll see below, the healthcheck for this backend is connecting to the gRPC service (port: 32112) over HTTP2 instead of the HTTP proxy (node port 32758 according to kubectl describe svc gw-test-relaynet-internet-gateway-cogrpc) over HTTP:

$ gcloud compute backend-services describe --global --project public-gw k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
affinityCookieTtlSec: 0
backends:
- balancingMode: RATE
  capacityScaler: 1.0
  group: https://www.googleapis.com/compute/v1/projects/public-gw/zones/europe-west2-a/networkEndpointGroups/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
  maxRatePerEndpoint: 1.0
connectionDraining:
  drainingTimeoutSec: 0
creationTimestamp: '2020-06-29T08:43:45.264-07:00'
description: '{"kubernetes.io/service-name":"default/gw-test-relaynet-internet-gateway-cogrpc","kubernetes.io/service-port":"grpc","x-features":["HTTP2","NEG"]}'
enableCDN: false
fingerprint: tVJFuu8kHHM=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/public-gw/global/healthChecks/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
id: '9026318342467072734'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
  enable: true
name: k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
port: 32112
portName: port32112
protocol: HTTP2
selfLink: https://www.googleapis.com/compute/v1/projects/public-gw/global/backendServices/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
sessionAffinity: NONE
timeoutSec: 30

$ gcloud compute health-checks describe --global --project public-gw k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
checkIntervalSec: 15
creationTimestamp: '2020-06-29T08:43:42.958-07:00'
description: Default kubernetes L7 Loadbalancing health check for NEG.
healthyThreshold: 1
http2HealthCheck:
  portSpecification: USE_SERVING_PORT
  proxyHeader: NONE
id: '327529895999025857'
kind: compute#healthCheck
name: k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
selfLink: https://www.googleapis.com/compute/v1/projects/public-gw/global/healthChecks/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
timeoutSec: 15
type: HTTP2
unhealthyThreshold: 2

Gatsby-Lee commented 4 years ago

I was wrong. I don't think setting the healthcheck through BackendConfig works. The Ingress can use a custom healthcheck only if the Service exists before the Ingress knows about the Service.

And even if the custom healthcheck works by bringing up the Ingress after the Service, it stops working when a new Pod is deployed.

This is what @nicksardo explained before in a different thread.

BTW, the Ingress doesn't have to be removed to use the custom healthcheck. If the Service is removed from the Ingress and added back again, the Ingress uses the custom healthcheck again. (But we can't do this on prod) lol

dpkirchner commented 4 years ago

This feature has not rolled out yet, but it will be available in the next release (1.17.6-gke.7) next week, with the exception of port configuration. That will require a bug fix that should roll out a few weeks after. Additionally, this feature won't be available in 1.16 clusters until about a month after it has been released in 1.17.

May I suggest/request adding this to the Feature Comparison table at https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features ? I see custom health checks are marked as being in beta (for Internal Ingress[1]), but there's no version number so it's not clear how to "activate" the beta feature.

[1] I assume this includes ingresses used for IAP given the healthCheck attribute doesn't have an effect in BackendConfigs used there.

christopherdbull commented 2 years ago

Is there any reason why we can't set the Host header with the BackendConfig CRD?

ok-ikaros commented 2 years ago

Hey guys,

Currently I have a service that publishes port 12345 as the service port. I want this to be the port that Ingress routes traffic to because it's the port that listens to websocket messages. I'm making a game by the way, if this adds some color

The Ingress controller looks like this:

spec:
  defaultBackend:
    service:
      name: other-backend-service
      port:
        number: 7350
  rules:
  - http:
      paths:
      - backend:
          service:
            name: my-server-service
            port:
              number: 12345
        path: /my-server
        pathType: ImplementationSpecific

The service listens for websocket connections on port 12345, but as an experiment I've also spun up a small Python server in the same pod for the sole purpose of passing the health check. It listens for HTTP requests on / on port 80 and returns 200 OK. I've gotten it to pass the health check, but only when I publish this port as the service port. When I do this, traffic fails to route to the other port listening for websocket traffic, as they are on the same path.

Is there a way for me to configure the health check so that it checks against port 80 without having to publish it as the service port on the same path as the port I want to route traffic to?

I really want to stay in the GCE ingress ecosystem so I would love to get this to work if possible. Thank you so much

swetharepakula commented 2 years ago

@lelandhwu , can you please create a new issue with the same content as https://github.com/kubernetes/ingress-gce/issues/42#issuecomment-1064416269. This seems like a separate issue from Healthcheck configuration.

The remaining ask on this issue is to add Host Header to the BackendConfig. Please open a different issue if this FR is still desired.

Since this issue is specific to custom healthchecks we are closing it out. Custom healthchecks are configurable through the BackendConfig CRD: https://github.com/kubernetes/ingress-gce/issues/42#issuecomment-1064416269.
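
As a rough sketch of what such a configuration can look like for the case described above (a health endpoint on port 80 next to a websocket port 12345): the resource names and selector below are hypothetical, and the Service annotation is assumed to be cloud.google.com/backend-config. How `port` is interpreted (container port versus node port) depends on whether the Service uses NEGs, so that part should be checked against the GKE documentation.

```yaml
# Hypothetical names throughout; only the healthCheck field names come from this thread.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-server-backendconfig
spec:
  healthCheck:
    type: HTTP
    requestPath: /
    port: 80              # assumed: port of the helper server that answers 200 OK
    checkIntervalSec: 10
    timeoutSec: 3
---
apiVersion: v1
kind: Service
metadata:
  name: my-server-service
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"default": "my-server-backendconfig"}'
spec:
  type: NodePort
  selector:
    app: my-server        # placeholder selector
  ports:
    - name: websocket
      port: 12345
      protocol: TCP
      targetPort: 12345
```

The idea is that the Service keeps publishing only the websocket port, while the health check is pointed at the helper port through the BackendConfig instead of through the Service port list.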