kubernetes / ingress-gce

Ingress controller for Google Cloud
Apache License 2.0

Ingress Healthcheck Configuration #42

Closed · bowei closed this issue 2 years ago

bowei commented 6 years ago

From @freehan on May 15, 2017 21:25

On GCE, the ingress controller sets up a default healthcheck for backends. The healthcheck points to the NodePort of the backend service on every node. Currently, there is no way to describe detailed healthcheck configuration in an Ingress. On the other hand, each application may want to handle healthchecks differently. To work around this limitation, on Ingress creation the ingress controller scans all backend pods, picks the first ReadinessProbe it encounters, and configures the healthcheck accordingly. However, the healthcheck will not be updated if the ReadinessProbe is later updated. (Refer: https://github.com/kubernetes/ingress/issues/582)
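
For illustration, given a backend pod whose spec contains a fragment like the following (the path and port values are examples only), the controller copies the probe's path and port into the GCE health check:

        readinessProbe:
          httpGet:
            path: /healthz    # becomes the GCE health check's request path
            port: 8080        # must be reachable through the service's NodePort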

I see three options going forward with healthchecks:

1) Expand the Ingress or Service spec to include more healthcheck configuration. It should cover the capabilities provided by the major cloud providers (GCP, AWS, ...).

2) Keep using the readiness probe for healthcheck configuration:
   a) Keep today's behavior and communicate the expectation clearly. However, this still breaks the abstraction and declarative nature of k8s.
   b) Have the ingress controller watch the backend pods for ReadinessProbe updates. This seems expensive and complicated.

3) Only set up a default healthcheck for Ingresses. The ingress controller would periodically ensure the healthcheck exists, but would not care about its detailed configuration. Users could configure it directly through the cloud provider.

I am in favor of option 3). There are always more bells and whistles on different cloud providers, and the higher the layer, the more features there are to expose. For an L7 LB there is no clean, simple way to describe every intention, and the same goes for healthchecks. To ensure a smooth experience, k8s still sets up the basics; for advanced use cases, users will have to configure things through the cloud provider.

Thoughts? @kubernetes/sig-network-misc

Copied from original issue: kubernetes/ingress-nginx#720

bowei commented 6 years ago

From @k8s-ci-robot on May 15, 2017 21:25

@freehan: These labels do not exist in this repository: sig/network.

In response to [this](https://github.com/kubernetes/ingress/issues/720).

bowei commented 6 years ago

From @tonglil on August 14, 2017 20:08

For option 3, is the "default healthcheck" hitting the "default-backend"?

edevil commented 6 years ago

I'm in favor of option 1.

I'm sure there is a minimum subset of features, common to all cloud providers, that makes sense to include in the Ingress spec, and it would improve the current situation a lot.

tobsch commented 6 years ago

+1

tonglil commented 6 years ago

For those in favor of option 1, please read this conversation: https://github.com/kubernetes/ingress-gce/issues/28.

hdave commented 6 years ago

Reading that conversation -- I think configuring healthcheck via annotations would be great.

jeremywadsack commented 6 years ago

@bowei I think that kubernetes/contrib#325 is related to this?

epsniff commented 6 years ago

I ran into this as well and found this post: https://github.com/kubernetes/kubernetes/issues/20555#issuecomment-326058311

hdave commented 6 years ago

I would be in favor of option #1 if the ingress controller could configure an HTTPS health check to a backend service that uses a cert. If not, I would go with option #3 and just let our devops team manually tweak the health check, without the ingress controller caring and resetting it back.

matti commented 6 years ago

"on Ingress creation, ingress controller will scan all backend pods and pick the first ReadinessProbe it encounters and configure healthcheck accordingly"

I'm not seeing this; the health check always points to the default path "/", even with:

        readinessProbe:
          httpGet:
            path: /health
            port: 8080

Gogoro commented 6 years ago

I have the same issue as @matti. When I create an Ingress pointing to a Service, which in turn points to a pod, the healthcheck just keeps hitting / instead of the path I defined in readinessProbe and livenessProbe. I can see in the logs that the pod's own probes pass, but the LB healthcheck goes ham on /. :(

I've been trying for a while to find information on this topic, but I feel like it's not very well explained and documented. If anyone finds a solution for this it would be much appreciated!

matti commented 6 years ago

@Gogoro thanks, opened a new issue because this issue is for semantic discussion

nicksardo commented 6 years ago

Healthcheck configuration should be provided via BackendConfig CRD and the readiness probe approach should be deprecated and eventually removed.
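
For context, a BackendConfig is associated with a Service through an annotation. A minimal sketch, assuming the annotation key documented by GKE (older clusters used a beta.cloud.google.com prefix); all names here are illustrative:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
      annotations:
        cloud.google.com/backend-config: '{"default": "my-backendconfig"}'
    spec:
      type: NodePort
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080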

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

justinsb commented 5 years ago

/remove-lifecycle stale

rramkumar1 commented 5 years ago

/help-wanted
/good-first-issue

k8s-ci-robot commented 5 years ago

@rramkumar1: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to [this](https://github.com/kubernetes/ingress-gce/issues/42).

bowei commented 5 years ago

/lifecycle frozen

nieldw commented 5 years ago

A motivation for why option 1 (via BackendConfig CRD as @nicksardo said) is desirable is that sometimes the rules for readiness are different for serving public traffic (via an ingress) and internal traffic from other microservices. As an example, I have a service that needs to accept incoming UDP traffic before it is ready to serve public traffic. This means the readinessProbe on the pod must be OK but the backend for the ingress must still reflect "Unhealthy".

Joseph-Irving commented 5 years ago

Has there been any movement on this issue? The backend config approach seems like a reasonable solution to me.

We're currently trying to use the readiness probe approach; however, I noticed it sets the health check path for all backend services to the readiness path, including the default-http-backend. This results in the default-http-backend being marked unhealthy, since it responds with a 404 on the health check path we're using. Does anyone know if this is expected behaviour?

DanielJoyce commented 5 years ago

Bitten by this as well. The readiness probe scanning seems to have too many caveats to even be useful. We just manually patch the healthcheck with the CLI. :P

cemo commented 5 years ago

Another victim here :( How is this supposed to be configured correctly? I tried a readinessProbe but it's still not working. Any workaround?

rmtmckenzie commented 5 years ago

@cemo Just to confirm, have you put something like this in your deployment spec:

    spec:
      containers:
      - ports:              # note the list item dash; each container is an entry
        - containerPort: 8080
        ...
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 6
          timeoutSeconds: 1

I've done that and my health checks seem to be set up for /healthz. It might be worth recreating the ingress if you're able to do that.

julianvmodesto commented 5 years ago

It seems that configuring the health check configuration in the Backend Config CRD is the path forward.

Is there a way to watch progress for that change to Backend Config?

Does that mean that the 3rd option below is the compatible way forward to unblock users? Is this a change that would be acceptable if it was made today?

3. Only set up a default healthcheck for Ingresses. The ingress controller would periodically ensure the healthcheck exists, but would not care about its detailed configuration. Users could configure it directly through the cloud provider.

rramkumar1 commented 5 years ago

@julianvmodesto Will ping here once work for this starts on our end.

cemo commented 5 years ago

@rmtmckenzie thanks. I tried a simple example and it worked. However, I used something more complicated and hit one of the limitations listed here (a conforming sketch follows the list):

- The pod's containerPort field must be defined.
- The service's targetPort field must point to the pod port's containerPort value or name. Note that targetPort defaults to the port value if not defined.
- The pods must exist at the time of Ingress creation.
- The readiness probe must be exposed on the port matching the servicePort specified in the Ingress.
- The readiness probe cannot have special requirements, like headers.
- The probe timeouts are translated to GCE health check timeouts.
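
A minimal sketch of a Service and Deployment that satisfy these constraints; all names and the image are illustrative:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      type: NodePort
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080                   # points at the containerPort below
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: gcr.io/example/app:1.0  # illustrative image
            ports:
            - containerPort: 8080          # must be defined explicitly
            readinessProbe:
              httpGet:                     # plain httpGet, no custom headers
                path: /healthz
                port: 8080                 # same port the Ingress's servicePort maps to
              timeoutSeconds: 1            # translated to the GCE health check timeout
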
linydquantil commented 5 years ago

+1

jgirdner commented 5 years ago

Does anyone have a solution here? My health check checks over HTTP but my NodePort is TCP, and the health check keeps failing.

jgirdner commented 5 years ago

Ok, found my workaround. We use Django for our API, and the ALLOWED_HOSTS setting was causing the API to throw an error and fail the health check. We built a custom middleware to allow the CIDR IP range of our cluster.

PassKit commented 5 years ago

Giving this issue a nudge with regard to https://github.com/kubernetes/ingress-gce/issues/553 and the lack of support for mTLS on the load balancer. Not being able to override or remove the default load balancer health check makes it impossible to run a gRPC service with mTLS on GKE.

mofirouz commented 5 years ago

Yes please - this is a real pain for gRPC users. Pretty much the last remaining item to enable us to use L7 LB with gRPC backend.

kvudata commented 5 years ago

To add another use case/scenario where more advanced health check configuration would be helpful:

I'm setting up an HTTPS application on Istio to be exposed via IAP. Istio's gateway (a proxy which routes external traffic to applications within the Istio service mesh) uses a TCP load balancer by default, but IAP requires an HTTP(S) load balancer. So I expose the gateway as a NodePort service serving my HTTPS application on an HTTPS port, and create an Ingress (using ingress-gce) pointing to the Istio gateway's HTTPS service port.

The GCE Ingress currently configures its health check to hit / (defaulting the host header to the IP address) on the Istio gateway's HTTPS service port, but this fails because Istio routes by host name and a random IP address has no route. I've managed to fix this by manually configuring the health check to specify the hostname of my application via the "Host HTTP header" option on the health check (so the health check gets routed by Istio to my application and a successful status code is returned). Ideally, one would instead be able to:

* customize the health check to reuse the Istio gateway's readiness probe (which uses a separate port), OR

* customize the health check in k8s (via the BackendConfig?) to specify the host header

dmarkwat commented 5 years ago

@rramkumar1 what is the change of plan you referred to in https://github.com/kubernetes/ingress-gce/pull/681? I'm very curious because what you wrote there would solve this problem--at least from my experience with it recently.

tuananhnguyen-ct commented 4 years ago

> [quoting @kvudata's Istio/IAP scenario above]

We used a rewrite on Istio for the default host (anything hitting / without a valid host), pointing it to Istio's healthcheck endpoint (/healthz/ready), so the fix is on Istio instead of in the healthcheck.

hmeerlo commented 4 years ago

This issue has now been open for more than 2 years and we still cannot customise the health checks. For me this makes certain k8s deployments impossible to use on GKE. Is there any progress at all on this issue?

Just for reference, my problem: I have a GKE cluster in which I deploy the stable/wordpress helm chart. The certificate is managed by cert-manager, and a GCLB is created which terminates HTTPS (so far so good). But the health checks keep failing because the wordpress backend (rightfully) requires all requests to be over HTTPS, so it returns a 301 instead of the much-needed 200. I tried doing the health check over HTTPS, but it refuses to do that. I tried adding an 'x-forwarded-proto: https' header, but it refuses that as well. We need a way to have more control over this!

bowei commented 4 years ago

We will pick this up into backendconfig, likely in the next release.

lfaoro commented 4 years ago

@bowei

On the other hand, each application may want to handle healthchecks differently. To work around this limitation, on Ingress creation the ingress controller scans all backend pods, picks the first ReadinessProbe it encounters, and configures the healthcheck accordingly.

This means that if my pod has the following config:

          readinessProbe:
            httpGet:
              path: /v1/card/health
              port: http
              httpHeaders:
              - name: Authorization
                value: "Bearer health-check"

the LB health-check should successfully authenticate and mark it HEALTHY.

Instead, I see the LB configured the HC on path /, expecting a 200 there.

If I remove the httpHeaders bit, the LB successfully creates the correct HC on path /v1/card/health.

rib commented 4 years ago

Is there any kind of workaround for this issue to allow exposing a grpc service via Ingress?

This issue is getting pretty old, so there's no indication it will be addressed any time soon. But does that really mean no one has successfully hosted a gRPC service on GKE via Ingress? Surely not?

Is the only workaround perhaps to configure a load balancer manually without creating a GKE Ingress resource?

Initially I tried making my server host an http://:8080/readiness endpoint separate from my gRPC service on port 8443, but found that Ingress couldn't be coerced into using it.

Then I tried configuring a readinessProbe using tcpSocket, to at least avoid the separate port, before realising the Ingress controller doesn't understand that either.

It seems there's no way to configure a readinessProbe to use HTTP/2 (in particular without TLS, since TLS will be terminated by the Ingress), so now I'm somewhat at a loss as to what to do next until this issue is resolved. But this issue is over two years old now :(

Any pointer for a workaround would be hugely appreciated!

rib commented 4 years ago

Ok, so after some more head scratching I did actually get my gRPC service working with Ingress.

There were several compounding issues for me, since I hadn't fully appreciated that the HTTPS load balancer (and the health checks) created by the Ingress will only connect to an HTTP/2 backend service over TLS. I had been disabling TLS, since I assumed it would be terminated by the load balancer, and I didn't understand why my server was always marked unhealthy. I was half expecting there would be some place to configure certificates if that were the case, but it turns out they connect fine with unverified, self-signed certificates, using an IP subjectAltName.

Luckily the gRPC server implementation I'm using (tonic) returns a status 200 for any URI that doesn't match a gRPC request. Things would get more complicated if I needed to change the health check path, since tonic doesn't allow HTTP/1 connections, so the trick of configuring an httpGet readinessProbe (currently I use a tcpSocket probe) would require modifying tonic.
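
For reference, a minimal sketch of the tcpSocket probe mentioned above (the port number is illustrative). It only gates pod readiness; the load balancer's default "/" health check is answered by tonic's catch-all 200:

        readinessProbe:
          tcpSocket:
            port: 8443      # gRPC/TLS port; the probe only opens a TCP connection
          periodSeconds: 10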

bowei commented 4 years ago

/assign

ghost commented 4 years ago

Is there any ETA on this issue?

Also, do I understand correctly that health checks for ingress-gce (global load balancer) are configured automatically right now? Because if I change those configs after deploying, my changes are reverted.

And with this feature, I would be able to configure path/port/host from within my GKE deployment?

jerome-arzel commented 4 years ago

Hello,

You can close this issue; it is now resolved. Thanks

Regards


mofirouz commented 4 years ago

@jerome-arzel Can you point me to documentation and/or a changelog that explains how to configure the healthcheck, if this is resolved? I don't see anything in v1.8.0.

rramkumar1 commented 4 years ago

@mofirouz This is not resolved yet. I'm not sure what the previous comment was referring to.

ghost commented 4 years ago

So... Any ETAs?

matti commented 4 years ago

Personally, I stopped believing in the GCE ingress and just created a LoadBalancer + ingress-nginx. The only downsides: a) you can't use global IPs, b) you have to run ingress-nginx as well.

rramkumar1 commented 4 years ago

@omatsko Unfortunately this work got sidetracked in favor of other work but we have picked it back up and are actively working on it. See https://github.com/kubernetes/ingress-gce/pull/1010

soichisumi commented 4 years ago

Can HealthCheckConfig (added in this PR) configure the L7 LB to pass health checks for gRPC applications?

bowei commented 4 years ago

Hi everyone, please take a look at https://github.com/kubernetes/ingress-gce/pull/1029, which implements the healthchecking overrides on the BackendConfig for a service.
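
A sketch of what the override in that PR looks like, based on the fields it adds (values are illustrative; check the BackendConfig documentation for the authoritative schema and apiVersion). The BackendConfig is then attached to a Service via the cloud.google.com/backend-config annotation shown earlier in the thread:

    apiVersion: cloud.google.com/v1
    kind: BackendConfig
    metadata:
      name: my-backendconfig
    spec:
      healthCheck:
        type: HTTP2              # HTTP, HTTPS, or HTTP2
        requestPath: /healthz    # custom path instead of the scanned probe's
        port: 8443
        checkIntervalSec: 15
        timeoutSec: 5
        healthyThreshold: 1
        unhealthyThreshold: 2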

mofirouz commented 4 years ago

@bowei thanks a lot for your PR. I think this should work well with gRPC applications, since we can use a custom path for healthchecks.

Can you tell us which versions of GKE this addition will be available in?