Closed — @bowei closed this issue 2 years ago.
Hi everyone, please take a look at #1029 which implements the healthchecking overrides on the backendconfig for a service.
Thanks @bowei! But are GCE-Ingress health check overrides available today? I'm not seeing any related docs in https://cloud.google.com/kubernetes-engine/docs/concepts/backendconfig
How can we keep track of whether it is available in a given version of GKE (e.g. the stable channel)?
Can someone post here what the recommended fix is? What does a BackendConfig with a health check look like? What is the min Kubernetes version where this feature is supported?
the docs will be updated very soon -- @spencerhance
An example BackendConfig using this feature:

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backend-config
spec:
  healthCheck:
    checkIntervalSec: 20
    timeoutSec: 1
    healthyThreshold: 1
    unhealthyThreshold: 3
    type: TCP
    # port:  # defaults to the serving port
    # path:  # only for HTTP/HTTPS type
```

Ref: https://godoc.org/k8s.io/ingress-gce/pkg/apis/backendconfig/v1#HealthCheckConfig
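For completeness, a BackendConfig only takes effect when a Service references it. A minimal sketch of how that attachment could look, using the beta annotation seen elsewhere in this thread (the service name, selector, and ports here are hypothetical):

```yaml
# Hypothetical Service wiring the BackendConfig above to its serving port.
# The beta.cloud.google.com/backend-config annotation maps port names (or
# "default") to BackendConfig resource names in the same namespace.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    beta.cloud.google.com/backend-config: '{"default": "my-backend-config"}'
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
```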
It doesn't appear to work though (health check is still created as type HTTP and path "/" regardless of what I configure in the BackendConfig). Is there any GKE release that supports this yet?
I've tried the following setup with no success sadly:
My app (nakama) exposes two ports: 7110 for gRPC (HTTP/2) and 7111 for gRPC-gateway (HTTP 1.1, with `/` used for LB healthchecks).
This is running on a GKE instance version 1.16.8-gke.10.
```yaml
kind: Service
apiVersion: v1
metadata:
  name: nakama3
  namespace: heroiclabs
  labels:
    project: heroiclabs
  annotations:
    cloud.google.com/app-protocols: '{"nakama-api":"HTTP","nakama-grpc-api":"HTTP2"}'
    beta.cloud.google.com/backend-config: >-
      {"ports":{"nakama-api":"backendconfig","nakama-grpc-api":"backendconfig"}, "default": "backendconfig"}
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
  - name: nakama-grpc-api
    protocol: TCP
    port: 7110
    targetPort: 7110
  - name: nakama-api
    protocol: TCP
    port: 7111
    targetPort: 7111
  selector:
    app: nakama
  type: NodePort
```
backendconfig.yml

```yaml
kind: BackendConfig
apiVersion: cloud.google.com/v1
metadata:
  labels:
    project: heroiclabs
  name: backendconfig
  namespace: heroiclabs
spec:
  connectionDraining:
    drainingTimeoutSec: 5
  logging:
    sampleRate: 0.0
  timeoutSec: 86400
  healthCheck:
    port: 7111
    checkIntervalSec: 10
```
ingress.yml

```yaml
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: gundam3
  namespace: heroiclabs
  labels:
    project: heroiclabs
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: heroiclabs
    kubernetes.io/ingress.allow-http: 'false'
spec:
  backend:
    serviceName: gundam3
    servicePort: 7110
```
The result is the following:
Apologies for tagging you individually guys, but @bowei or @rramkumar1 can you shed some light on what might be going wrong here?
cc: @spencerhance
This feature has not rolled out yet, but will be available in the next release (1.17.6-gke.7) next week, with the exception of port configuration. That will require a bug fix that should roll out a few weeks after. Additionally, this feature won't be available in 1.16 clusters until about a month after it has been released in 1.17.
Thanks @spencerhance and @bowei for the prompt response - I'll keep a look out for that.
...
readinessProbe:
httpGet:
path: /actuator/health
port: 9090
initialDelaySeconds: 5
...
I can keep the ingress healthy for around 10 minutes by setting a custom health check (pointing at the node port assigned to the readinessProbe port) on the backend service that was automatically created for the load balancer by the ingress. After about 10 minutes the health check reverts to the default node port, and then I get 502s from the external IP again.
(Using @mofirouz's screenshot, but it's the same config field I mentioned above.)
Unfortunately, the docs warn against updating the load balancer config manually. But without doing anything manually, I can't reach my service at the load balancer's assigned IP. By the way, I haven't tried updating the actuator's health endpoint to `/` yet.
Edit
As mentioned here, I tried giving the readinessProbe exactly the same port as the container port:
```yaml
...
ports:
- containerPort: 8080
  name: actuator
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: actuator
...
```
Then re-creating the Ingress makes the load balancer's health check config use the new path instead of `/`. It is important to use the same port for the readinessProbe as the container port to keep the ingress healthy. Maybe the upcoming healthcheck feature of Ingress will account for additional port configs.
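Putting the fragment above together, a container spec where the readinessProbe targets the same named container port might look like this (a sketch; the image name is a placeholder):

```yaml
# Hypothetical container spec: the readinessProbe references the named
# containerPort, so the health check inferred by the ingress controller
# lands on the same port the container actually serves.
containers:
- name: app
  image: example.com/app:latest  # placeholder image
  ports:
  - containerPort: 8080
    name: actuator
  readinessProbe:
    httpGet:
      path: /actuator/health/readiness
      port: actuator
    initialDelaySeconds: 5
```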
@ahmetgeymen -- hopefully after the healthcheck feature is available, the need to edit the healthcheck settings manually will go away. Let us know if there is anything that remains that makes custom configuration necessary.
This seems to be available in beta. I am on 1.17.6-gke.11.
No change to the Ingress was needed.
```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: config-default
spec:
  healthCheck:
    checkIntervalSec: 10
    timeoutSec: 3
    requestPath: /healthz
```

```yaml
apiVersion: v1
kind: Service
metadata:
  name: test-healthz
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    beta.cloud.google.com/backend-config: '{"default": "config-default"}'
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: test-healthz
  type: NodePort
```
I'm using v1.17.6-gke.7 but can't get this to work with gRPC. I basically want to use TCP (not HTTP2) health checks, because HTTP2 health checks don't work at all with gRPC.
Here are the resources I have:
```
$ kubectl get svc gw-test-relaynet-internet-gateway-cogrpc -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    beta.cloud.google.com/backend-config: '{"ports":{"grpc":"cogrpc"}, "default": "cogrpc"}'
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/neg-status: '{"network_endpoint_groups":{"8081":"k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572"},"zones":["europe-west2-a"]}'
    meta.helm.sh/release-name: gw-test
    meta.helm.sh/release-namespace: default
    service.alpha.kubernetes.io/app-protocols: '{"grpc":"HTTP2"}'
  creationTimestamp: "2020-06-29T14:21:27Z"
  labels:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    app.kubernetes.io/version: 1.3.9
    helm.sh/chart: relaynet-internet-gateway-0.1.0
  name: gw-test-relaynet-internet-gateway-cogrpc
  namespace: default
  resourceVersion: "606508"
  selfLink: /api/v1/namespaces/default/services/gw-test-relaynet-internet-gateway-cogrpc
  uid: 9e28d795-f1ee-49ab-a307-1424b016d46a
spec:
  clusterIP: 10.16.5.230
  externalTrafficPolicy: Cluster
  ports:
  - name: grpc
    nodePort: 32112
    port: 8081
    protocol: TCP
    targetPort: grpc
  selector:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
  sessionAffinity: None
  type: NodePort
```

```
$ kubectl get backendconfig cogrpc -o yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  annotations:
  creationTimestamp: "2020-06-30T16:50:36Z"
  generation: 2
  labels:
    project: public-gw
  name: cogrpc
  namespace: default
  resourceVersion: "608964"
  selfLink: /apis/cloud.google.com/v1/namespaces/default/backendconfigs/cogrpc
  uid: 9387cc5d-6710-4c7e-99ed-dc78f124da5f
spec:
  healthCheck:
    checkIntervalSec: 20
    healthyThreshold: 1
    port: 8081
    timeoutSec: 1
    type: TCP
    unhealthyThreshold: 3
```
The old HTTP2 healthcheck is still used. In fact, I can't see the healthcheck that should've been created by the `BackendConfig` above (which, according to `kubectl`, is in the right cluster and namespace): I can see it in k8s but not in GCP (and I'm sure the `project` label is set to the right value).
Any idea what I'm doing wrong?
@gnarea I don't know much about gRPC, but maybe you can get some ideas from here: https://cloud.google.com/compute/docs/reference/rest/v1/healthChecks
Thanks @Gatsby-Lee! I think that link helps with the structure of the data in the `.spec`, but I don't think that's the problem here. According to `kubectl describe backendconfig cogrpc`, the values I specified seem to have been accepted as valid. The problem is that the LB isn't using my custom health check. In fact, I can't find that healthcheck on GCP (though I can still see my `BackendConfig` resource in the cluster), which I guess is why the LB can't use it.
I also gave up on the TCP probe because I couldn't get it to work and, even if I eventually did, it'd be far too unreliable for a health check. Instead, per the suggestion of a GCP support agent, I've now created a new container in the pod: an HTTP app with a single endpoint that in turn pings my gRPC service using the gRPC health check protocol. This approach is basically Option 3 in https://kubernetes.io/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/ (except that I'm doing HTTP probes instead of `exec` probes to make the GCP LB healthchecks work).
To sum up, I want to configure the gRPC backend in the LB in such a way that the health check points to the HTTP proxy containers but the actual backend uses the gRPC containers. This is the kind of thing you'd be able to do with the fix in this issue, right? If so, how can I configure the `BackendConfig`, `Service` and potentially `Ingress` to achieve this?
Here's my current service, backendconfig and deployment in case that's useful:
```
$ kubectl get svc gw-test-relaynet-internet-gateway-cogrpc -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/neg-status: '{"network_endpoint_groups":{"8081":"k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572"},"zones":["europe-west2-a"]}'
    meta.helm.sh/release-name: gw-test
    meta.helm.sh/release-namespace: default
    service.alpha.kubernetes.io/app-protocols: '{"grpc":"HTTP2"}'
  creationTimestamp: "2020-06-29T14:21:27Z"
  labels:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    app.kubernetes.io/version: 1.3.10
    helm.sh/chart: relaynet-internet-gateway-0.1.0
  name: gw-test-relaynet-internet-gateway-cogrpc
  namespace: default
  resourceVersion: "989264"
  selfLink: /api/v1/namespaces/default/services/gw-test-relaynet-internet-gateway-cogrpc
  uid: 9e28d795-f1ee-49ab-a307-1424b016d46a
spec:
  clusterIP: 10.16.5.230
  externalTrafficPolicy: Cluster
  ports:
  - name: grpc
    nodePort: 32112
    port: 8081
    protocol: TCP
    targetPort: grpc
  - name: health-check
    nodePort: 32758
    port: 8082
    protocol: TCP
    targetPort: health-check
  selector:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
```
```
$ kubectl get backendconfig cogrpc -o yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  annotations:
  creationTimestamp: "2020-07-01T10:41:21Z"
  generation: 1
  labels:
    project: public-gw
  name: cogrpc
  namespace: default
  resourceVersion: "1026811"
  selfLink: /apis/cloud.google.com/v1/namespaces/default/backendconfigs/cogrpc
  uid: 57c6219f-91ee-4e45-9576-a817c958dc3c
spec:
  healthCheck:
    checkIntervalSec: 20
    healthyThreshold: 1
    port: 8082
    timeoutSec: 1
    type: HTTP
    unhealthyThreshold: 3
```
```
$ kubectl get deploy gw-test-relaynet-internet-gateway-cogrpc -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "15"
    meta.helm.sh/release-name: gw-test
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2020-06-29T14:21:27Z"
  generation: 15
  labels:
    app.kubernetes.io/instance: gw-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    app.kubernetes.io/version: 1.3.10
    helm.sh/chart: relaynet-internet-gateway-0.1.0
  name: gw-test-relaynet-internet-gateway-cogrpc
  namespace: default
  resourceVersion: "1012358"
  selfLink: /apis/apps/v1/namespaces/default/deployments/gw-test-relaynet-internet-gateway-cogrpc
  uid: f8c67784-e2ed-4f56-89cc-7763f1adf059
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: gw-test
      app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: gw-test
        app.kubernetes.io/name: relaynet-internet-gateway-cogrpc
    spec:
      containers:
      - command:
        - node
        - build/main/bin/cogrpc-server.js
        env:
        - name: COGRPC_ADDRESS
          value: https://cogrpc-test.relaycorp.tech
        envFrom:
        - configMapRef:
            name: gw-test-relaynet-internet-gateway
        - configMapRef:
            name: gw-test-relaynet-internet-gateway-cogrpc
        - secretRef:
            name: gw-test-relaynet-internet-gateway
        image: quay.io/relaycorp/relaynet-internet-gateway:v1.3.10
        imagePullPolicy: IfNotPresent
        name: cogrpc
        ports:
        - containerPort: 8080
          name: grpc
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - command:
        - /bin/grpc_health_proxy
        - -http-listen-addr
        - 0.0.0.0:8081
        - -grpcaddr
        - 127.0.0.1:8080
        - -service-name
        - CargoRelay
        - -v
        - "10"
        image: salrashid123/grpc_health_proxy:1.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: health-check
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: cogrpc-health-check
        ports:
        - containerPort: 8081
          name: health-check
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: health-check
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: gw-test-relaynet-internet-gateway-cogrpc
      serviceAccountName: gw-test-relaynet-internet-gateway-cogrpc
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-07-01T09:48:12Z"
    lastUpdateTime: "2020-07-01T09:48:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-07-01T09:46:21Z"
    lastUpdateTime: "2020-07-01T10:04:47Z"
    message: ReplicaSet "gw-test-relaynet-internet-gateway-cogrpc-5c4f6d9cf6" has
      successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 15
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
```
All pods are running properly as you can see in the deployment status.
And as you'll see below, the healthcheck for this backend is connecting to the gRPC service (node port 32112) over HTTP2, instead of to the HTTP proxy (node port 32758 according to `kubectl describe svc gw-test-relaynet-internet-gateway-cogrpc`) over HTTP:
```
$ gcloud compute backend-services describe --global --project public-gw k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
affinityCookieTtlSec: 0
backends:
- balancingMode: RATE
  capacityScaler: 1.0
  group: https://www.googleapis.com/compute/v1/projects/public-gw/zones/europe-west2-a/networkEndpointGroups/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
  maxRatePerEndpoint: 1.0
connectionDraining:
  drainingTimeoutSec: 0
creationTimestamp: '2020-06-29T08:43:45.264-07:00'
description: '{"kubernetes.io/service-name":"default/gw-test-relaynet-internet-gateway-cogrpc","kubernetes.io/service-port":"grpc","x-features":["HTTP2","NEG"]}'
enableCDN: false
fingerprint: tVJFuu8kHHM=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/public-gw/global/healthChecks/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
id: '9026318342467072734'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
  enable: true
name: k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
port: 32112
portName: port32112
protocol: HTTP2
selfLink: https://www.googleapis.com/compute/v1/projects/public-gw/global/backendServices/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
sessionAffinity: NONE
timeoutSec: 30
```

```
$ gcloud compute health-checks describe --global --project public-gw k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
checkIntervalSec: 15
creationTimestamp: '2020-06-29T08:43:42.958-07:00'
description: Default kubernetes L7 Loadbalancing health check for NEG.
healthyThreshold: 1
http2HealthCheck:
  portSpecification: USE_SERVING_PORT
  proxyHeader: NONE
id: '327529895999025857'
kind: compute#healthCheck
name: k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
selfLink: https://www.googleapis.com/compute/v1/projects/public-gw/global/healthChecks/k8s1-4ddb902c-defaul-gw-test-relaynet-internet-gate-80-01fab572
timeoutSec: 15
type: HTTP2
unhealthyThreshold: 2
```
I was wrong. I don't think setting the healthcheck through BackendConfig works. The Ingress can use the custom healthcheck only if the Service exists before the Ingress knows about the Service.
And even if the custom healthcheck works because the Ingress was brought up after the Service, it won't survive a new Pod being deployed.
This is what @nicksardo explained before in a different thread.
BTW, the Ingress doesn't have to be removed to use the custom healthcheck: if the Service info is removed from the Ingress and then added back, the Ingress uses the custom healthcheck again. (But we can't do this in prod.) lol
> This feature has not rolled out yet, but will be available in the next release (1.17.6-gke.7) next week, with the exception of port configuration. That will require a bug fix that should roll out a few weeks after. Additionally, this feature won't be available in 1.16 clusters until about a month after it has been released in 1.17.
May I suggest/request adding this to the Feature Comparison table at https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features ? I see custom health checks are marked as being in beta (for Internal Ingress[1]), but there's no version number so it's not clear how to "activate" the beta feature.
[1] I assume this includes ingresses used for IAP given the healthCheck attribute doesn't have an effect in BackendConfigs used there.
Is there any reason why we can't set the Host header with the BackendConfig CRD?
Hey guys,
Currently I have a service that publishes port `12345` as the service port. I want this to be the port that the Ingress routes traffic to, because it's the port that listens for websocket messages. I'm making a game, by the way, if that adds some color.
The Ingress looks like this:
```yaml
spec:
  defaultBackend:
    service:
      name: other-backend-service
      port:
        number: 7350
  rules:
  - http:
      paths:
      - backend:
          service:
            name: my-server-service
            port:
              number: 12345
        path: /my-server
        pathType: ImplementationSpecific
```
The service listens for websocket connections on port 12345, but I've also spun up a small Python server that runs in the same pod, experimentally, for the sole purpose of passing the health check. It listens for HTTP requests on `/` on port `80` and returns `200 OK`. I've gotten it to pass the health check, but only when I publish this port as the service port. When I do this, traffic fails to route to the other port listening for websocket traffic, as they are on the same path.
Is there a way for me to configure the health check so that it checks against port 80 without having to publish it as the service port on the same path as the port I want to route traffic to?
I really want to stay in the GCE ingress ecosystem so I would love to get this to work if possible. Thank you so much
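With the healthcheck override discussed in this issue, a BackendConfig along these lines might cover that case, assuming your GKE version supports the `healthCheck.port` field (a sketch; the resource name is hypothetical, and the port semantics should be checked against the BackendConfig docs for your version):

```yaml
# Hypothetical BackendConfig pointing the LB health check at the pod's
# port 80 (the small Python server) while traffic still routes to 12345.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-server-backendconfig
spec:
  healthCheck:
    type: HTTP
    port: 80
    requestPath: /
```

It would then be referenced from the Service via the backend-config annotation, as in the earlier examples in this thread.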
@lelandhwu , can you please create a new issue with the same content as https://github.com/kubernetes/ingress-gce/issues/42#issuecomment-1064416269. This seems like a separate issue from Healthcheck configuration.
The remaining ask on this issue is to add Host Header to the BackendConfig. Please open a different issue if this FR is still desired.
Since this issue is specific to custom healthchecks we are closing it out. Custom healthchecks are configurable through the BackendConfig CRD: https://github.com/kubernetes/ingress-gce/issues/42#issuecomment-1064416269.
From @freehan on May 15, 2017 21:25
On GCE, the ingress controller sets up a default healthcheck for backends. The healthcheck points to the nodeport of the backend service on every node. Currently, there is no way to describe detailed healthcheck configuration in Ingress. On the other side, each application may want to handle healthchecks differently. To work around this limitation, on Ingress creation the ingress controller scans all backend pods, picks the first ReadinessProbe it encounters, and configures the healthcheck accordingly. However, the healthcheck will not be updated if the ReadinessProbe is updated. (Refer: https://github.com/kubernetes/ingress/issues/582)
I see 3 options going forward with healthchecks:
1) Expand the Ingress or Service spec to include more configuration for healthchecks. It should include the capabilities provided by the major cloud providers (GCP, AWS, ...).
2) Keep using the readiness probe for healthcheck configuration: a) keep today's behavior and communicate the expectation clearly (however, this still breaks the abstraction and declarative nature of k8s); b) let the ingress controller watch the backend pods for any ReadinessProbe updates (this seems expensive and complicated).
3) Only set up a default healthcheck for ingresses. The ingress controller would only periodically ensure that the healthcheck exists, without caring about its detailed configuration. The user can configure it directly through the cloud provider.
I am in favor of option 3). There are always more bells and whistles on different cloud providers. The higher layer we go, the more features we can utilize. For an L7 LB, there is no clean, simple way to describe every intention, and the same goes for health checks. To ensure a smooth experience, k8s still sets up the basics; for advanced use cases, the user will have to configure it through the cloud provider.
Thoughts? @kubernetes/sig-network-misc
Copied from original issue: kubernetes/ingress-nginx#720