grpc-ecosystem / grpc-health-probe

A command-line tool to perform health-checks for gRPC applications in Kubernetes and elsewhere
Apache License 2.0
1.44k stars 188 forks source link
grpc grpc-health kubernetes

grpc_health_probe(1)

ci GitHub all releases

The grpc_health_probe utility allows you to query health of gRPC services that expose service their status through the gRPC Health Checking Protocol.

grpc_health_probe is meant to be used for health checking gRPC applications in Kubernetes, using the exec probes.

:warning: Kubernetes has now built-in gRPC health checking capability as generally available. As a result, you might no longer need to use this tool and can use the native Kubernetes feature instead.

This tool can still be useful if you are on older versions of Kubernetes, or using advanced configuration (such as custom metadata, TLS or finer timeout tuning), or not using Kubernetes at all.

This command-line utility makes a RPC to /grpc.health.v1.Health/Check. If it responds with a SERVING status, the grpc_health_probe will exit with success, otherwise it will exit with a non-zero exit code (documented below).

EXAMPLES

$ grpc_health_probe -addr=localhost:5000
healthy: SERVING
$ grpc_health_probe -addr=localhost:9999 -connect-timeout 250ms -rpc-timeout 100ms
failed to connect service at "localhost:9999": context deadline exceeded
exit status 2

Installation

It is recommended to use a version-stamped binary distribution:

Installing from source (not recommended):

Using the gRPC Health Checking Protocol

To make use of the grpc_health_probe, your application must implement the gRPC Health Checking Protocol v1. This means you must to register the Health service and implement the rpc Check that returns a SERVING status.

Since the Health Checking protocol is part of the gRPC core, it has packages/libraries available for the languages supported by gRPC:

[health.proto] [Go] [Java] [Python] [C#/NuGet] [Ruby] ...

Most of the languages listed above provide helper functions that hides implementation details. This eliminates the need for you to implement the Check rpc yourself.

Example: gRPC health checking on Kubernetes

Kubernetes now supports gRPC health checking. If your cluster is running a version that supports gRPC health checking, you can define a gRPC liveness probe in your Pod specification. For more information on how to define a gRPC liveness probe in Kubernetes, see the Kubernetes documentation.

However, if your Kubernetes version does not support gRPC health checking or if you want to use some advanced features that Kubernetes does not support, you can use grpc_health_probe to health-check your gRPC server. As a solution, grpc_health_probe can be used for Kubernetes to health-check gRPC servers running in the Pod.

You are recommended to use Kubernetes exec probes and define liveness and/or readiness checks for your gRPC server pods.

You can bundle the statically compiled grpc_health_probe in your container image. Choose a binary release and download it in your Dockerfile:

RUN GRPC_HEALTH_PROBE_VERSION=v0.4.13 && \
    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64 && \
    chmod +x /bin/grpc_health_probe

In your Kubernetes Pod specification manifest, specify a livenessProbe and/or readinessProbe for the container:

spec:
  containers:
  - name: server
    image: "[YOUR-DOCKER-IMAGE]"
    ports:
    - containerPort: 5000
    readinessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:5000"]
      initialDelaySeconds: 5
    livenessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:5000"]
      initialDelaySeconds: 10

This approach provide proper readiness/liveness checking to your applications that implement the gRPC Health Checking Protocol.

Health Checking TLS Servers

If a gRPC server is serving traffic over TLS, or uses TLS client authentication to authorize clients, you can still use grpc_health_probe to check health with command-line options:

Option Description
-tls use TLS (default: false)
-tls-ca-cert path to file containing CA certificates (to override system root CAs)
-tls-client-cert client certificate for authenticating to the server
-tls-client-key private key for for authenticating to the server
-tls-no-verify use TLS, but do not verify the certificate presented by the server (INSECURE) (default: false)
-tls-server-name override the hostname used to verify the server certificate

Health checking TLS Servers with SPIFFE issued credentials

If your gRPC server requires authentication, you can use the following command line options and set the SPIFFE_ENDPOINT_SOCKET environment variable.

Option Description
-spiffe use SPIFFE Workload API to retrieve TLS credentials (default: false)

Other Available Flags

Option Description
-v verbose logs (default: false)
-connect-timeout timeout for establishing connection
-rpc-timeout timeout for health check rpc
-rpc-header sends metadata in the RPC request context (default: empty map)
-user-agent user-agent header value of health check requests (default: grpc_health_probe)
-service service name to check (default: "") - empty string is convention for server health
-gzip use GZIPCompressor for requests and GZIPDecompressor for response (default: false)
-version print the probe version and exit

Example:

  1. Start the route_guide example server with TLS by running:

    go run server/server.go -tls

  2. Run grpc_client_probe with the CA certificate (in the testdata/ directory) and hostname override the cert is signed for:

      $ grpc_health_probe -addr 127.0.0.1:10000 \
          -tls \
          -tls-ca-cert /path/to/testdata/ca.pem \
          -tls-server-name=example.com \
          -rpc-header=foo:bar \
          -rpc-header=foo2:bar2
    
      status: SERVING

Exit codes

It is not recommended to rely on specific exit statuses. Any failure will be a non-zero exit code.

Exit Code Description
0 success: rpc response is SERVING.
1 failure: invalid command-line arguments
2 failure: connection failed or timed out
3 failure: rpc failed or timed out
4 failure: rpc successful, but the response is not SERVING
20 failure: could not retrieve TLS credentials using the SPIFFE Workload API

This is not an official Google project.