prometheus / blackbox_exporter

GRPC probe with "error reading server preface: EOF" #1260

Open: ecomp-fabioleal opened this issue 3 months ago

ecomp-fabioleal commented 3 months ago

Host operating system: output of uname -a

$ uname -a
Linux worker-N 5.15.0-87-generic #97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
❯ kubectl get nodes
NAME           STATUS   ROLES    AGE    VERSION
worker-N   Ready    <none>   678d   v1.25.4+k0s

blackbox_exporter version: output of blackbox_exporter --version

~ $ blackbox_exporter --version
blackbox_exporter, version 0.24.0 (branch: HEAD, revision: 0b0467473916fd9e8526e2635c2a0b1c56011dff)
  build user:       root@e5bbfcc8184e
  build date:       20230516-11:07:25
  go version:       go1.20.4
  platform:         linux/amd64
  tags:             netgo

What is the blackbox.yml module config.

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: []
      method: GET
      follow_redirects: true
      preferred_ip_protocol: ip4
      ip_protocol_fallback: false
      enable_http2: false
      tls_config:
        insecure_skip_verify: true          
  http_rest:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: []
      method: GET
      follow_redirects: true
      preferred_ip_protocol: ip4
      ip_protocol_fallback: false
      fail_if_body_not_matches_regexp:
        - "Healthy"
      tls_config:
        insecure_skip_verify: true
  grpc:
    prober: grpc
    timeout: 5s
    grpc:
      service: grpc.health.v1.Health.Check
      preferred_ip_protocol: ip4
      ip_protocol_fallback: false
      tls: false  
      tls_config:
        insecure_skip_verify: true

What is the prometheus.yml scrape config.

...
...
# Probe for blackbox-probe-grpc in DEV
- job_name: 'blackbox-probe-grpc-dev'
  metrics_path: /probe
  params:
    module: [grpc]
  kubernetes_sd_configs:
    - role: ingress
  relabel_configs:
    - source_labels: [__address__]
      regex: (.+)
      replacement: ${1}:443
      target_label: __param_target
    - source_labels: [__param_target]
      regex: grpc.*
      action: keep
    - target_label: __address__
      replacement: blackbox-exporter-prometheus-blackbox-exporter:9115
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_ingress_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_ingress_name]
      target_label: ingress_name
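As a sanity check on this scrape job, its first two relabel rules can be replayed in Go. This is a minimal sketch that assumes, as Prometheus does, that relabel regexes are fully anchored. `probeTarget` is a hypothetical helper, not part of Prometheus. It shows that an ingress host such as `grpc.mydomain.com` becomes the probe target `grpc.mydomain.com:443` and is kept, while other hosts are dropped. This agrees with the `target=grpc.mydomain.com:443` in the debug log.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored mimics Prometheus relabeling, which matches every regex as if it
// were written ^(?:regex)$.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

var (
	addrRe = anchored(`(.+)`)   // rule 1: capture the whole ingress address
	keepRe = anchored(`grpc.*`) // rule 2: keep only targets starting with "grpc"
)

// probeTarget (hypothetical) replays the first two relabel rules of the
// 'blackbox-probe-grpc-dev' job for one ingress address. It returns the value
// that would land in __param_target and whether the "keep" action retains it.
func probeTarget(address string) (string, bool) {
	if !addrRe.MatchString(address) {
		// No match: the replace action leaves __param_target unset, and an
		// empty value cannot match grpc.*, so the target is dropped.
		return "", false
	}
	target := addrRe.ReplaceAllString(address, "${1}:443")
	return target, keepRe.MatchString(target)
}

func main() {
	for _, addr := range []string{"grpc.mydomain.com", "api.mydomain.com"} {
		target, kept := probeTarget(addr)
		fmt.Printf("%s -> %s kept=%v\n", addr, target, kept)
	}
}
```

Because the regexes are anchored, `grpc.*` only keeps hosts whose name starts with `grpc`. A host like `api-grpc.mydomain.com` would be dropped.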

What logging output did you get from adding &debug=true to the probe URL?

Logs for the probe:
ts=2024-06-24T12:52:50.570122495Z caller=main.go:181 module=grpc target=grpc.mydomain.com:443 level=info msg="Beginning probe" probe=grpc timeout_seconds=5
ts=2024-06-24T12:52:50.570437479Z caller=grpc.go:150 module=grpc target=grpc.mydomain.com:443 level=info msg="Resolving target address" target=grpc.mydomain.com ip_protocol=ip4
ts=2024-06-24T12:52:50.580058854Z caller=grpc.go:150 module=grpc target=grpc.mydomain.com:443 level=info msg="Resolved target address" target=grpc.mydomain.com ip=EXTERNAL.IP.ADDRESS.OFAPP
ts=2024-06-24T12:52:50.580178175Z caller=handler.go:120 module=grpc target=grpc.mydomain.com:443 level=debug msg="Dialing GRPC without TLS"
ts=2024-06-24T12:52:50.592386251Z caller=handler.go:120 module=grpc target=grpc.mydomain.com:443 level=error msg="can't connect grpc server:" err="rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: EOF\""
ts=2024-06-24T12:52:50.592548007Z caller=main.go:181 module=grpc target=grpc.mydomain.com:443 level=error msg="Probe failed" duration_seconds=0.022299861

Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.009659815
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.022299861
# HELP probe_grpc_duration_seconds Duration of gRPC request by phase
# TYPE probe_grpc_duration_seconds gauge
probe_grpc_duration_seconds{phase="check"} 0.012197848
probe_grpc_duration_seconds{phase="resolve"} 0.009659815
# HELP probe_grpc_healthcheck_response Response HealthCheck response
# TYPE probe_grpc_healthcheck_response gauge
probe_grpc_healthcheck_response{serving_status="NOT_SERVING"} 0
probe_grpc_healthcheck_response{serving_status="SERVICE_UNKNOWN"} 0
probe_grpc_healthcheck_response{serving_status="SERVING"} 0
probe_grpc_healthcheck_response{serving_status="UNKNOWN"} 0
# HELP probe_grpc_ssl Indicates if SSL was used for the connection
# TYPE probe_grpc_ssl gauge
probe_grpc_ssl 0
# HELP probe_grpc_status_code Response gRPC status code
# TYPE probe_grpc_status_code gauge
probe_grpc_status_code 14
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 3.325595068e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns last SSL chain expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 0
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
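The value in `probe_grpc_status_code 14` is a numeric gRPC status code. The sketch below is a hypothetical helper that uses the canonical code numbering from the gRPC specification to map this value back to a name. Code 14 is `Unavailable`, the same code that appears in the error log (`code = Unavailable`). All four `probe_grpc_healthcheck_response` series are 0, which is consistent with the probe failing before any health-check response arrived.

```go
package main

import "fmt"

// grpcCodeNames lists the canonical gRPC status codes 0-16 in order
// (google.golang.org/grpc/codes uses the same numbering).
var grpcCodeNames = []string{
	"OK", "Canceled", "Unknown", "InvalidArgument", "DeadlineExceeded",
	"NotFound", "AlreadyExists", "PermissionDenied", "ResourceExhausted",
	"FailedPrecondition", "Aborted", "OutOfRange", "Unimplemented",
	"Internal", "Unavailable", "DataLoss", "Unauthenticated",
}

// codeName (hypothetical) turns the float value of the probe_grpc_status_code
// gauge into its gRPC status name, rejecting non-integer or unknown values.
func codeName(value float64) string {
	i := int(value)
	if float64(i) != value || i < 0 || i >= len(grpcCodeNames) {
		return fmt.Sprintf("invalid(%v)", value)
	}
	return grpcCodeNames[i]
}

func main() {
	fmt.Println(codeName(14)) // prints "Unavailable"
}
```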

Module configuration:
prober: grpc
timeout: 5s
http:
  ip_protocol_fallback: true
  follow_redirects: true
  enable_http2: true
tcp:
  ip_protocol_fallback: true
icmp:
  ip_protocol_fallback: true
  ttl: 64
dns:
  ip_protocol_fallback: true
  recursion_desired: true
grpc:
  service: grpc.health.v1.Health.Check
  tls_config:
    insecure_skip_verify: true
  preferred_ip_protocol: ip4
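Both the file and the loaded module set `service` to `grpc.health.v1.Health.Check`, which is the fully qualified name of the health-check RPC method. According to the blackbox_exporter configuration docs, `service` is the name of the service whose health status is queried. In the gRPC health checking protocol this corresponds to the `service` field of `HealthCheckRequest`, where an empty string asks about the server as a whole and an unknown name is answered with `NOT_FOUND`. The grpcurl call further down sends an empty request. If this reading is right, leaving `service` empty (or setting it to the application's own service name) would mirror the grpcurl check more closely.

The sketch below only illustrates the method-versus-service distinction by turning the symbol into the HTTP/2 path gRPC actually calls. `fullMethod` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// fullMethod (hypothetical) turns a grpcurl-style symbol such as
// "grpc.health.v1.Health.Check" into the gRPC request path
// "/<package>.<Service>/<Method>", showing that the symbol names an RPC
// method rather than a service to ask about.
func fullMethod(symbol string) (string, error) {
	i := strings.LastIndex(symbol, ".")
	if i <= 0 || i == len(symbol)-1 {
		return "", fmt.Errorf("not a <service>.<method> symbol: %q", symbol)
	}
	return "/" + symbol[:i] + "/" + symbol[i+1:], nil
}

func main() {
	path, err := fullMethod("grpc.health.v1.Health.Check")
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // prints "/grpc.health.v1.Health/Check"
}
```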

What did you do that produced an error?

Every configuration I have tried for capturing gRPC results gives me the same response.

What did you expect to see?

The application reported with status SERVING.

What did you see instead?

The output shown above from adding &debug=true to the probe URL.

Hi guys! Sorry if this is not the right place to ask about blackbox_exporter; if it isn't, please point me to the correct channel. I'm having trouble getting the grpc prober to work. I have some gRPC endpoints that I know are working, because their Kubernetes readiness and startup probes pass. When I call the endpoint with grpcurl (or grpcui), I get the following result:

# grpcurl -vv -insecure grpc.mydomain.com:443 grpc.health.v1.Health.Check

Resolved method descriptor:
rpc Check ( .grpc.health.v1.HealthCheckRequest ) returns ( .grpc.health.v1.HealthCheckResponse );

Request metadata to send:
(empty)

Response headers received:
content-type: application/grpc
date: Fri, 21 Jun 2024 15:24:04 GMT

Estimated response size: 2 bytes

Response contents:
{
  "status": "SERVING"
}

Response trailers received:
(empty)
Sent 0 requests and received 1 response
Timing Data: 69.670173ms
  Dial: 23.518394ms
    TLS Setup: 3.44µs
    BlockingDial: 23.488795ms
  InvokeRPC: 40.137506ms

I have already made several changes to blackbox.yaml and to the scrape job in prometheus.yaml, but this error message keeps haunting me (lol):

ts=2024-06-21T15:09:01.255658137Z caller=handler.go:120 module=grpc target=grpc.mydomain.com:443 level=error msg="can't connect grpc server:" err="rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: EOF\""

Can anyone help, or at least point me to the correct way to use this probe?

ecomp-fabioleal commented 3 months ago

Anyone? Could someone at least tell me whether I'm on the right track with this probe?