prometheus / blackbox_exporter

Blackbox prober exporter
https://prometheus.io
Apache License 2.0

Unable to ping with ICMP prober over IPv6 but IPv4 works #1023

Open smbambling opened 1 year ago

smbambling commented 1 year ago

Host operating system: output of uname -a

CentOS Linux release 7.9.2009 (Core) Linux prom1.example 3.10.0-1160.66.1.el7.x86_64 #1 SMP Wed May 18 16:02:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter --version

Running on K3S via the kube-prometheus-stack helm chart

blackbox_exporter, version 0.23.0 (branch: HEAD, revision: 26fc98b9c6db21457653ed752f34d1b7fb5bba43) build user: root@f360719453e3 build date: 20221202-12:26:32 go version: go1.19.3 platform: linux/amd64

What did you do that produced an error?

Logs for the probe:
ts=2023-02-06T11:25:27.829923503Z caller=main.go:181 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Beginning probe" probe=icmp timeout_seconds=5
ts=2023-02-06T11:25:27.83021627Z caller=icmp.go:91 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Resolving target address" target=2001:500:110:affe::249 ip_protocol=ip6
ts=2023-02-06T11:25:27.830297445Z caller=icmp.go:91 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Resolved target address" target=2001:500:110:affe::249 ip=2001:500:110:affe::249
ts=2023-02-06T11:25:27.830334595Z caller=handler.go:117 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Creating socket"
ts=2023-02-06T11:25:27.841335002Z caller=handler.go:117 module=ping_v6 target=2001:500:110:affe::249 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: protocol not supported"
ts=2023-02-06T11:25:27.84154106Z caller=handler.go:117 module=ping_v6 target=2001:500:110:affe::249 level=error msg="Error listening to socket" err="listen ip6:ipv6-icmp ::: socket: operation not permitted"
ts=2023-02-06T11:25:27.841592441Z caller=main.go:181 module=ping_v6 target=2001:500:110:affe::249 level=error msg="Probe failed" duration_seconds=0.01156632

Following the documentation at https://github.com/prometheus/blackbox_exporter#permissions, I've tried various combinations of capabilities (NET_ADMIN, NET_RAW) and the sysctl net.ipv4.ping_group_range. IPv6 probes still fail even when I use both:

podSecurityContext:
  fsGroup: 1000
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 2147483647"
securityContext:
  capabilities:
    add:
      - NET_RAW

** Setting only cap_net_raw did not grant the correct permissions for every ICMP request; however, after setting that value, IPv4 ICMP requests worked correctly.

According to https://bugzilla.redhat.com/show_bug.cgi?id=1315335#c2, net.ipv4.ping_group_range should apply to both IPv4 and IPv6.

smbambling commented 1 year ago

Updating the container to run as root DOES allow blackbox_exporter to bind to the IPv6 socket, and I can also verify this with the ping utility.

My values for testing:

pspEnabled: false

podSecurityContext:
  fsGroup: 1000
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 2147483647"

securityContext:
  runAsUser:
  runAsGroup:
  runAsNonRoot: false
  allowPrivilegeEscalation: true
  readOnlyRootFilesystem: false
  capabilities:
    add:
      - NET_ADMIN
      - NET_RAW
    drop: []

Ping output

/tmp # ifconfig
eth0      Link encap:Ethernet  HWaddr 7E:6A:A4:51:17:C4
          inet addr:10.42.0.144  Bcast:10.42.0.255  Mask:255.255.255.0
          inet6 addr: fe80::7c6a:a4ff:fe51:17c4/64 Scope:Link
          inet6 addr: fc15:1::186/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:1929 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1815 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1188146 (1.1 MiB)  TX bytes:709224 (692.6 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/tmp # ping6 2001:500:4:201::47
PING 2001:500:4:201::47 (2001:500:4:201::47): 56 data bytes
64 bytes from 2001:500:4:201::47: seq=0 ttl=55 time=3.515 ms
64 bytes from 2001:500:4:201::47: seq=1 ttl=55 time=3.345 ms
smbambling commented 1 year ago

Even when running the container as root, the cap_net_raw capability is required for IPv6 to bind to the socket for ICMP ping requests.

dswarbrick commented 1 year ago

Running blackbox_exporter as root shouldn't require explicitly granting it cap_net_raw, since user id 0 is permitted to use raw sockets anyway. For more granular permissions, however, running as non-root with cap_net_raw is sufficient to ping both IPv4 and IPv6 targets.
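
To see which case applies inside a pod, one option is to inspect the effective capability mask of the process. The Go program below is a hypothetical diagnostic, not part of blackbox_exporter: it reads the CapEff line from /proc/self/status (Linux only) and tests bits 12 (CAP_NET_ADMIN) and 13 (CAP_NET_RAW). As an assumption offered for context rather than a confirmed diagnosis: inside a container, uid 0 only holds the capabilities the runtime leaves in its bounding set, so if the pod spec drops ALL capabilities, even root may lack CAP_NET_RAW.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in linux/capability.h.
const (
	capNetAdmin = 12
	capNetRaw   = 13
)

// hasCap reports whether bit is set in a hex capability mask such as the
// CapEff value from /proc/self/status (e.g. "0000000000003000").
func hasCap(maskHex string, bit uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(maskHex), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(1<<bit) != 0, nil
}

// effectiveCaps returns the CapEff field of /proc/self/status (Linux only).
func effectiveCaps() (string, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("CapEff not found in /proc/self/status")
}

func main() {
	caps, err := effectiveCaps()
	if err != nil {
		fmt.Println("cannot read capabilities:", err)
		return
	}
	checks := []struct {
		name string
		bit  uint
	}{{"CAP_NET_ADMIN", capNetAdmin}, {"CAP_NET_RAW", capNetRaw}}
	for _, c := range checks {
		ok, err := hasCap(caps, c.bit)
		if err != nil {
			fmt.Println("cannot parse CapEff:", err)
			return
		}
		fmt.Printf("uid=%d %s=%v\n", os.Getuid(), c.name, ok)
	}
}
```

Running it with the same securityContext as blackbox_exporter would show whether CAP_NET_RAW actually reaches the process, independently of the uid.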

Configuring net.ipv4.ping_group_range allows members of the specified groups to send ICMP / ICMPv6 echo packets, without needing root or cap_net_raw. It is even finer-grained than being able to send arbitrary raw IP packets, since it only permits IPPROTO_ICMP / IPPROTO_ICMPV6, as opposed to IPPROTO_RAW.
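
The group-range check can be sketched in Go as a hypothetical diagnostic. It assumes the usual two-field format of /proc/sys/net/ipv4/ping_group_range; the kernel default is "1 0" (minimum greater than maximum), which allows no group at all. For simplicity it checks only the process's effective GID, whereas the kernel also accepts a match among the supplementary groups.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePingGroupRange parses the "min max" contents of
// /proc/sys/net/ipv4/ping_group_range (whitespace-separated GIDs).
func parsePingGroupRange(s string) (lo, hi uint64, err error) {
	f := strings.Fields(s)
	if len(f) != 2 {
		return 0, 0, fmt.Errorf("unexpected ping_group_range %q", s)
	}
	if lo, err = strconv.ParseUint(f[0], 10, 32); err != nil {
		return 0, 0, err
	}
	if hi, err = strconv.ParseUint(f[1], 10, 32); err != nil {
		return 0, 0, err
	}
	return lo, hi, nil
}

// gidAllowed reports whether gid lies in [lo, hi]. With the kernel
// default "1 0" (lo > hi) no GID is allowed.
func gidAllowed(gid, lo, hi uint64) bool {
	return lo <= gid && gid <= hi
}

func main() {
	b, err := os.ReadFile("/proc/sys/net/ipv4/ping_group_range")
	if err != nil {
		fmt.Println("cannot read ping_group_range:", err)
		return
	}
	lo, hi, err := parsePingGroupRange(string(b))
	if err != nil {
		fmt.Println(err)
		return
	}
	// Only the effective GID is checked here; the kernel also
	// accepts a supplementary group that falls inside the range.
	gid := uint64(os.Getegid())
	fmt.Printf("range=[%d, %d] egid=%d allowed=%v\n", lo, hi, gid, gidAllowed(gid, lo, hi))
}
```

With the "0 2147483647" value used in this thread, any GID in that range passes the check. Whether the kernel then honours it for ICMPv6 is a separate question of kernel support.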

blackbox_exporter attempts to use unprivileged ping sockets on darwin and linux (as would be permitted by net.ipv4.ping_group_range), and falls back to traditional privileged pings requiring user id 0 or cap_net_raw.

The kernel commit which expanded the scope of net.ipv4.ping_group_range to also allow IPv6 pings is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net?id=6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67, which was first included in kernel version 3.11. You would need to check whether it was backported to the CentOS 3.10 kernel. If you find that you still need to specify cap_net_raw to get IPv6 pings working, it sounds like the CentOS 3.10 kernel has not backported this commit.
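
A version comparison against 3.11 can be scripted, with a large caveat: distribution kernels such as CentOS 7's 3.10 carry backported fixes without a version bump, so the result is only a hint and cannot replace checking the vendor changelog. The hypothetical kernelAtLeast helper parses the leading major.minor of a release string like the one from uname -a in the original report.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a kernel release string such as
// "3.10.0-1160.66.1.el7.x86_64" has major.minor >= the given version.
// It ignores backports, so a false result is only a hint.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// Keep only the leading digits of the minor field ("1-amd64" -> "1").
	digits := parts[1]
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	mnr, err := strconv.Atoi(digits)
	if err != nil {
		return false, err
	}
	if maj != major {
		return maj > major, nil
	}
	return mnr >= minor, nil
}

func main() {
	b, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	rel := strings.TrimSpace(string(b))
	ok, err := kernelAtLeast(rel, 3, 11)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("kernel %s >= 3.11: %v (backported fixes are not detected)\n", rel, ok)
}
```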

samip5 commented 1 year ago

I'm also facing this issue, and I'm running kernel 6.1.0 on Debian 12.

For reference, ping6 in the container also fails with "permission denied", even with the NET_ADMIN and NET_RAW caps:

~ $ ping6 google.com
PING google.com (2a00:1450:4026:804::200e): 56 data bytes
ping6: permission denied (are you root?)

Current values are:

fullnameOverride: blackbox-exporter

image:
  registry: quay.io

podSecurityContext:
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 2147483647"

config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
    icmp4:
      prober: icmp
      timeout: 30s
      icmp:
        preferred_ip_protocol: "ip4"
    icmp6:
      prober: icmp
      timeout: 30s
      icmp:
        preferred_ip_protocol: "ip6"

prometheusRule:
  enabled: true
  additionalLabels:
    app: prometheus-operator
    release: prometheus
  rules:
    - alert: BlackboxSslCertificateWillExpireSoon
      expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
      for: 15m
      labels:
        severity: critical
      annotations:
        description: |-
          The SSL certificate for {{"{{ $labels.target }}"}} will expire in less than 3 days
    - alert: BlackboxSslCertificateExpired
      expr: probe_ssl_earliest_cert_expiry - time() <= 0
      for: 15m
      labels:
        severity: critical
      annotations:
        description: |-
          The SSL certificate for {{"{{ $labels.target }}"}} has expired
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 15m
      labels:
        severity: critical
      annotations:
        description: |-
          The host {{"{{ $labels.target }}"}} is currently unreachable

pspEnabled: false

securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    add:
      - NET_ADMIN
      - NET_RAW

serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
    interval: 1m
    scrapeTimeout: 30s
  targets:
    # Other devices
    - module: icmp4
      name: zigbee-controller-icmp
      url: 192.168.2.112

    - module: icmp4
      name: nas-icmp
      url: 192.168.2.2

    - module: icmp4
      name: ping-cloudflare
      url: 1.1.1.1
      scrape_interval: 30s
    - module: icmp6
      name: ping6-aroot-fi
      url: a.fi
      scrape_interval: 30s
dswarbrick commented 1 year ago

@samip5 It would be helpful if you could include the output of a probe with &debug=true.

samip5 commented 1 year ago

Logs for the probe:
ts=2023-07-26T20:15:19.153166886Z caller=main.go:181 module=icmp6 target=a.fi level=info msg="Beginning probe" probe=icmp timeout_seconds=30
ts=2023-07-26T20:15:19.203719654Z caller=icmp.go:91 module=icmp6 target=a.fi level=info msg="Resolving target address" target=a.fi ip_protocol=ip6
ts=2023-07-26T20:15:20.765398623Z caller=icmp.go:91 module=icmp6 target=a.fi level=info msg="Resolved target address" target=a.fi ip=2001:708:10:53::53
ts=2023-07-26T20:15:20.765476243Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Creating socket"
ts=2023-07-26T20:15:20.777213805Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Creating ICMP packet" seq=58042 id=50125
ts=2023-07-26T20:15:20.810597242Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Writing out packet"
ts=2023-07-26T20:15:20.81062953Z caller=handler.go:120 module=icmp6 target=a.fi level=debug msg="Setting TTL (IPv6 unprivileged)" ttl=64
ts=2023-07-26T20:15:20.811221011Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Waiting for reply packets"
ts=2023-07-26T20:15:49.29300144Z caller=handler.go:120 module=icmp6 target=a.fi level=debug msg="Cannot get Hop Limit from the received packet. 'probe_icmp_reply_hop_limit' will be missing."
ts=2023-07-26T20:15:49.293101733Z caller=handler.go:120 module=icmp6 target=a.fi level=warn msg="Timeout reading from socket" err="read udp [::]:199: raw-read udp [::]:199: i/o timeout"
ts=2023-07-26T20:15:49.293249184Z caller=main.go:181 module=icmp6 target=a.fi level=error msg="Probe failed" duration_seconds=30.089666988

Module configuration:
prober: icmp
timeout: 30s
http:
  ip_protocol_fallback: true
  follow_redirects: true
  enable_http2: true
tcp:
  ip_protocol_fallback: true
icmp:
  preferred_ip_protocol: ip6
  ip_protocol_fallback: true
  ttl: 64
dns:
  ip_protocol_fallback: true
  recursion_desired: true
dswarbrick commented 1 year ago

@samip5 The I/O timeout error suggests that the echo replies are not reaching blackbox_exporter; for example, your router may be dropping the outbound echo-request, or dropping / filtering the echo-reply.

Other than that, the debug output indicates that blackbox_exporter is successfully creating the listening socket and sending the packet, so your CAP_NET_RAW / net.ipv4.ping_group_range settings are valid.

samip5 commented 1 year ago

It seems the problem was my CNI (Container Network Interface), but yes it appears to work now. :)