Open smbambling opened 1 year ago
Updating the container to run as root DOES allow the blackbox exporter to bind to the IPv6 socket, and I can use the ping utility to verify this as well.
My values for testing
pspEnabled: false
podSecurityContext:
  fsGroup: 1000
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 2147483647"
securityContext:
  runAsUser:
  runAsGroup:
  runAsNonRoot: false
  allowPrivilegeEscalation: true
  readOnlyRootFilesystem: false
  capabilities:
    add:
      - NET_ADMIN
      - NET_RAW
    drop: []
Ping output
/tmp # ifconfig
eth0 Link encap:Ethernet HWaddr 7E:6A:A4:51:17:C4
inet addr:10.42.0.144 Bcast:10.42.0.255 Mask:255.255.255.0
inet6 addr: fe80::7c6a:a4ff:fe51:17c4/64 Scope:Link
inet6 addr: fc15:1::186/64 Scope:Global
UP BROADCAST RUNNING MULTICAST MTU:1450 Metric:1
RX packets:1929 errors:0 dropped:0 overruns:0 frame:0
TX packets:1815 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1188146 (1.1 MiB) TX bytes:709224 (692.6 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/tmp # ping6 2001:500:4:201::47
PING 2001:500:4:201::47 (2001:500:4:201::47): 56 data bytes
64 bytes from 2001:500:4:201::47: seq=0 ttl=55 time=3.515 ms
64 bytes from 2001:500:4:201::47: seq=1 ttl=55 time=3.345 ms
Even when running the container as root, the cap_net_raw capability is required to allow binding the IPv6 socket for ICMP ping requests.
Running blackbox_exporter as root shouldn't require explicitly granting it cap_net_raw, since user id 0 is permitted to use raw sockets anyway. To keep the permissions more granular, however, running non-root but with cap_net_raw is sufficient to ping both IPv4 and IPv6 targets.
Configuring net.ipv4.ping_group_range allows members of the specified groups to send ICMP / ICMPv6 echo packets, without needing root or cap_net_raw. It is even finer-grained than being able to send arbitrary raw IP packets, since it only permits IPPROTO_ICMP / IPPROTO_ICMPV6, as opposed to IPPROTO_RAW.
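To make the group check concrete, here is a minimal Python sketch of how a ping_group_range value maps to "may this gid open a ping socket". The helper names are hypothetical and not part of blackbox_exporter, and the sketch only looks at the single gid you pass in.

```python
def parse_ping_group_range(text):
    """Parse a sysctl value such as "0 2147483647" into (low, high)."""
    low, high = (int(part) for part in text.split())
    return low, high


def gid_may_ping(text, gid):
    """Return True if gid falls inside the inclusive range in text."""
    low, high = parse_ping_group_range(text)
    # An inverted range such as the kernel default "1 0" matches no gid,
    # which is how unprivileged ping sockets are disabled by default.
    return low <= gid <= high


def read_ping_group_range(path="/proc/sys/net/ipv4/ping_group_range"):
    """Read the current value on a Linux host (not called automatically)."""
    with open(path) as handle:
        return handle.read()
```

With the value used in this thread, `gid_may_ping("0 2147483647", 1000)` is true, while the default `"1 0"` rejects every gid.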
blackbox_exporter attempts to use unprivileged ping sockets on darwin and linux (as would be permitted by net.ipv4.ping_group_range), and falls back to traditional privileged pings requiring user id 0 or cap_net_raw.
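The try-unprivileged-then-privileged order can be illustrated with a short Python sketch. This is not the exporter's Go code; `open_icmp_socket` and its `socket_factory` parameter are hypothetical names used only to show the fallback.

```python
import socket


def open_icmp_socket(family=socket.AF_INET6, socket_factory=socket.socket):
    """Return (sock, mode), trying a ping socket before a raw socket."""
    if family == socket.AF_INET6:
        proto = socket.IPPROTO_ICMPV6
    else:
        proto = socket.IPPROTO_ICMP
    try:
        # Unprivileged "ping socket": SOCK_DGRAM with an ICMP protocol.
        # The kernel allows it when the gid is in net.ipv4.ping_group_range.
        return socket_factory(family, socket.SOCK_DGRAM, proto), "unprivileged"
    except OSError:
        # Fallback: raw socket, which needs user id 0 or cap_net_raw.
        # If this also fails, the OSError propagates to the caller.
        return socket_factory(family, socket.SOCK_RAW, proto), "privileged"
```

The `socket_factory` argument exists only so the fallback can be exercised without real privileges; in normal use the default `socket.socket` is enough.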
The kernel commit which expanded the scope of net.ipv4.ping_group_range to also allow IPv6 pings is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net?id=6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67, which was first included in kernel version 3.11. You would need to research whether this was backported to CentOS' 3.10 kernel. If you are finding that you still need to specify cap_net_raw to get IPv6 pings working, it sounds like the CentOS 3.10 kernel has not backported this commit.
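As a rough first check, the kernel release string can be compared against 3.11. The sketch below uses a hypothetical helper, `kernel_at_least`; it only reads the upstream version number, so it cannot tell whether a distribution kernel carries a backport.

```python
import re


def kernel_at_least(release, minimum=(3, 11)):
    """Compare the major.minor part of a `uname -r` string to minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison handles cases like (3, 9) < (3, 11) correctly.
    return version >= minimum
```

For the reporter's kernel, `kernel_at_least("3.10.0-1160.66.1.el7.x86_64")` is false, which is why the backport question matters there.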
I'm also facing this issue, and I'm running kernel 6.1.0 on Debian 12.
For reference, ping6 on the container will also fail with permission denied with NET_ADMIN and NET_RAW caps:
~ $ ping6 google.com
PING google.com (2a00:1450:4026:804::200e): 56 data bytes
ping6: permission denied (are you root?)
Current values are:
fullnameOverride: blackbox-exporter
image:
  registry: quay.io
podSecurityContext:
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 2147483647"
config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
    icmp4:
      prober: icmp
      timeout: 30s
      icmp:
        preferred_ip_protocol: "ip4"
    icmp6:
      prober: icmp
      timeout: 30s
      icmp:
        preferred_ip_protocol: "ip6"
prometheusRule:
  enabled: true
  additionalLabels:
    app: prometheus-operator
    release: prometheus
  rules:
    - alert: BlackboxSslCertificateWillExpireSoon
      expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
      for: 15m
      labels:
        severity: critical
      annotations:
        description: |-
          The SSL certificate for {{"{{ $labels.target }}"}} will expire in less than 3 days
    - alert: BlackboxSslCertificateExpired
      expr: probe_ssl_earliest_cert_expiry - time() <= 0
      for: 15m
      labels:
        severity: critical
      annotations:
        description: |-
          The SSL certificate for {{"{{ $labels.target }}"}} has expired
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 15m
      labels:
        severity: critical
      annotations:
        description: |-
          The host {{"{{ $labels.target }}"}} is currently unreachable
pspEnabled: false
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    add:
      - NET_ADMIN
      - NET_RAW
serviceMonitor:
  enabled: true
  defaults:
    labels:
      release: prometheus
    interval: 1m
    scrapeTimeout: 30s
  targets:
    # Other devices
    - module: icmp4
      name: zigbee-controller-icmp
      url: 192.168.2.112
    - module: icmp4
      name: nas-icmp
      url: 192.168.2.2
    - module: icmp4
      name: ping-cloudflare
      url: 1.1.1.1
      scrape_interval: 30s
    - module: icmp6
      name: ping6-aroot-fi
      url: a.fi
      scrape_interval: 30s
@samip5 It would be helpful if you could include the output of a probe with &debug=true.
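For anyone unsure how to request that output, the following Python sketch builds a /probe URL with debug=true. The exporter address is an assumption (9115 is the exporter's default port), and `debug_probe_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def debug_probe_url(exporter, module, target):
    """Build a blackbox_exporter /probe URL that returns debug output."""
    # urlencode escapes characters such as ":" in IPv6 literals.
    query = urlencode({"module": module, "target": target, "debug": "true"})
    return f"{exporter.rstrip('/')}/probe?{query}"


# Assumed address; replace it with wherever your exporter is reachable.
print(debug_probe_url("http://localhost:9115", "icmp6", "a.fi"))
```

Opening the resulting URL in a browser or with an HTTP client should return the probe logs and the module configuration.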
Logs for the probe:
ts=2023-07-26T20:15:19.153166886Z caller=main.go:181 module=icmp6 target=a.fi level=info msg="Beginning probe" probe=icmp timeout_seconds=30
ts=2023-07-26T20:15:19.203719654Z caller=icmp.go:91 module=icmp6 target=a.fi level=info msg="Resolving target address" target=a.fi ip_protocol=ip6
ts=2023-07-26T20:15:20.765398623Z caller=icmp.go:91 module=icmp6 target=a.fi level=info msg="Resolved target address" target=a.fi ip=2001:708:10:53::53
ts=2023-07-26T20:15:20.765476243Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Creating socket"
ts=2023-07-26T20:15:20.777213805Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Creating ICMP packet" seq=58042 id=50125
ts=2023-07-26T20:15:20.810597242Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Writing out packet"
ts=2023-07-26T20:15:20.81062953Z caller=handler.go:120 module=icmp6 target=a.fi level=debug msg="Setting TTL (IPv6 unprivileged)" ttl=64
ts=2023-07-26T20:15:20.811221011Z caller=handler.go:120 module=icmp6 target=a.fi level=info msg="Waiting for reply packets"
ts=2023-07-26T20:15:49.29300144Z caller=handler.go:120 module=icmp6 target=a.fi level=debug msg="Cannot get Hop Limit from the received packet. 'probe_icmp_reply_hop_limit' will be missing."
ts=2023-07-26T20:15:49.293101733Z caller=handler.go:120 module=icmp6 target=a.fi level=warn msg="Timeout reading from socket" err="read udp [::]:199: raw-read udp [::]:199: i/o timeout"
ts=2023-07-26T20:15:49.293249184Z caller=main.go:181 module=icmp6 target=a.fi level=error msg="Probe failed" duration_seconds=30.089666988
Module configuration:
prober: icmp
timeout: 30s
http:
  ip_protocol_fallback: true
  follow_redirects: true
  enable_http2: true
tcp:
  ip_protocol_fallback: true
icmp:
  preferred_ip_protocol: ip6
  ip_protocol_fallback: true
  ttl: 64
dns:
  ip_protocol_fallback: true
  recursion_desired: true
@samip5 The IO timeout error suggests that the echo replies are not being received by blackbox_exporter, e.g. your router is dropping the outbound echo-request, or dropping / filtering the echo-reply.
Other than that, the debug indicates that blackbox_exporter is successfully creating the listening socket and sending the packet, so your CAP_NET_RAW / net.ipv4.ping_group_range are valid.
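The two failure modes seen in this thread can be told apart from the debug log alone. The sketch below is a hypothetical triage helper, not part of blackbox_exporter, and it only matches the error strings quoted in this issue.

```python
def classify_probe_error(log_text):
    """Guess whether a failed ICMP probe is a permission or network issue."""
    lowered = log_text.lower()
    if "operation not permitted" in lowered or "permission denied" in lowered:
        # The socket could not be created: check cap_net_raw and
        # net.ipv4.ping_group_range.
        return "permissions"
    if "i/o timeout" in lowered:
        # The socket worked but no echo reply arrived: check routing,
        # firewalls, or the CNI.
        return "network"
    return "unknown"
```

A log containing "Timeout reading from socket" with "i/o timeout" is classified as "network", while "socket: operation not permitted" is classified as "permissions".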
It seems the problem was my CNI (Container Network Interface), but yes it appears to work now. :)
Host operating system: output of uname -a
CentOS Linux release 7.9.2009 (Core)
Linux prom1.example 3.10.0-1160.66.1.el7.x86_64 #1 SMP Wed May 18 16:02:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
blackbox_exporter version: output of blackbox_exporter --version
Running on K3S via the kube-prometheus-stack helm chart
blackbox_exporter, version 0.23.0 (branch: HEAD, revision: 26fc98b9c6db21457653ed752f34d1b7fb5bba43)
build user: root@f360719453e3
build date: 20221202-12:26:32
go version: go1.19.3
platform: linux/amd64
What did you do that produced an error?
Logs for the probe:
ts=2023-02-06T11:25:27.829923503Z caller=main.go:181 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Beginning probe" probe=icmp timeout_seconds=5
ts=2023-02-06T11:25:27.83021627Z caller=icmp.go:91 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Resolving target address" target=2001:500:110:affe::249 ip_protocol=ip6
ts=2023-02-06T11:25:27.830297445Z caller=icmp.go:91 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Resolved target address" target=2001:500:110:affe::249 ip=2001:500:110:affe::249
ts=2023-02-06T11:25:27.830334595Z caller=handler.go:117 module=ping_v6 target=2001:500:110:affe::249 level=info msg="Creating socket"
ts=2023-02-06T11:25:27.841335002Z caller=handler.go:117 module=ping_v6 target=2001:500:110:affe::249 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: protocol not supported"
ts=2023-02-06T11:25:27.84154106Z caller=handler.go:117 module=ping_v6 target=2001:500:110:affe::249 level=error msg="Error listening to socket" err="listen ip6:ipv6-icmp ::: socket: operation not permitted"
ts=2023-02-06T11:25:27.841592441Z caller=main.go:181 module=ping_v6 target=2001:500:110:affe::249 level=error msg="Probe failed" duration_seconds=0.01156632
Following the documentation at https://github.com/prometheus/blackbox_exporter#permissions I've set various combinations of capabilities (NET_ADMIN, NET_RAW) and the sysctl net.ipv4.ping_group_range. I still get a failure with IPv6 when using both. Setting only cap_net_raw failed to grant the correct permissions for any ICMP requests. However, after setting that value, IPv4 ICMP requests were working correctly.
It appears that net.ipv4.ping_group_range should apply to both IPv4 and IPv6: https://bugzilla.redhat.com/show_bug.cgi?id=1315335#c2