Those docs seem to be confusing UDP and ICMP.
Maybe the doc is wrong, or maybe it says "udp" because an IPPROTO_ICMP socket is created with SOCK_DGRAM as the second argument (like a UDP socket, which uses 0 instead of IPPROTO_ICMP as the third argument); in any case, golang/go#9166 explicitly states that the feature is implemented.
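For reference, the socket call under discussion can be exercised directly. This is a minimal standalone sketch of mine (not code from the linked issue), using golang.org/x/sys/unix:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// SOCK_DGRAM + IPPROTO_ICMP asks the kernel for an unprivileged
	// "ping socket"; the privileged variant would be SOCK_RAW +
	// IPPROTO_ICMP, which requires CAP_NET_RAW.
	fd, err := unix.Socket(unix.AF_INET, unix.SOCK_DGRAM, unix.IPPROTO_ICMP)
	if err != nil {
		// EACCES/EPERM here typically means the process gid is outside
		// net.ipv4.ping_group_range.
		fmt.Println("ping sockets unavailable:", err)
		return
	}
	defer unix.Close(fd)
	fmt.Println("unprivileged ICMP socket created, fd:", fd)
}
```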
The docs are definitely wrong; the only reference to ICMP is for privileged sockets, which means root.
I think we need more clarity here, and to know which kernels support this.
> which kernels support this
AFAIK, it has been in mainline since v2.6.39. I have not figured out the earliest MacOSX version supporting socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP), but it was already supported when the patch was merged to the Linux mainline.
So it was introduced just under 6 years ago; that's relatively new as kernel features go.
> The docs are definitely wrong
The docs match the test code at https://github.com/golang/net/blob/master/icmp/ping_test.go#L61
The docs indicate you need privileged access to use this feature: "For privileged raw ICMP endpoints, network must be "ip4" or "ip6" followed by a colon and an ICMP protocol number or name."
The part of the doc that is relevant to the ticket is located a couple of paragraphs above:
> For non-privileged datagram-oriented ICMP endpoints, network must be "udp4" or "udp6". The endpoint allows to read, write a few limited ICMP messages such as echo request and echo reply. Currently only Darwin and Linux support this.
It also needs some privileges: the running process should be in a group within net.ipv4.ping_group_range, but that is a much lower amount of privilege than the cap_net_raw capability.
Please, take a look at icmp/ping_test.go and icmp/listen_posix.go before repeating that the docs confuse UDP and ICMP.
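For the record, here is a minimal sketch of what that test code boils down to; the loopback target and payload are illustrative, and it assumes a kernel with ping sockets enabled for your group:

```go
package main

import (
	"log"
	"net"
	"os"

	"golang.org/x/net/icmp"
	"golang.org/x/net/ipv4"
)

func main() {
	// "udp4" selects the non-privileged datagram-oriented ICMP endpoint;
	// the privileged raw endpoint would be "ip4:icmp" instead.
	c, err := icmp.ListenPacket("udp4", "0.0.0.0")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	m := icmp.Message{
		Type: ipv4.ICMPTypeEcho, Code: 0,
		Body: &icmp.Echo{
			ID:   os.Getpid() & 0xffff,
			Seq:  1,
			Data: []byte("HELLO-R-U-THERE"),
		},
	}
	wb, err := m.Marshal(nil)
	if err != nil {
		log.Fatal(err)
	}
	// With a "udp4" endpoint the destination must be a *net.UDPAddr.
	if _, err := c.WriteTo(wb, &net.UDPAddr{IP: net.ParseIP("127.0.0.1")}); err != nil {
		log.Fatal(err)
	}

	rb := make([]byte, 1500)
	n, peer, err := c.ReadFrom(rb)
	if err != nil {
		log.Fatal(err)
	}
	rm, err := icmp.ParseMessage(1, rb[:n]) // 1 == ICMPv4 protocol number
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("got %v from %v", rm.Type, peer)
}
```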
> For non-privileged datagram-oriented ICMP endpoints, network must be "udp4" or "udp6".
This line as written confuses ICMP and UDP, and the following example only mentions UDP. My interpretation is that this is probably a typo and that ICMP should be UDP.
> Please, take a look at icmp/ping_test.go and icmp/listen_posix.go before repeating that the docs confuse UDP and ICMP.
If I need to read source code to see what the docs actually mean, then the docs are confusing and/or wrong.
> It also needs some privileges: the running process should be in a group within net.ipv4.ping_group_range, but that is a much lower amount of privilege than the cap_net_raw capability.
That's not of much use then. A feature that only works on newer kernels and requires additional setup doesn't win over SUID or NET_ADMIN, which work ~everywhere.
We should document both the setcap CAP_NET_RAW and sysctl net.ipv4.ping_group_range methods of giving the exporter access to the privileges needed.
The only major distribution with a kernel older than 2.6.39 is RHEL6. RHEL7 has been out since 2014.
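As a hypothetical diagnostic sketch for the sysctl method (the program is mine, not exporter code): check whether the current gid falls inside net.ipv4.ping_group_range, which is what unprivileged ICMP sockets require on Linux.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// The sysctl holds two gids, "min max"; processes whose group falls
	// inside the range may open SOCK_DGRAM/IPPROTO_ICMP sockets without
	// CAP_NET_RAW. The kernel default "1 0" disables it for everyone.
	data, err := os.ReadFile("/proc/sys/net/ipv4/ping_group_range")
	if err != nil {
		fmt.Println("cannot read ping_group_range:", err)
		return
	}
	f := strings.Fields(string(data))
	if len(f) != 2 {
		fmt.Println("unexpected sysctl format:", string(data))
		return
	}
	lo, _ := strconv.Atoi(f[0])
	hi, _ := strconv.Atoi(f[1])
	gid := os.Getgid()
	fmt.Printf("gid %d may use ping sockets: %v (range %d-%d)\n",
		gid, gid >= lo && gid <= hi, lo, hi)
}
```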
> and sysctl net.ipv4.ping_group_range methods of giving the exporter access to the privileges needed.
This would also require code changes; this is a different API.
Right, but we could handle that with fallback detection.
> The only major distribution with a kernel older than 2.6.39 is RHEL6.
CentOS 6.9 was released last month, is supported until 2020, and comes with 2.6.32.
We have users on older systems (including at least one that can't even run Go out of the box as their kernel is so old). I'm wary of adding features that require tweaking sysctls to work, and don't work for everyone as that's a non-trivial amount of cognitive overhead. Users will want this to just work out of the box, and I suspect this will also be Fun with containers.
We already have two documented ways to make this work on Linux, why should we add a third that doesn't work for everyone?
Users already have to tweak setcap or setuid, so adding a 3rd option isn't any different. The newer variation on the syscall to send ICMP packets is safer, as raw socket access is not required. The goal here is to reduce the attack surface.
@brian-brazil So, after reading the docs and the source, I think I understand your confusion.
There are two modes of operation for ICMP ListenPacket: unprivileged and privileged. The u in udp4 doesn't stand for UDP; it stands for "Unprivileged".
The docs aren't wrong; they're just a little confusing, since they don't spell out what the connection-type strings stand for.
Either way, that's just a distraction from the real issue. We should attempt to use unprivileged ListenPacket and fall back to privileged automatically.
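Sketched out, the fallback could look something like this (listenICMP is an illustrative name, not existing exporter code):

```go
package main

import (
	"log"

	"golang.org/x/net/icmp"
)

// listenICMP tries the unprivileged datagram-oriented endpoint first and
// falls back to the privileged raw endpoint, so the binary works both with
// net.ipv4.ping_group_range set up and with CAP_NET_RAW/root.
func listenICMP() (*icmp.PacketConn, error) {
	if c, err := icmp.ListenPacket("udp4", "0.0.0.0"); err == nil {
		return c, nil // unprivileged ping socket worked
	}
	// A failure above usually means the gid is outside
	// net.ipv4.ping_group_range; try the CAP_NET_RAW path instead.
	return icmp.ListenPacket("ip4:icmp", "0.0.0.0")
}

func main() {
	c, err := listenICMP()
	if err != nil {
		log.Fatal("neither unprivileged nor raw ICMP available: ", err)
	}
	defer c.Close()
	log.Printf("listening on %v", c.LocalAddr())
}
```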
+1
FWIW, I was able to finally get non-root ICMP pings working with blackbox-exporter. The key was setting net.ipv4.ping_group_range as part of the pod security context. No other combinations of adding NET_RAW, groups, custom containers that had setcap cap_net_raw+ep on the binary worked (except for running as root).
net.ipv4.ping_group_range is namespaced so that changing it as part of the pod won't affect other parts of the system.
My testing was done with blackbox exporter 0.19.0, CentOS 7, kernel 3.10.0-1160.31.1.el7.x86_64, and Kubernetes 1.21.2.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
spec:
  replicas: 6
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - blackbox-exporter
                topologyKey: "kubernetes.io/hostname"
      securityContext:
        sysctls:
          - name: net.ipv4.ping_group_range
            value: "0 2147483647"
      containers:
        - name: blackbox-exporter
          image: docker.io/prom/blackbox-exporter:v0.19.0
          ports:
            - name: metrics
              containerPort: 9115
              protocol: TCP
          volumeMounts:
            - name: config
              mountPath: /etc/blackbox_exporter
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          securityContext:
            runAsUser: 49172
            runAsGroup: 49172
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
      volumes:
        - name: config
          configMap:
            name: blackbox-exporter
```
Working values.yaml below (plus an ingress). Appreciate it's long but it should work in 2024...
```yaml
global:
  ## Global image registry to use if it needs to be overriden for some specific use cases (e.g local registries, custom images, ...)
  ##
  imageRegistry: ""

restartPolicy: Always

kind: Deployment

## Override the namespace
##
namespaceOverride: ""

# Override Kubernetes version if your distribution does not follow semver v2
kubeVersionOverride: ""

## set to true to add the release label so scraping of the servicemonitor with kube-prometheus-stack works out of the box
releaseLabel: false

podDisruptionBudget: {}
# maxUnavailable: 0

## Allow automount the serviceaccount token for sidecar container (eg: oauthproxy)
automountServiceAccountToken: false

## Additional blackbox-exporter container environment variables
## For instance to add a http_proxy
##
## extraEnv:
##   HTTP_PROXY: "http://superproxy.com:3128/"
##   NO_PROXY: "localhost,127.0.0.1"
extraEnv: {}

## Additional blackbox-exporter container environment variables for secret
## extraEnvFromSecret:
##   - secretOne
##   - secretTwo
extraEnvFromSecret: ""

extraVolumes: []
# - name: secret-blackbox-oauth-htpasswd
#   secret:
#     defaultMode: 420
#     secretName: blackbox-oauth-htpasswd
# - name: storage-volume
#   persistentVolumeClaim:
#     claimName: example

## Additional volumes that will be attached to the blackbox-exporter container
extraVolumeMounts:
# - name: ca-certs
#   mountPath: /etc/ssl/certs/ca-certificates.crt

## Additional InitContainers to initialize the pod
## This supports either a structured array or a templatable string
extraInitContainers: []

## This supports either a structured array or a templatable string
# Array mode
extraContainers: []
# - name: oAuth2-proxy
#   args:
#     - -https-address=:9116
#     - -upstream=http://localhost:9115
#     - -skip-auth-regex=^/metrics
#     - -openshift-delegate-urls={"/":{"group":"monitoring.coreos.com","resource":"prometheuses","verb":"get"}}
#   image: openshift/oauth-proxy:v1.1.0
#   ports:
#     - containerPort: 9116
#       name: proxy
#   resources:
#     limits:
#       memory: 16Mi
#     requests:
#       memory: 4Mi
#       cpu: 20m
#   volumeMounts:
#     - mountPath: /etc/prometheus/secrets/blackbox-tls
#       name: secret-blackbox-tls

# String mode
# extraContainers: |-
#   - name: oAuth2-proxy
#     args:
#       - -https-address=:9116
#       - -upstream=http://localhost:9115
#       - -skip-auth-regex=^/metrics
#       - -openshift-delegate-urls={"/":{"group":"monitoring.coreos.com","resource":"prometheuses","verb":"get"}}
#     image: {{ .Values.global.imageRegistry }}/openshift/oauth-proxy:v1.1.0

## Enable pod security policy
pspEnabled: true

hostNetwork: false

strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
  type: RollingUpdate

image:
  registry: quay.io
  repository: prometheus/blackbox-exporter
  # Overrides the image tag whose default is {{ printf "v%s" .Chart.AppVersion }}
  tag: ""
  pullPolicy: IfNotPresent
  digest: ""
  ## Optionally specify an array of imagePullSecrets.
  ## Secrets must be manually created in the namespace.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  ##
  # pullSecrets:
  #   - myRegistrKeySecretName

podSecurityContext:
  sysctls:
    - name: net.ipv4.ping_group_range
      value: "0 2147483647"
  # fsGroup: 1000

## User and Group to run blackbox-exporter container as
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
    # Add NET_RAW to enable ICMP
    add: ["NET_RAW"]

livenessProbe:
  httpGet:
    path: /-/healthy
    port: http
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /-/healthy
    port: http

nodeSelector: {}
tolerations: []
affinity: {}

## Topology spread constraints rely on node labels to identify the topology domain(s) that each Node is in.
## Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
topologySpreadConstraints: []
# - maxSkew: 1
#   topologyKey: failure-domain.beta.kubernetes.io/zone
#   whenUnsatisfiable: DoNotSchedule
#   labelSelector:
#     matchLabels:
#       app.kubernetes.io/instance: jiralert

# if the configuration is managed as secret outside the chart, using SealedSecret for example,
# provide the name of the secret here. If secretConfig is set to true, configExistingSecretName will be ignored
# in favor of the config value.
configExistingSecretName: ""
# Store the configuration as a `Secret` instead of a `ConfigMap`, useful in case it contains sensitive data
secretConfig: false

config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
    icmp:
      prober: icmp
      icmp:
        preferred_ip_protocol: ip4

allowIcmp: true

# Set custom config path, other than default /config/blackbox.yaml. If let empty, path will be "/config/blackbox.yaml"
# configPath: "/foo/bar"

extraConfigmapMounts: []
# - name: certs-configmap
#   mountPath: /etc/secrets/ssl/
#   subPath: certificates.crt # (optional)
#   configMap: certs-configmap
#   readOnly: true
#   defaultMode: 420

## Additional secret mounts
# Defines additional mounts with secrets. Secrets must be manually created in the namespace.
extraSecretMounts: []
# - name: secret-files
#   mountPath: /etc/secrets
#   secretName: blackbox-secret-files
#   readOnly: true
#   defaultMode: 420

resources: {}
# limits:
#   memory: 300Mi
# requests:
#   memory: 50Mi

priorityClassName: ""

service:
  annotations: {}
  labels: {}
  type: ClusterIP
  port: 9115
  ipDualStack:
    enabled: false
    ipFamilies: ["IPv6", "IPv4"]
    ipFamilyPolicy: "PreferDualStack"

# Only changes container port. Application port can be changed with extraArgs (--web.listen-address=:9115)
# https://github.com/prometheus/blackbox_exporter/blob/998037b5b40c1de5fee348ffdea8820509d85171/main.go#L55
containerPort: 9115

# Number of port to expose on the host. If specified, this must be a valid port number, 0 < x < 65536. If zero, no port is exposed.
# This is useful for communicating with Daemon Pods when kind is DaemonSet.
hostPort: 0

serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  annotations: {}

## An Ingress resource can provide name-based virtual hosting and TLS
## termination among other things for CouchDB deployments which are accessed
## from outside the Kubernetes cluster.
## ref: https://kubernetes.io/docs/concepts/services-networking/ingress/
ingress:
  enabled: true
  className: "nginx"
  labels: {}
  annotations:
    kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    ## The host property on hosts and tls is passed through helm tpl function.
    ## ref: https://helm.sh/docs/developing_charts/#using-the-tpl-function
    - host: blackbox-exporter.core.example.net
      paths:
        - path: /
          pathType: ImplementationSpecific
    - host: blackbox-exporter
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - secretName: cert-blackbox-exporter.core.example.net
      hosts:
        - blackbox-exporter.core.example.net
        - blackbox-exporter

podAnnotations: {}

# Annotations for the Deployment
deploymentAnnotations: {}

# Annotations for the Secret
secretAnnotations: {}

# Hostaliases allow to add additional DNS entries to be injected directly into pods.
# This will take precedence over your implemented DNS solution
hostAliases: []
# - ip: 192.168.1.1
#   hostNames:
#     - test.example.com
#     - another.example.net

pod:
  labels: {}

extraArgs: []
# - --history.limit=1000

replicas: 1

serviceMonitor:
  ## If true, a ServiceMonitor CRD is created for a prometheus operator
  ## https://github.com/coreos/prometheus-operator for blackbox-exporter itself
  ##
  selfMonitor:
    enabled: false
    additionalMetricsRelabels: {}
    additionalRelabeling: []
    labels: {}
    path: /metrics
    scheme: http
    tlsConfig: {}
    interval: 30s
    scrapeTimeout: 30s
    ## Port can be defined by assigning a value for the port key below
    ## port:

  ## If true, a ServiceMonitor CRD is created for a prometheus operator
  ## https://github.com/coreos/prometheus-operator for each target
  ##
  enabled: false

  # Default values that will be used for all ServiceMonitors created by `targets`
  defaults:
    additionalMetricsRelabels: {}
    additionalRelabeling: []
    labels: {}
    interval: 30s
    scrapeTimeout: 30s
    module: http_2xx
  ## scheme: HTTP scheme to use for scraping. Can be used with `tlsConfig` for example if using istio mTLS.
  scheme: http
  ## path: HTTP path. Needs to be adjusted, if web.route-prefix is set
  path: "/probe"
  ## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
  ## Of type: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
  tlsConfig: {}
  bearerTokenFile:

  targets:
  # - name: example                  # Human readable URL that will appear in Prometheus / AlertManager
  #   url: http://example.com/healthz  # The URL that blackbox will scrape
  #   hostname: example.com          # HTTP probes can accept an additional `hostname` parameter that will set `Host` header and TLS SNI
  #   labels: {}                     # Map of labels for ServiceMonitor. Overrides value set in `defaults`
  #   interval: 60s                  # Scraping interval. Overrides value set in `defaults`
  #   scrapeTimeout: 60s             # Scrape timeout. Overrides value set in `defaults`
  #   module: http_2xx               # Module used for scraping. Overrides value set in `defaults`
  #   additionalMetricsRelabels: {}  # Map of metric labels and values to add
  #   additionalRelabeling: []       # List of metric relabeling actions to run

## Custom PrometheusRules to be defined
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
prometheusRule:
  enabled: false
  additionalLabels: {}
  namespace: ""
  rules: []

podMonitoring:
  ## If true, a PodMonitoring CR is created for google managed prometheus
  ## https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#gmp-pod-monitoring for blackbox-exporter itself
  ##
  selfMonitor:
    enabled: false
    additionalMetricsRelabels: {}
    labels: {}
    path: /metrics
    interval: 30s
    scrapeTimeout: 30s

  ## If true, a PodMonitoring CR is created for a google managed prometheus
  ## https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#gmp-pod-monitoring for each target
  ##
  enabled: false

  ## Default values that will be used for all PodMonitoring created by `targets`
  ## Following PodMonitoring API specs https://github.com/GoogleCloudPlatform/prometheus-engine/blob/main/doc/api.md#scrapeendpoint
  defaults:
    additionalMetricsRelabels: {}
    labels: {}
    interval: 30s
    scrapeTimeout: 30s
    module: http_2xx
  ## scheme: Protocol scheme to use to scrape.
  scheme: http
  ## path: HTTP path. Needs to be adjusted, if web.route-prefix is set
  path: "/probe"
  ## tlsConfig: TLS configuration to use when scraping the endpoint. For example if using istio mTLS.
  ## Of type: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
  tlsConfig: {}

  targets:
  # - name: example                  # Human readable URL that will appear in Google Managed Prometheus / AlertManager
  #   url: http://example.com/healthz  # The URL that blackbox will scrape
  #   hostname: example.com          # HTTP probes can accept an additional `hostname` parameter that will set `Host` header and TLS SNI
  #   labels: {}                     # Map of labels for PodMonitoring. Overrides value set in `defaults`
  #   interval: 60s                  # Scraping interval. Overrides value set in `defaults`
  #   scrapeTimeout: 60s             # Scrape timeout. Overrides value set in `defaults`
  #   module: http_2xx               # Module used for scraping. Overrides value set in `defaults`
  #   additionalMetricsRelabels: {}  # Map of metric labels and values to add

## Network policy for chart
networkPolicy:
  # Enable network policy and allow access from anywhere
  enabled: false
  # Limit access only from monitoring namespace
  # Before setting this value to true, you must add the name=monitoring label to the monitoring namespace
  # Network Policy uses label filtering
  allowMonitoringNamespace: false

## dnsPolicy and dnsConfig for Deployments and Daemonsets if you want non-default settings.
## These will be passed directly to the PodSpec of same.
dnsPolicy:
dnsConfig:

# Extra manifests to deploy as an array
extraManifests: []
# - apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     labels:
#       name: prometheus-extra
#   data:
#     extra-data: "value"

# global common labels, applied to all ressources
commonLabels: {}

# Enable vertical pod autoscaler support for prometheus-blackbox-exporter
verticalPodAutoscaler:
  enabled: false
  # Recommender responsible for generating recommendation for the object.
  # List should be empty (then the default recommender will generate the recommendation)
  # or contain exactly one recommender.
  # recommenders:
  #   - name: custom-recommender-performance
  # List of resources that the vertical pod autoscaler can control. Defaults to cpu and memory
  controlledResources: []
  # Specifies which resource values should be controlled: RequestsOnly or RequestsAndLimits.
  # controlledValues: RequestsAndLimits
  # Define the max allowed resources for the pod
  maxAllowed: {}
  # cpu: 200m
  # memory: 100Mi
  # Define the min allowed resources for the pod
  minAllowed: {}
  # cpu: 200m
  # memory: 100Mi
  updatePolicy:
    # Specifies minimal number of replicas which need to be alive for VPA Updater to attempt pod eviction
    # minReplicas: 1
    # Specifies whether recommended updates are applied when a Pod is started and whether recommended updates
    # are applied during the life of a Pod. Possible values are "Off", "Initial", "Recreate", and "Auto".
    updateMode: Auto

configReloader:
  enabled: false
  containerPort: 8080
  config:
    logFormat: logfmt
    logLevel: info
    watchInterval: 1m
  image:
    registry: quay.io
    repository: prometheus-operator/prometheus-config-reloader
    tag: "v0.71.2"
    pullPolicy: IfNotPresent
    digest: ""
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]
  resources:
    limits:
      memory: 50Mi
    requests:
      cpu: 10m
      memory: 20Mi
  livenessProbe:
    httpGet:
      path: /healthz
      port: reloader-web
      scheme: HTTP
  readinessProbe:
    httpGet:
      path: /healthz
      port: reloader-web
      scheme: HTTP
  service:
    port: 8080
  serviceMonitor:
    selfMonitor:
      additionalMetricsRelabels: {}
      additionalRelabeling: []
      path: /metrics
      scheme: http
      tlsConfig: {}
      interval: 30s
      scrapeTimeout: 30s
```
Looks like it doesn't work in 2024; I had to go with root to make it work.
x/net/icmp supports root-less operation for ICMP pings on Linux and MacOSX, but blackbox_exporter requires elevated privileges for that.
Are there any non-obvious blockers for using rootless ping sockets? I've looked at the code and I've not noticed any.