kube_endpoint_address duplicates with Prometheus 2.52

gdlx commented 4 weeks ago

After upgrading to Prometheus 2.52, we had some alerts about dropped duplicate samples.

The Prometheus log showed the following warning:

scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://100.91.220.12:8080/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=1

Setting the log level to debug showed the affected series:

scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://100.91.220.12:8080/metrics msg="Duplicate sample for timestamp" series="kube_endpoint_address{namespace=\"monitoring\",endpoint=\"prometheus-operated\",ip=\"100.91.68.8\",ready=\"true\"}"

Checking the indicated series did show the following duplicates:

kube_endpoint_address{namespace="monitoring",endpoint="prometheus-operated",ip="100.91.68.8",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="prometheus-operated",ip="100.91.68.8",ready="true"} 1
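
A minimal sketch of how this check could be automated against a saved copy of the /metrics output. The function names are hypothetical, and it assumes duplicates are emitted with their labels in the same order (as in the sample above), so differently ordered but equivalent label sets would not be caught.

```python
from collections import Counter


def series_key(line: str) -> str:
    """Return the metric name plus label set, dropping the sample value."""
    if "}" in line:
        # Values and timestamps never contain "}", so the last one closes the labels.
        return line[: line.rfind("}") + 1]
    # No labels: the metric name is the first whitespace-separated token.
    return line.split()[0]


def find_duplicate_series(exposition_text: str) -> dict:
    """Map each series that appears more than once to its occurrence count."""
    counts = Counter()
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        counts[series_key(line)] += 1
    return {series: n for series, n in counts.items() if n > 1}
```

Calling find_duplicate_series on the text scraped from the kube-state-metrics target would list every series that Prometheus 2.52 reports as a duplicate sample.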

The prometheus-operated endpoint has the following subsets:

subsets:
- addresses:
    - ip: 100.91.43.113
      hostname: prometheus-kube-prometheus-istio-0
      nodeName: ip-100-91-48-253.eu-west-3.compute.internal
      targetRef:
        kind: Pod
        namespace: monitoring
        name: prometheus-kube-prometheus-istio-0
        uid: 1180e2a5-75e4-4098-961c-940264115438
    - ip: 100.91.68.8
      hostname: prometheus-kube-prometheus-stack-prometheus-0
      nodeName: ip-100-91-212-113.eu-west-3.compute.internal
      targetRef:
        kind: Pod
        namespace: monitoring
        name: prometheus-kube-prometheus-stack-prometheus-0
        uid: 257bdfed-e2b4-49c7-aaea-1b7bee1a520d
  ports:
    - name: http-web
      port: 9090
      protocol: TCP
- addresses:
    - ip: 100.91.68.8
      hostname: prometheus-kube-prometheus-stack-prometheus-0
      nodeName: ip-100-91-212-113.eu-west-3.compute.internal
      targetRef:
        kind: Pod
        namespace: monitoring
        name: prometheus-kube-prometheus-stack-prometheus-0
        uid: 257bdfed-e2b4-49c7-aaea-1b7bee1a520d
  ports:
    - name: grpc
      port: 10901
      protocol: TCP

We can see two entries for the same IP (100.91.68.8) but on different ports. gRPC is exposed only by the Thanos sidecar container, which is enabled on only one Prometheus instance. I think there wouldn't have been duplicates if both instances had the same config (there would have been a single subset with both addresses and both ports).
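
To illustrate, here is a simplified model of how the duplicate arises. It is an assumption about the behaviour, not kube-state-metrics' actual Go code: it emits one label set per address entry in every subset and ignores the ports, using a trimmed copy of the subsets shown above.

```python
def address_series(namespace, endpoint, subsets):
    """Emit one kube_endpoint_address label set per address entry, per subset."""
    series = []
    for subset in subsets:
        for key, ready in (("addresses", "true"), ("notReadyAddresses", "false")):
            for address in subset.get(key, []):
                ip = address["ip"]
                series.append(
                    f'kube_endpoint_address{{namespace="{namespace}",'
                    f'endpoint="{endpoint}",ip="{ip}",ready="{ready}"}}'
                )
    return series


# Trimmed copy of the prometheus-operated subsets from this issue.
SUBSETS = [
    {"addresses": [{"ip": "100.91.43.113"}, {"ip": "100.91.68.8"}],
     "ports": [{"name": "http-web", "port": 9090}]},
    {"addresses": [{"ip": "100.91.68.8"}],
     "ports": [{"name": "grpc", "port": 10901}]},
]

series = address_series("monitoring", "prometheus-operated", SUBSETS)
# Label sets emitted more than once; these are what Prometheus drops.
duplicates = sorted({s for s in series if series.count(s) > 1})
```

Because the port never reaches the label set, the second subset's entry for 100.91.68.8 is indistinguishable from the first.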

The only fix I can see would be to add a port label to the kube_endpoint_address metric. Is there something else I can do, or would this be considered a bug?

Thanks!

Environment:

kube-state-metrics version: 2.12.0
Kubernetes version: 1.28
Cloud provider or hardware configuration: AWS EKS
Other info:

eimarfandino commented 3 weeks ago

I noticed the same. We are seeing:

kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1

I do not know if this is related, but in our case the endpoint lists this IP twice with different ports: IP 10.25.119.228 listens on ports 9094 and 9093.

gdlx commented 3 weeks ago

I do not know if this is related

@eimarfandino Yes, different ports but same issue.

zoopp commented 2 weeks ago

I'm writing to confirm that I'm seeing this on GKE as well. Services with multiple ports bound to the same IP lead to duplicate metrics being exported by kube-state-metrics. For example (IP addresses masked):

kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-dncadu41te",ip="xx.xx.xx.xx",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-dncadu41te",ip="xx.xx.xx.xx",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1j2b5u4e7g",ip="yy.yy.yy.yy",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1j2b5u4e7g",ip="yy.yy.yy.yy",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1q8fig66j0",ip="zz.zz.zz.zz",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1q8fig66j0",ip="zz.zz.zz.zz",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-7eknv6n114",ip="aa.aa.aa.aa",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-7eknv6n114",ip="zz.zz.zz.zz",ready="true"} 1

dgrisonnet commented 2 weeks ago

/assign
/triage accepted

Serializator commented 5 days ago

The kube_endpoint_address metric is explicitly for the addresses of an endpoint, and the kube_endpoint_ports metric is for its ports. Adding a port label would muddy the waters between these metrics and their purposes.

The other option is to ensure the IPs are unique when generating these metrics. There are a few concerns I have regarding this approach.

Can an IP address with different ports be available (.Addresses) and not ready (.NotReadyAddresses) at the same time?
The values of kube_endpoint_address_available and kube_endpoint_address_not_ready would no longer match the number of kube_endpoint_address series.
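
A rough sketch of the second concern, using the subsets from the original report. It assumes (this is not confirmed in the thread) that kube_endpoint_address_available counts every address entry in every subset; the helper names are hypothetical.

```python
def unique_address_labels(subsets):
    """Deduplicate addresses across subsets, keyed on (ip, ready)."""
    seen = set()
    for subset in subsets:
        for key, ready in (("addresses", "true"), ("notReadyAddresses", "false")):
            for address in subset.get(key, []):
                seen.add((address["ip"], ready))
    return sorted(seen)


def available_count(subsets):
    """Assumed meaning of kube_endpoint_address_available: all ready entries."""
    return sum(len(subset.get("addresses", [])) for subset in subsets)


# Trimmed copy of the prometheus-operated subsets from the original report.
SUBSETS = [
    {"addresses": [{"ip": "100.91.43.113"}, {"ip": "100.91.68.8"}]},
    {"addresses": [{"ip": "100.91.68.8"}]},
]

unique = unique_address_labels(SUBSETS)   # two deduplicated series
available = available_count(SUBSETS)      # three address entries
```

Under that assumption the count metric reports 3 while only 2 kube_endpoint_address series remain. Since the key includes the ready flag, an IP that was both ready and not ready, if that can happen, would still produce two series.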

If we were to consider adding the port to the kube_endpoint_address metric, could this open up a conversation about a more generic kube_endpoint metric that would be better suited to this? What was the original reasoning behind having separate _address and _ports metrics for endpoints?

gdlx commented 5 days ago

@Serializator Does that mean the clean way would be for the Prometheus Operator not to use the same address for different instances? That would consume more IPs but avoid this kind of hybrid endpoint subset.

Serializator commented 4 days ago

Hi @gdlx! The Prometheus Operator is not doing anything it shouldn't be doing, so I think it's on KSM to support this unforeseen circumstance. The Prometheus Operator just happens to be the one that brought this problem to light; if it hadn't, something else would have.

dgrisonnet commented 17 hours ago

The bug lies in the fact that we don't distinguish between endpoint subsets. The metric was written in a way where we assumed that addresses and ports would always be unique for a single endpoint and never duplicated between subsets.

I looked a bit at Kubernetes' validation for Endpoints, and it allows duplicate IP/port pairs between subsets: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L7069-L7092

I think the only option we have here is to add a subset label set to the index of the subset within the endpoint. We could theoretically also replace kube_endpoint_address and kube_endpoint_ports with kube_endpoint_subsets, but both metrics are stable and, looking at the validation code, we have no guarantee that two subsets won't contain the same IP/port pair.
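
As a rough illustration of the subset-label idea (a Python sketch, not a patch to kube-state-metrics; the label position and the function name are assumptions), indexing each subset keeps the series distinct even when an IP repeats:

```python
def address_series_with_subset(namespace, endpoint, subsets):
    """Emit kube_endpoint_address label sets that carry the subset index."""
    series = []
    for index, subset in enumerate(subsets):
        for key, ready in (("addresses", "true"), ("notReadyAddresses", "false")):
            for address in subset.get(key, []):
                ip = address["ip"]
                series.append(
                    f'kube_endpoint_address{{namespace="{namespace}",'
                    f'endpoint="{endpoint}",subset="{index}",'
                    f'ip="{ip}",ready="{ready}"}}'
                )
    return series


# Trimmed copy of the prometheus-operated subsets from the original report.
SUBSETS = [
    {"addresses": [{"ip": "100.91.43.113"}, {"ip": "100.91.68.8"}]},
    {"addresses": [{"ip": "100.91.68.8"}]},
]

series = address_series_with_subset("monitoring", "prometheus-operated", SUBSETS)
all_unique = len(set(series)) == len(series)
```

Uniqueness here depends only on the subset index, so it holds regardless of which IPs or ports the subsets share, as long as an address is not repeated within a single subset.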

Any thoughts @mrueg @rexagod?

mrueg commented 16 hours ago

We could also add a port label to kube_endpoint_address and mark the ports metric as deprecated.
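
A quick sketch of what a port label could look like, for comparison. The label name and the one-series-per-address-and-port expansion are assumptions, and the second input is a hypothetical endpoint where two subsets repeat the same IP/port pair, which the validation code linked earlier appears to allow.

```python
def address_port_series(namespace, endpoint, subsets):
    """Emit one label set per (address, port) pair within each subset."""
    series = []
    for subset in subsets:
        for address in subset.get("addresses", []):
            for port in subset.get("ports", []):
                ip, number = address["ip"], port["port"]
                series.append(
                    f'kube_endpoint_address{{namespace="{namespace}",'
                    f'endpoint="{endpoint}",ip="{ip}",port="{number}",ready="true"}}'
                )
    return series


# Trimmed copy of the prometheus-operated subsets from the original report.
REPORTED = [
    {"addresses": [{"ip": "100.91.43.113"}, {"ip": "100.91.68.8"}],
     "ports": [{"port": 9090}]},
    {"addresses": [{"ip": "100.91.68.8"}], "ports": [{"port": 10901}]},
]
# Hypothetical endpoint: the same IP/port pair appears in two subsets.
OVERLAPPING = [
    {"addresses": [{"ip": "10.0.0.1"}], "ports": [{"port": 80}]},
    {"addresses": [{"ip": "10.0.0.1"}], "ports": [{"port": 80}]},
]

reported = address_port_series("monitoring", "prometheus-operated", REPORTED)
overlapping = address_port_series("default", "example", OVERLAPPING)
```

The reported endpoint yields three distinct series, but the hypothetical overlapping one still produces two identical label sets, so a port label alone would not guarantee uniqueness.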

kubernetes / kube-state-metrics

kube_endpoint_address duplicates with Prometheus 2.52 #2408