Open gdlx opened 4 weeks ago
I noticed the same. We are seeing:
kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1
I do not know if this is related, but in our case the endpoint lists this IP twice with different ports: IP 10.25.119.228 listens on ports 9094 and 9093.
> I do not know if this is related
@eimarfandino Yes, different ports but same issue.
I'm writing to confirm that I'm seeing this on GKE as well. Services with multiple ports bound to the same IP lead to duplicate metrics being exported by kube-state-metrics. For example (IP addresses masked):
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-dncadu41te",ip="xx.xx.xx.xx",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-dncadu41te",ip="xx.xx.xx.xx",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1j2b5u4e7g",ip="yy.yy.yy.yy",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1j2b5u4e7g",ip="yy.yy.yy.yy",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1q8fig66j0",ip="zz.zz.zz.zz",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-1q8fig66j0",ip="zz.zz.zz.zz",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-7eknv6n114",ip="aa.aa.aa.aa",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="gke-mcs-7eknv6n114",ip="zz.zz.zz.zz",ready="true"} 1
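For anyone who wants to check their own scrape output for the same symptom, here is a minimal sketch that counts identical series in exposition-format text. The helper name `duplicate_series` is hypothetical, and the sketch assumes samples carry no trailing timestamp; the sample lines are the ones reported above.

```python
from collections import Counter


def duplicate_series(exposition: str) -> dict:
    """Return every series (metric name plus label set) that appears more than once."""
    counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: no timestamps, so the text after the last space is the sample value.
        series, _, _value = line.rpartition(" ")
        counts[series] += 1
    return {series: n for series, n in counts.items() if n > 1}


sample = """\
kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1
kube_endpoint_address{namespace="monitoring",endpoint="alertmanager-operated",ip="10.25.119.228",ready="true"} 1
"""

dupes = duplicate_series(sample)
for series, n in dupes.items():
    print(n, series)
```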
/assign /triage accepted
The kube_endpoint_address metric is explicitly for the addresses of an endpoint. The kube_endpoint_ports metric is for the ports of an endpoint. Adding the port to kube_endpoint_address would muddy the waters between these metrics and their purpose.
The other option is to ensure the IPs are unique when generating these metrics. There are a few concerns I have regarding this approach.
- What if an IP is ready (.Addresses) and not ready (.NotReadyAddresses) at the same time?
- kube_endpoint_address_available and kube_endpoint_address_not_ready would not match the number of kube_endpoint_address metrics anymore.

If we were to consider adding the port to the kube_endpoint_address metric, could this open up a conversation about a more generic kube_endpoint metric that is more suitable for this? What was the original reasoning behind these separate _address and _ports metrics for endpoints?
@Serializator Does that mean the clean way would be for the Prometheus Operator not to use the same address for different instances? That would consume more IPs but avoid this kind of hybrid endpoint subset.
Hi @gdlx! The Prometheus Operator is not doing anything it shouldn't be doing so I think it's on KSM to support this unforeseen circumstance. The Prometheus Operator is unfortunately the one which brings this problem to light. If it wasn't for the Prometheus Operator it would've been something else.
The bug lies in the fact that we don't distinguish between endpoint subsets. The metric was written in a way where we assumed that addresses and ports would always be unique for a single endpoint and never duplicated between subsets.
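To make the collision concrete, here is a simplified sketch of the pattern described above. It is not the actual kube-state-metrics code: the dictionary layout and the `address_series` helper are hypothetical, and the two-subset layout is an assumed example built from the IP and ports reported earlier in the thread.

```python
# Hypothetical endpoint: the same IP appears in two subsets, each with a different port.
endpoint = {
    "namespace": "monitoring",
    "name": "alertmanager-operated",
    "subsets": [
        {"addresses": ["10.25.119.228"], "ports": [9093]},
        {"addresses": ["10.25.119.228"], "ports": [9094]},
    ],
}


def address_series(ep):
    """Build one label set per address per subset, ignoring which subset it came from."""
    series = []
    for subset in ep["subsets"]:
        for ip in subset["addresses"]:
            series.append((
                ("namespace", ep["namespace"]),
                ("endpoint", ep["name"]),
                ("ip", ip),
                ("ready", "true"),
            ))
    return series


series = address_series(endpoint)
# Two samples are emitted, but they share a single distinct label set.
print(len(series), len(set(series)))  # prints "2 1"
```

Because nothing in the label set records the subset (or the port), the second sample is indistinguishable from the first, which is what Prometheus reports as a duplicate.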
I looked a bit at Kubernetes' validation for Endpoints and it allows duplicate IP/port pairs between subsets: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/validation/validation.go#L7069-L7092
I think that the only option we have here is to add a subset label set to the index of the subset in the endpoint.
We could theoretically also replace kube_endpoint_address and kube_endpoint_ports by kube_endpoint_subsets, but both metrics are stable and, looking at the validation code, we have no guarantee that two subsets wouldn't contain the same IP/port pair.
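A rough sketch of why the subset index is the safer discriminator, assuming two subsets that contain the same IP/port pair (which the validation code linked above is said to allow). The helper names and the port number 9090 are hypothetical; only the IP is taken from the original report.

```python
# Hypothetical worst case: two subsets with an identical IP/port pair.
subsets = [
    {"addresses": ["100.91.68.8"], "ports": [9090]},
    {"addresses": ["100.91.68.8"], "ports": [9090]},
]


def with_port_label(subsets):
    """Label sets of the form (ip, port): one per address/port pair per subset."""
    return [
        (ip, str(port))
        for subset in subsets
        for ip in subset["addresses"]
        for port in subset["ports"]
    ]


def with_subset_label(subsets):
    """Label sets of the form (ip, subset index): one per address per subset."""
    return [
        (ip, str(index))
        for index, subset in enumerate(subsets)
        for ip in subset["addresses"]
    ]


port_labels = with_port_label(subsets)
subset_labels = with_subset_label(subsets)
print(len(set(port_labels)), "distinct of", len(port_labels))
print(len(set(subset_labels)), "distinct of", len(subset_labels))
```

In this case a port label still produces two identical label sets, while the subset index keeps them apart.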
Any thoughts @mrueg @rexagod?
We could also add a port label to kube_endpoint_address and mark kube_endpoint_ports as deprecated.
After upgrading to Prometheus 2.52, we got some alerts about dropped duplicate samples.
The Prometheus log showed the following warning:
Setting the log level to debug showed the affected series:
Checking the indicated series did indeed show the following duplicates:
The prometheus-operated endpoint has the following subsets:

We can see the 2 entries on the same IP (100.91.68.8) but on different ports. gRPC is enabled only by the Thanos sidecar container, and it's enabled on only one Prometheus instance. I think there wouldn't have been duplicates if both instances had the same config (there would only have been one subset with both addresses and ports).

The only way I see to fix this would be to add a port label on the kube_endpoint_address metric. Is there something else I can do, or would this be considered a bug?

Thanks!
Environment:
- kube-state-metrics version: 2.12.0
- Kubernetes version: 1.28