kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0
5.21k stars 1.93k forks source link

Info and StateSet types cause errors when scrapped by Prometheus #2248

Closed defenestration closed 3 months ago

defenestration commented 7 months ago

What happened: When kube-state-metrics is configured with custom resources that produce Info and/or StateSet type of metrics, and Prometheus is configured with --enable-feature=native-histograms, prometheus is unable to scrape the kube-state-metrics target.

Filed an issue with prometheus here https://github.com/prometheus/prometheus/issues/13172 but seems like a possible issue with KSM as well.

What you expected to happen:

Successful scraping of metrics.

How to reproduce it (as minimally and precisely as possible):

Run prometheus with --enable-feature=native-histograms and setup kube-state-metrics with a metric of type: Info or Stateset.

See a https://github.com/fluxcd/flux2-monitoring-example/issues/14 or https://github.com/defenestration/flux2-monitoring-example/pull/1 for an example setup.

Anything else we need to know?:

With the native-histograms feature disabled, prometheus seems to be able to scrape successfully. So a bug is also filed with prometheus.

I guess the suggestion would be to include a setting on the custom resource to produce a promehteus-compatible type on the /metrics page for these custom resource types.

Environment:

CatherineF-dev commented 7 months ago

Could you provide detailed steps on reproducing this issue? You can follow https://github.com/kubernetes/kube-state-metrics/issues/2223#issuecomment-1792850276

dgrisonnet commented 7 months ago

The problem seem to be that when Prometheus has --native-histograms enabled, it tries to negociate for protobuf first and we manually force the text format instead since proto isn't supported: https://github.com/kubernetes/kube-state-metrics/blob/240cffd908220854a27f7e92d8157eaee4dc8d42/pkg/metricshandler/metrics_handler.go#L191-L194.

However, there doesn't seem to be any mechanism in custom-state-metrics to revert back to Prometheus types, so we continue to expose OpenMetrics types whilst the content-type is text.

I think we should rework https://github.com/kubernetes/kube-state-metrics/pull/1930 to enable the OpenMetrics types only when the OpenMetrics format is negociated.

mrueg commented 7 months ago

Yes we can definitely improve the negotiation part and contributions that do that are welcome. My current suggestion for to workaround this is not to use Info or StateSet types in your custom config for CRS.

dgrisonnet commented 7 months ago

/assign @rexagod /triage accepted

rexagod commented 6 months ago

IIUC All unrecognized types are categorized as COUNTERs (also, here), which are validated later on when consuming them in common, which is where this error pops up from.

As discussed with Damien in a call earlier, the best way forward to bypass the absence of certain OpenMetrics types in Prometheus, would be to detect if a Proto negotiation is in effect, and try to specify a text-based format, since owing to the sheer amount of metrics that KSM generates, encoding these into metrics objects to facilitate things for KSM's own protobuf client would increase compute in a way that demands rigorous benchmarking beforehand.

EDIT: This detect-then-enforce approach would keep things stable for dashboards, etc., that have one-or-more use-cases for the HELP and TYPE texts. For eg., without this, users may end up seeing gauge for their info or stateset metrics. Additionally, both of these types are used within CRS, and not in native metrics definitions.

I'm wondering we might want to expose the scrape_protocols API in https://github.com/kubernetes/component-base/tree/master/metrics/prometheus and then bring that here (or introduce this here for now and replace that once we have an impl. in k/k, since we'll be needing that for https://github.com/kubernetes/kubernetes/pull/119949 anyway.

rexagod commented 6 months ago

Ah, scrape_protocols is a client-side API, I imagine this renders it irrelevant for the use-case here, although, we can still manually set headers from the supported list of text-based formats (as we do right now albeit not with protobufs in consideration).

So essentially, if the requested format is text-based, do nothing, else s/info|stateset/gauge.