kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

fix: support empty groups again, e.g. Node #2435

Open robbat2 opened 4 days ago

robbat2 commented 4 days ago

PR#1851 introduced accidental breakage for custom metrics on resources with an empty group, e.g. Node.

Fix it, and add tests to catch the next breakage.

Reference: https://github.com/kubernetes/kube-state-metrics/pull/1851
Reference: https://github.com/kubernetes/kube-state-metrics/issues/2434
Signed-off-by: Robin H. Johnson rjohnson@coreweave.com

demo_timestamp{customresource_group="",customresource_kind="Node",customresource_version="v1",node="kind-control-plane"} 1.719618433e+09
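For reference, a minimal CustomResourceState config sketch that should produce the sample above once empty groups work again (the annotation name here is hypothetical, not from this PR):

```yaml
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: ""        # empty group: a native resource such as Node
        kind: Node
        version: v1
      labelsFromPath:
        node: [metadata, name]
      metricNamePrefix: ""
      metrics:
        - name: demo_timestamp
          help: demo timestamp from a node annotation
          each:
            type: Gauge
            gauge:
              path: [metadata, annotations]
              valueFrom: [demo-timestamp-annotation]  # hypothetical annotation key
```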

What this PR does / why we need it: Fixes breakage introduced in v2.9.0

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality) Does not change cardinality.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #2434

k8s-ci-robot commented 4 days ago

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 4 days ago

Welcome @robbat2!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot commented 4 days ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: robbat2

Once this PR has been reviewed and has the lgtm label, please assign mrueg for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubernetes/kube-state-metrics/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
dgrisonnet commented 1 day ago

All custom resources have an API group. Resources that don't are native Kubernetes resources, for which kube-state-metrics already provides a safe set of metrics that is frequently updated to support new user requests. The ability to extend the metrics exposed by kube-state-metrics was never meant for native resources, and that is not something we want to support going forward.

Some additional context about that decision was added in https://github.com/kubernetes/kube-state-metrics/issues/2044.

rexagod commented 20 hours ago

Exactly what Damien said.

Additionally, PTAL at the docs, https://github.com/kubernetes/kube-state-metrics/pull/1851/files#diff-1e83b132955ba4b76949317660faf1fa222f0ee999d4fb3b4fb9dd318b7ac2beR521.

daveoy commented 19 hours ago

Could you point me in the direction of docs showing functionality such as this, then?

```yaml
- groupVersionKind:
    kind: Node
    version: v1
  labelsFromPath:
    node:
    - metadata
    - name
  metricNamePrefix: ""
  metrics:
  - name: last_timestamp_of_thing_that_happened
    help: last timestamp thing
    each:
      gauge:
        path:
        - metadata
        - annotations
        valueFrom:
        - timestamp-value-annotation
      type: Gauge
```

This maps an annotation on a Node resource to a timestamp gauge value in a new metric, which is extremely useful, and it is both possible AND encouraged for custom resources. It seems a bit of a shame not to allow users to do this sort of thing with core/builtin resources?
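To illustrate, with a config like the above on a pre-regression release (v2.8.x), KSM would expose a series roughly like this (node name and value are illustrative, following the format of the sample earlier in the thread):

```
last_timestamp_of_thing_that_happened{customresource_group="",customresource_kind="Node",customresource_version="v1",node="kind-control-plane"} 1.719618433e+09
```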

rexagod commented 19 hours ago

Native resources are supported in-tree only, by design, and thus cannot leak into the CRS (Custom Resource State) machinery. If you believe something needs to be added for them, please open a PR, just as the docs say in the aforementioned link. I'd be happy to take a look.

daveoy commented 19 hours ago

Just so I'm clear: this functionality isn't available for native resources somewhere else within KSM, and we've just been trying to shoehorn the custom resource version of it into native resources?

Is there any plan to replicate this functionality for native resources? Or are we limited to the metrics and their dimensions as provided by this project for native resources?

Again, it seems a bit of a waste of great available functionality to restrict this sort of metrics manipulation to custom resources only.

robbat2 commented 18 hours ago

As @daveoy notes, one of our use cases is exporting a timestamp value from a Node annotation. This has not been possible via KSM since v2.9.0. It was previously available and worked in v2.8.2; from that perspective, you broke existing functionality.

Another use case we have is splitting out a very large set of Node labels.

We have Nodes with 100+ labels across multiple label namespaces, and even with the metric-labels-allowlist mechanism, this causes cardinality problems in Prometheus.

We want to export each of those label namespaces as a distinct metric, to reduce the complexity of kube_node_labels, since the metrics consumers don't need all of the labels, just their own namespace's.

mrueg commented 17 hours ago

> As @daveoy notes, one of our use cases is exporting a timestamp value from a Node annotation. This has not been possible via KSM since v2.9.0. It was previously available and worked in v2.8.2; from that perspective, you broke existing functionality.

The custom resource metrics feature is still experimental (See: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/README.md#metrics-from-custom-resources ) so functionality might change over time as the feature evolves.

Please do not rely on an experimental feature in production systems.

robbat2 commented 13 hours ago

Here's an example of what we're doing to export the labels in something beyond kube_node_labels:

```yaml
resources:
  - groupVersionKind:
      kind: Node
      version: v1
    labelsFromPath:
      node: [metadata, name]
    metricNamePrefix: ""
    metrics:
      - name: kube_node_ns1_info
        help: ns1 example labels
        each:
          type: Info
          info:
            path: [metadata]
            labelsFromPath:
              ns1_example_com_foo: [labels, "ns1.example.com/foo"]
              ns1_example_com_bar: [labels, "ns1.example.com/bar"]
              ns1_example_com_baz: [labels, "ns1.example.com/baz"]
      - name: kube_node_ns2_info
        help: ns2 example labels
        each:
          type: Info
          info:
            path: [metadata]
            labelsFromPath:
              ns2_example_com_a: [labels, "ns2.example.com/a"]
              ns2_example_com_b: [labels, "ns2.example.com/b"]
              ns2_example_com_c: [labels, "ns2.example.com/c"]
```
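Assuming those labels are present on a node, the resulting Info series would look roughly like this (node name and label values are illustrative):

```
kube_node_ns1_info{customresource_group="",customresource_kind="Node",customresource_version="v1",node="worker-1",ns1_example_com_foo="x",ns1_example_com_bar="y",ns1_example_com_baz="z"} 1
kube_node_ns2_info{customresource_group="",customresource_kind="Node",customresource_version="v1",node="worker-1",ns2_example_com_a="1",ns2_example_com_b="2",ns2_example_com_c="3"} 1
```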