kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

Generated Prometheus metrics output does not meet the requirements #2366

Open kallaics opened 2 months ago

kallaics commented 2 months ago

What happened:

The KSM configuration worked well up to KSM version v2.10.1. After the upgrade to v2.11.0, Prometheus reported an "invalid metric type" error. The latest version, v2.12.0, resolved the "invalid metric type" issue, but the output now contains only one resource type per metric name. The deployment and configuration did not change during this period.

The issue affects the "build_info" metric name.

What you expected to happen:

Prometheus output that exposes the same metric name for multiple resource types.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy kube-state-metrics from the prometheus-community/kube-prometheus-stack Helm chart via FluxCD.
  2. Use the following kube-state-metrics configuration (relevant part, in YAML format):
kube-state-metrics:
  collectors: [ ]
  extraArgs:
    - --custom-resource-state-only=true
  rbac:
    extraRules:
      - apiGroups:
          - apps
        resources:
          - deployments
        verbs: 
          - list
          - watch
      - apiGroups:
          - source.toolkit.fluxcd.io
          - kustomize.toolkit.fluxcd.io
          - helm.toolkit.fluxcd.io
          - notification.toolkit.fluxcd.io
          - image.toolkit.fluxcd.io
        resources:
          - gitrepositories
          - buckets
          - helmrepositories
          - helmcharts
          - ocirepositories
          - kustomizations
          - helmreleases
          - alerts
          - providers
          - receivers
          - imagerepositories
          - imagepolicies
          - imageupdateautomations
        verbs: [ "list", "watch" ]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: apps
              version: v1
              kind: Deployment
            metricNamePrefix: gotk
            metrics:
              - name: "build_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      version: [metadata, labels, "app.kubernetes.io/version" ]
                      component: [metadata, labels, "app.kubernetes.io/component" ]
                      instance: [metadata, labels, "app.kubernetes.io/instance" ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
          - groupVersionKind:
              group: kustomize.toolkit.fluxcd.io
              version: v1
              kind: Kustomization
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  source_name: [ spec, sourceRef, name ]
          - groupVersionKind:
              group: helm.toolkit.fluxcd.io
              version: v2beta2
              kind: HelmRelease
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  released: [ status, conditions, "[type=Released]", status ]
                  suspended: [ spec, suspend ]
                  chart_name: [ spec, chart, spec, chart ]
                  chart_source_name: [ spec, chart, spec, sourceRef, name ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1
              kind: GitRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  url: [ spec, url ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: Bucket
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  endpoint: [ spec, endpoint ]
                  bucket_name: [ spec, bucketName ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: HelmRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  url: [ spec, url ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: HelmChart
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  chart_name: [ spec, chart ]
                  chart_version: [ spec, version ]
          - groupVersionKind:
              group: source.toolkit.fluxcd.io
              version: v1beta2
              kind: OCIRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  url: [ spec, url ]
          - groupVersionKind:
              group: notification.toolkit.fluxcd.io
              version: v1beta3
              kind: Alert
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
          - groupVersionKind:
              group: notification.toolkit.fluxcd.io
              version: v1beta3
              kind: Provider
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
          - groupVersionKind:
              group: notification.toolkit.fluxcd.io
              version: v1
              kind: Receiver
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  webhook_path: [ status, webhookPath ]
          - groupVersionKind:
              group: image.toolkit.fluxcd.io
              version: v1beta2
              kind: ImageRepository
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  image: [ spec, image ]
          - groupVersionKind:
              group: image.toolkit.fluxcd.io
              version: v1beta2
              kind: ImagePolicy
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  source_name: [ spec, imageRepositoryRef, name ]
          - groupVersionKind:
              group: image.toolkit.fluxcd.io
              version: v1beta1
              kind: ImageUpdateAutomation
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
                labelsFromPath:
                  exported_namespace: [ metadata, namespace ]
                  ready: [ status, conditions, "[type=Ready]", status ]
                  status: [ status, conditions, "[type=Ready]", reason ]
                  reconciling: [ status, conditions, "[type=Reconciling]", status ]
                  stalled: [ status, conditions, "[type=Stalled]", status ]
                  suspended: [ spec, suspend ]
                  source_name: [ spec, sourceRef, name ]

Anything else we need to know?:

Environment:

kingdonb commented 2 months ago

I've tested the flux2-monitoring-example and verified that we were using kube-state-metrics v2.12.0. It does not seem to resolve the issue completely, although some metrics came back. In https://github.com/fluxcd/flux2-monitoring-example/issues/32 you can see that only the "HelmRelease" metrics were returned; the metrics for the other resource kinds did not come back.

speer commented 2 months ago

I did some tests and found that it's related to the code change to the SanitizeHeaders function in #2270: https://github.com/kubernetes/kube-state-metrics/pull/2270/files#diff-60450a33adea08c953656dd1e78a80e9f3b279bbc7656dedf31fd1a0c7fc1196

The issue seems to be in the help: "The current state of a GitOps Toolkit resource." message. If you make it unique (e.g., a different one for HelmRelease, Kustomization, etc.), the metrics do not get removed by the function mentioned above.
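
A minimal sketch of that workaround, assuming the only change needed is a distinct help string per kind. The wording of the help strings is illustrative and not taken from this thread, only two of the resources from the original configuration are shown, and the resource-level labelsFromPath entries are omitted for brevity:

```yaml
kube-state-metrics:
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: kustomize.toolkit.fluxcd.io
              version: v1
              kind: Kustomization
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                # Illustrative help text, unique to this kind
                help: "The current state of a Flux Kustomization resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
          - groupVersionKind:
              group: helm.toolkit.fluxcd.io
              version: v2beta2
              kind: HelmRelease
            metricNamePrefix: gotk
            metrics:
              - name: "resource_info"
                # Illustrative help text, differs from the Kustomization entry
                help: "The current state of a Flux HelmRelease resource."
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [ metadata, name ]
```

The metric name stays the same across kinds; only the help text differs.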

I am just not sure if that's a bug or a feature, maybe the author @rexagod knows?

logicalhan commented 2 months ago

/assign @CatherineF-dev /triage accepted

kallaics commented 2 months ago

I can confirm. After I changed the "help" fields, the metrics appeared in Prometheus and Grafana. Thanks @speer!

rexagod commented 1 month ago

Hello, apologies for the late response. 👋🏼

Prometheus' protobuf machinery does not support all OpenMetrics types at the moment (https://github.com/kubernetes/kube-state-metrics/issues/2248). To resolve this, #2270 was merged, which implicitly converted stateset and info metrics to gauge metrics before piping them out (PTAL at these test-cases). This, in turn, gave rise to cases where metrics that previously did not appear to conflict could start to conflict, which is why the patch had to include a deduplication capability. The issue raised here is a side effect of that deduplication.

https://github.com/fluxcd/flux2-monitoring-example/issues/32#issuecomment-2059346695 presents a take on this that has been the implicit sentiment for such configuration scenarios, i.e., if the use case warrants different groupVersionKind definitions, they should ideally be accompanied by different help texts to indicate what changed between them.

I'd be happy to follow this up by pointing out the caveat observed here in the documentation for future instances.