thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Discrepancy in labels using api/v1/series and api/v1/query when external label and internal label have the same key #6844

Closed · andrejshapal closed 1 year ago

andrejshapal commented 1 year ago

Hello,

I am using Thanos 0.32.5.

Issue: We noticed a flaky issue: Thanos always exposes the external_label value when executing queries, but it randomly returns either the external_label or the internal_label value when both share the same key and api/v1/series is queried.

Here is the Prometheus output: [image]

And after applying the external_label, the new cluster label shows up in Thanos: [image]

But when I am querying the api/v1/series endpoint, it randomly gives the value of cluster:

{
  "status": "success",
  "data": [
    {
      "__name__": "collectd_collectd_queue_length",
      "cassandra_datastax_com_cluster": "cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "cassandra",
      "collectd": "write_queue",
      "container": "cassandra",
      "dc": "dc1",
      "endpoint": "prometheus",
      "exported_instance": "10.2.150.192",
      "instance": "10.2.150.192:9103",
      "job": "cassandra-dc1-all-pods-service",
      "namespace": "cit1-core",
      "pod": "cassandra-dc1-r2-sts-0",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r2",
      "service": "cassandra-dc1-all-pods-service"
    },
    {
      "__name__": "collectd_collectd_queue_length",
      "cassandra_datastax_com_cluster": "cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "cassandra",
      "collectd": "write_queue",
      "container": "cassandra",
      "dc": "dc1",
      "endpoint": "prometheus",
      "exported_instance": "10.2.151.7",
      "instance": "10.2.151.7:9103",
      "job": "cassandra-dc1-all-pods-service",
      "namespace": "cit1-core",
      "pod": "cassandra-dc1-r3-sts-0",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r3",
      "service": "cassandra-dc1-all-pods-service"
    }
  ]
}
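For reference, a minimal sketch of the kind of request that returns output like this (the query host is a placeholder; the metric name is the one from the series above):

# Hypothetical Thanos Query host; only the match[] parameter is taken from the output above.
curl -sG 'http://thanos-query.example.local:9090/api/v1/series' \
  --data-urlencode 'match[]=collectd_collectd_queue_length' | jq .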

Expected: Any API call should prioritise the external_label and return it in the response.

Possible solution: The current workaround is to rename the internal label in the scrape config. But we mostly use configs via Helm out of the box, meaning we do not set the configs ourselves. Therefore, there is a chance that an external label matches some random label from some random metric, so the discrepancy is worth fixing. The series API endpoint is used by Grafana.
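For illustration, a rough sketch of that rename workaround expressed as ServiceMonitor-style metricRelabelings (the target label name exported_cluster is hypothetical, not something this setup actually uses):

metricRelabelings:
  - action: replace
    sourceLabels: [cluster]
    targetLabel: exported_cluster  # hypothetical new name for the scraped label
  - action: labeldrop
    regex: cluster                 # relabel regexes are fully anchored, so only the exact label "cluster" is dropped

This frees the cluster key for the external label while keeping the original scraped value under another name.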

GiedriusS commented 1 year ago

Are all components on the same version?

andrejshapal commented 1 year ago

@GiedriusS the sidecars have 0.32.4, the rest 0.32.5.

mhoffm-aiven commented 1 year ago

Hey, are you able to share some downstream blocks so we can try to reproduce locally? Also, what downstream Store APIs are you querying?

andrejshapal commented 1 year ago

@mhoffm-aiven

what downstream Store APIs are you querying

Not sure what downstream Store APIs are. But we have a global Thanos, which queries a querier on another cluster via gRPC, which in turn talks to a Thanos sidecar. Data is stored in a GCP bucket.
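For context, a minimal sketch of the kind of GCS objstore.yml such a sidecar usually mounts (the bucket name is a placeholder):

type: GCS
config:
  bucket: "example-thanos-blocks"  # placeholder bucket name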

are you able to share some downstream blocks so we can try to reproduce locally

Can you help me with some guidance? Should I just send you some chunks from the bucket? If so, how can I find the necessary chunks (we have too many of them and no "created at" in the GCP bucket)?

MichaHoffmann commented 1 year ago

Mh, ok, so sharing might not be practical. With "downstream store api" I meant essentially the "--endpoint"s!

MichaHoffmann commented 1 year ago

Can you bump the sidecars to 0.32.5? There was this:

https://github.com/thanos-io/thanos/pull/6816 Store: fix prometheus store label values matches for external labels

Which feels somewhat related.

andrejshapal commented 1 year ago

@MichaHoffmann I suspect it is enough to bump just the sidecar on the cluster where the issue is reproducible? If so, I have bumped it to 0.32.5:

{
    "status": "success",
    "data": [
        {
            "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
            "cassandra_datastax_com_cluster": "-cassandra",
            "cassandra_datastax_com_datacenter": "dc1",
            "cluster": "cit1-k8s",
            "container": "cassandra",
            "datacenter": "dc1",
            "endpoint": "metrics",
            "exported_instance": "10.2.145.73",
            "host": "3a2b71b6-8026-4323-a5d9-6b9420258bc5",
            "instance": "10.2.145.73:9000",
            "job": "-cassandra-dc1-all-pods-service",
            "namespace": "cit1--core",
            "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-6w90",
            "pod": "-cassandra-dc1-r3-sts-0",
            "pod_name": "-cassandra-dc1-r3-sts-0",
            "pool_name": "PerDiskMemtableFlushWriter_0",
            "pool_type": "internal",
            "prometheus": "monitoring/kube-prometheus-stack-prometheus",
            "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
            "rack": "r3",
            "service": "-cassandra-dc1-all-pods-service"
        },
        {
            "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
            "cassandra_datastax_com_cluster": "-cassandra",
            "cassandra_datastax_com_datacenter": "dc1",
            "cluster": "cit1-k8s",
            "container": "cassandra",
            "datacenter": "dc1",
            "endpoint": "metrics",
            "exported_instance": "10.2.147.193",
            "host": "1252ec4c-66b7-47de-9745-42d368198c3e",
            "instance": "10.2.147.193:9000",
            "job": "-cassandra-dc1-all-pods-service",
            "namespace": "cit1--core",
            "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-xmr5",
            "pod": "-cassandra-dc1-r2-sts-0",
            "pod_name": "-cassandra-dc1-r2-sts-0",
            "pool_name": "PerDiskMemtableFlushWriter_0",
            "pool_type": "internal",
            "prometheus": "monitoring/kube-prometheus-stack-prometheus",
            "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
            "rack": "r2",
            "service": "-cassandra-dc1-all-pods-service"
        },
        {
            "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
            "cassandra_datastax_com_cluster": "-cassandra",
            "cassandra_datastax_com_datacenter": "dc1",
            "cluster": "-cassandra",
            "container": "cassandra",
            "datacenter": "dc1",
            "endpoint": "metrics",
            "exported_instance": "10.2.150.131",
            "host": "1cae4b22-a89b-451f-8f02-d276b86efb83",
            "instance": "10.2.150.131:9000",
            "job": "-cassandra-dc1-all-pods-service",
            "namespace": "cit1--core",
            "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-movi",
            "pod": "-cassandra-dc1-r1-sts-0",
            "pod_name": "-cassandra-dc1-r1-sts-0",
            "pool_name": "PerDiskMemtableFlushWriter_0",
            "pool_type": "internal",
            "prometheus": "monitoring/kube-prometheus-stack-prometheus",
            "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
            "rack": "r1",
            "service": "-cassandra-dc1-all-pods-service"
        }
    ]
}

The issue is not gone.

image

what downstream Store APIs are you querying

With "downstream store api" I meant essentially the "--endpoint"s!

Well, we have many endpoints.

        - query
        - '--log.level=info'
        - '--log.format=logfmt'
        - '--grpc-address=0.0.0.0:10901'
        - '--http-address=0.0.0.0:10902'
        - '--query.replica-label=replica'
        - '--endpoint=thanos-sidecar-querier-query-grpc.monitoring.svc:10901'
        - '--endpoint=thanos-storegateway.monitoring.svc:10901'
        - '--endpoint=lv01-prometheus01.int.company.live:10903'
        - '--endpoint=lv01-prometheus02.int.company.live:10903'
        - '--endpoint=ro01-prometheus01.int.company.live:10903'
        - '--endpoint=ro01-prometheus02.int.company.live:10903'
        - '--endpoint=ge01-prometheus01.int.company.live:10903'
        - '--endpoint=ge01-prometheus02.int.company.live:10903'
        - '--endpoint=thanos.ci.int.company.live:443'
        - '--endpoint=thanos.ci-en1.int.company.live:443'
        - '--endpoint=thanos.dev.int.company.live:443'
        - '--endpoint=thanos.live.int.company.live:443'
        - '--endpoint=thanos-1.global.int.company.live:443'
        - >-
          --endpoint=astradb-thanos-sidecar-querier-query-grpc.monitoring.svc:10901
        - '--grpc-client-tls-secure'
        - '--grpc-client-tls-cert=/certs/client/tls.crt'
        - '--grpc-client-tls-key=/certs/client/tls.key'
        - '--grpc-client-tls-ca=/certs/client/ca.crt'

The one which has the metrics in question is thanos.dev.int.company.live:443.

yeya24 commented 1 year ago

@andrejshapal Can you try bumping up the version? It seems to be the same bug that was fixed in v0.32.5.

andrejshapal commented 1 year ago

@yeya24 Hello, I bumped everything to 0.32.5 and still see the same issue.

MichaHoffmann commented 1 year ago

Hey @andrejshapal, can you share the configuration of the offending thanos.dev.int.company.live, please?

andrejshapal commented 1 year ago

@MichaHoffmann Sure:

spec:
  project: application-support
  sources:
    - repoURL: https://helm.onairent.live
      chart: any-resource
      targetRevision: "0.1.0"
      helm:
        values: |
          anyResources:
    - repoURL: https://charts.bitnami.com/bitnami
      chart: thanos
      targetRevision: "12.13.12"
      helm:
        values: |
          fullnameOverride: thanos-sidecar-querier
          query:
            dnsDiscovery:
              enabled: true
              sidecarsService: kube-prometheus-stack-thanos-discovery
              sidecarsNamespace: monitoring

            service:
              annotations:
                traefik.ingress.kubernetes.io/service.serversscheme: h2c

            serviceGrpc:
              annotations:
                traefik.ingress.kubernetes.io/service.serversscheme: h2c

            ingress:
              grpc:
                enabled: true
                ingressClassName: traefik-internal
                annotations:
                  traefik.ingress.kubernetes.io/router.tls.options: monitoring-thanos@kubernetescrd
                hostname: thanos.dev.int.company.live
                extraTls:
                  - hosts:
                      - thanos.dev.int.company.live
                    secretName: thanos-client-server-cert-1

          bucketweb:
            enabled: false

          compactor:
            enabled: false

          storegateway:
            enabled: false

          receive:
            enabled: false

          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
              labels:
                prometheus: main

I also noticed it returned one cluster until 07:00 27/10/2023 (local time, now is 12:41), and at 07:05 already two "clusters".

MichaHoffmann commented 1 year ago

Can you share the Prometheus configurations from the instances that monitor the offending Cassandra cluster too, please?

andrejshapal commented 1 year ago

We use kube-prometheus-stack. Nothing really special:

    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "50.3.1"
      helm:
        values: |
          fullnameOverride: kube-prometheus-stack
          commonLabels:
            prometheus: main

          defaultRules:
            create: false

          kube-state-metrics:
            fullnameOverride: kube-state-metrics
            prometheus:
              monitor:
                enabled: true
                additionalLabels:
                  prometheus: main
                metricRelabelings:
                  - action: labeldrop
                    regex: container_id
                  - action: labeldrop
                    regex: uid
                  - sourceLabels: [__name__]
                    action: drop
                    regex: 'kube_configmap_(annotations|created|info|labels|metadata_resource_version)'
            collectors:
              - certificatesigningrequests
              - configmaps
              - cronjobs
              - daemonsets
              - deployments
              - endpoints
              - horizontalpodautoscalers
              - ingresses
              - jobs
              - limitranges
              - mutatingwebhookconfigurations
              - namespaces
              - networkpolicies
              - nodes
              - persistentvolumeclaims
              - persistentvolumes
              - poddisruptionbudgets
              - pods
              - replicasets
              - replicationcontrollers
              - resourcequotas
              - secrets
              - services
              - statefulsets
              - storageclasses
              - validatingwebhookconfigurations
              - volumeattachments
            metricLabelsAllowlist:
              - pods=[version]

          kubeScheduler:
            enabled: false

          kubeEtcd:
            enabled: false

          kubeProxy:
            enabled: false

          kubeControllerManager:
            enabled: false

          prometheus-node-exporter:
            fullnameOverride: node-exporter
            extraArgs:
              - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
              - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
            prometheus:
              monitor:
                enabled: true
                additionalLabels:
                  prometheus: main
                relabelings:
                  - action: replace
                    sourceLabels:
                    - __meta_kubernetes_pod_node_name
                    targetLabel: instance

          coreDns:
            enabled: false

          kubelet:
            enabled: true
            serviceMonitor:
              cAdvisorMetricRelabelings:
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_memory_(mapped_file|swap)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_(file_descriptors|tasks_state|threads_max)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_spec.*'
                - sourceLabels: [id, pod]
                  action: drop
                  regex: '.+;'
                - action: labeldrop
                  regex: id
                - action: labeldrop
                  regex: name
                - action: labeldrop
                  regex: uid

              cAdvisorRelabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

              probesMetricRelabelings:
                - action: labeldrop
                  regex: pod_uid

              probesRelabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

              resourceRelabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

              relabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

          grafana:
            enabled: false

          alertmanager:
            enabled: false

          prometheus:
            enabled: true
            monitor:
              additionalLabels:
                prometheus: main

            serviceAccount:
              create: true
              name: "prometheus"

            thanosService:
              enabled: true

            thanosServiceMonitor:
              enabled: true

            ingress:
              enabled: true
              annotations:
                kubernetes.io/ingress.class: traefik-internal
              hosts:
                - prometheus.dev.int.company.live
              tls:
              - hosts:
                  - prometheus.dev.int.company.live
                secretName: wildcard-dev-int-company-live

            prometheusSpec:
              enableRemoteWriteReceiver: true
              serviceAccountName: prometheus
              enableAdminAPI: true
              disableCompaction: true
              scrapeInterval: 10s
              retention: 2h
              additionalScrapeConfigsSecret:
                enabled: false
              storageSpec:
                volumeClaimTemplate:
                  spec:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 20Gi

              externalLabels:
                cluster: cit1-k8s
                replica: prometheus-cit1-1

              additionalAlertManagerConfigs:
                - scheme: https
                  static_configs:
                    - targets:
                        - alertmanager.company.live

              thanos:
                image: quay.io/thanos/thanos:v0.32.5
                objectStorageConfig:
                  name: thanos-objstore
                  key: objstore.yml

              ruleSelector:
                matchLabels:
                  evaluation: prometheus
              serviceMonitorSelector:
                matchLabels:
                  prometheus: main
              podMonitorSelector:
                matchLabels:
                  prometheus: main
              probeSelector:
                matchLabels:
                  prometheus: main
              resources:
                requests:
                  cpu: "3.2"
                  memory: 14Gi
                limits:
                  cpu: 8
                  memory: 20Gi
MichaHoffmann commented 1 year ago

Is there maybe another replica somewhere? Asking since it has the external "replica" label.

andrejshapal commented 1 year ago

@MichaHoffmann Nope. We have HA Prometheuses on some clusters, but we added the replica label everywhere just for consistency.

MichaHoffmann commented 1 year ago

Having a replica label on things that are not replicas of one another feels like it could be an issue.

andrejshapal commented 1 year ago

@MichaHoffmann I can try to remove the replica label. But this should not be an issue, since it is just used as a deduplication label?

andrejshapal commented 1 year ago

@MichaHoffmann I have removed the replica label, but it had no effect on the issue in question.

MichaHoffmann commented 1 year ago

@MichaHoffmann I have removed the replica label, but it had no effect on the issue in question.

Ah well, an attempt was made. Do you have the same issue if you uncheck "Use Deduplication"?

andrejshapal commented 1 year ago

@MichaHoffmann In Thanos Query, with or without deduplication, the issue is not noticeable. I don't think querying there goes through api/v1/series.

MichaHoffmann commented 1 year ago

You can specify ?dedup=false on the API request, I think ( https://thanos.io/v0.33/components/query.md/#deduplication-enabled ).
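For example, a sketch of such a series request with deduplication disabled (the query host is a placeholder; the metric name is the one from above):

# Hypothetical Thanos Query host; dedup=false disables deduplication for this request.
curl -sG 'http://thanos-query.example.local:9090/api/v1/series' \
  --data-urlencode 'match[]=org_apache_cassandra_metrics_thread_pools_completed_tasks' \
  --data-urlencode 'dedup=false' | jq .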

andrejshapal commented 1 year ago

dedup false:

{
  "status": "success",
  "data": [
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "cit1-k8s",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.145.73",
      "host": "3a2b71b6-8026-4323-a5d9-6b9420258bc5",
      "instance": "10.2.145.73:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-6w90",
      "pod": "-cassandra-dc1-r3-sts-0",
      "pod_name": "-cassandra-dc1-r3-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r3",
      "service": "-cassandra-dc1-all-pods-service"
    },
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "-cassandra",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.147.193",
      "host": "1252ec4c-66b7-47de-9745-42d368198c3e",
      "instance": "10.2.147.193:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-xmr5",
      "pod": "-cassandra-dc1-r2-sts-0",
      "pod_name": "-cassandra-dc1-r2-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r2",
      "service": "-cassandra-dc1-all-pods-service"
    }
  ]
}

dedup true:

{
  "status": "success",
  "data": [
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "-cassandra",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.145.73",
      "host": "3a2b71b6-8026-4323-a5d9-6b9420258bc5",
      "instance": "10.2.145.73:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-6w90",
      "pod": "-cassandra-dc1-r3-sts-0",
      "pod_name": "-cassandra-dc1-r3-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r3",
      "service": "-cassandra-dc1-all-pods-service"
    },
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "-cassandra",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.147.193",
      "host": "1252ec4c-66b7-47de-9745-42d368198c3e",
      "instance": "10.2.147.193:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-xmr5",
      "pod": "-cassandra-dc1-r2-sts-0",
      "pod_name": "-cassandra-dc1-r2-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r2",
      "service": "-cassandra-dc1-all-pods-service"
    }
  ]
}
MichaHoffmann commented 1 year ago

Would it be possible to send the promtool tsdb dump output, with an appropriate matcher, from the offending Prometheus? (With the labels censored, like in this example.) I could build a block and try to debug locally from that!
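For reference, a sketch of the kind of dump meant here, assuming a promtool build that supports --match on tsdb dump (the TSDB path and matcher are placeholders):

# Dump only the affected series from the local Prometheus TSDB directory (path is a placeholder).
promtool tsdb dump \
  --match='{__name__="org_apache_cassandra_metrics_thread_pools_completed_tasks"}' \
  /prometheus/data > dump.txt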

andrejshapal commented 1 year ago

@MichaHoffmann Sorry for the long wait, I had a busy week. dump.zip

MichaHoffmann commented 1 year ago

Hey,

I did a small local setup of Prometheus, sidecar, and querier (on latest main) with your data and can reproduce!

$ curl -sq -g '0.0.0.0:10904/api/v1/series?' --data-urlencode 'match[]=foo'  | jq '.data.[].cluster'
"xxx-cassandra"
$ curl -sq -g '0.0.0.0:10904/api/v1/series?' --data-urlencode 'match[]=foo'  | jq '.data.[].cluster'
"xxx-cassandra"
$ curl -sq -g '0.0.0.0:10904/api/v1/series?' --data-urlencode 'match[]=foo'  | jq '.data.[].cluster'
"cluster_1"

with Prometheus configured like:

global:
  external_labels:
    cluster: cluster_1

The querier and sidecar are configured mostly with defaults.
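For context, a rough sketch of such a local setup (ports and paths are my own choices, not taken verbatim from the reproduction):

# Prometheus with the external_labels config above, plus a sidecar and a querier pointed at it.
prometheus --config.file=prometheus.yml --storage.tsdb.path=./data &
thanos sidecar --prometheus.url=http://localhost:9090 --tsdb.path=./data \
  --grpc-address=0.0.0.0:10901 &
thanos query --grpc-address=0.0.0.0:10903 --http-address=0.0.0.0:10904 \
  --endpoint=localhost:10901 &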

Thanks, I'll look into this in the debugger a bit later!

MichaHoffmann commented 1 year ago

Ok, I think I have found the issue and have a fix; I was able to reproduce it in a minimal acceptance test case.