thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

v0.32.1 Error executing query - postings is nil for ... It was never fetched. #6678

Closed: rgarcia89 closed this issue 1 year ago

rgarcia89 commented 1 year ago

Thanos, Prometheus and Golang version used: v0.32.1

Object Storage Provider: Azure

What happened: Since updating from v0.31.0 to v0.32.1, we cannot query metrics stored in object storage (Azure). Queries fail with the error shown in the logs below.

Full logs of relevant components:

Logs

Error executing query: expanding series: proxy Series(): rpc error: code = Aborted desc = receive series from Addr: 10.244.9.84:10901 LabelSets: {cluster="ap", prometheus="monitoring/ap", prometheus_replica="prometheus-ap-0", stage="lab"},{cluster="ap", prometheus="monitoring/ap", stage="lab"},{cluster="norc", prometheus="monitoring/norc", prometheus_replica="prometheus-norc-0", stage="lab"},{cluster="norc", prometheus="monitoring/norc", stage="lab"},{cluster="oss-bss", prometheus="monitoring/oss-bss", prometheus_replica="prometheus-oss-bss-0", stage="lab"},{cluster="oss-bss", prometheus="monitoring/oss-bss", stage="lab"},{cluster="self", prometheus="monitoring/self", prometheus_replica="prometheus-self-0", stage="lab"},{cluster="self", prometheus="monitoring/self", stage="lab"},{cluster="sisr", prometheus="monitoring/sisr", prometheus_replica="prometheus-sisr-0", stage="lab"},{cluster="sisr", prometheus="monitoring/sisr", stage="lab"} MinTime: 1684972800027 MaxTime: 1693324800000: rpc error: code = Aborted desc = fetch series for block 01H8Y3NK7RBHVJC1WFX79EGEPE: expanded matching posting: expand: postings is nil for {__name__=container_memory_working_set_bytes}. It was never fetched.

Anything else we need to know: I have downgraded back to v0.31.0, and everything works fine there.

yeya24 commented 1 year ago

Do you use an index cache, or is it the default configuration?

rgarcia89 commented 1 year ago

It is the default configuration that comes with kube-thanos.

With the following common config:

local commonConfig = {
  config+:: {
    local cfg = self,
    namespace: 'thanos',
    version: 'v0.31.0',
    image: 'quay.io/thanos/thanos:' + cfg.version,
    imagePullPolicy: 'IfNotPresent',
    retentionResolutionRaw: '60d',
    retentionResolution5m: '90d',
    retentionResolution1h: '180d',
    objectStorageConfig: {
      name: 'thanos-objectstorage',
      key: 'thanos.yaml',
    },
    // hashringConfigMapName: 'hashring-config',
    volumeClaimTemplate: {
      spec: {
        accessModes: ['ReadWriteOnce'],
        resources: {
          requests: {
            storage: '10Gi',
          },
        },
      },
    },
  },
};

MichaHoffmann commented 1 year ago

Can we consolidate the discussion in https://github.com/thanos-io/thanos/issues/6660?

MichaHoffmann commented 1 year ago

This should be fixed in v0.32.2.

rgarcia89 commented 1 year ago

Fixed in v0.32.2.
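With the kube-thanos `commonConfig` posted earlier in this thread, picking up the fix should only require changing the `version` field. The sketch below assumes that `commonConfig` is what parameterizes the components, and it trims that object to the fields that control the image tag. `upgraded` is a hypothetical name used only for illustration. Because `image` is built from `cfg.version` and `cfg` is bound to the late-bound `self`, overriding `version` in a derived object also updates the image tag.

```jsonnet
// Trimmed copy of the reporter's commonConfig (assumption: only the
// fields relevant to the image tag are shown here).
local commonConfig = {
  config+:: {
    local cfg = self,
    namespace: 'thanos',
    version: 'v0.31.0',
    image: 'quay.io/thanos/thanos:' + cfg.version,
  },
};

// Hypothetical derived object: override only `version`. `self` is
// late-bound, so `image` is re-evaluated with the new tag.
local upgraded = commonConfig {
  config+:: {
    version: 'v0.32.2',
  },
};

// `config` is a hidden field (::), so expose the computed image explicitly.
{
  image: upgraded.config.image,
}
```

Evaluating this sketch yields `{"image": "quay.io/thanos/thanos:v0.32.2"}`. Overriding the single field, rather than editing `image` by hand, keeps the tag and the version from drifting apart.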