thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Thanos receiver keeps growing #5601

Open Ramshield opened 2 years ago

Ramshield commented 2 years ago

Thanos, Prometheus and Golang version used: Thanos 0.27.0, Prometheus 2.35.0

Object Storage Provider: MinIO (S3)

What happened: We currently run Thanos Receive, Query, Compactor and Store Gateway. Six Prometheus instances remote-write to Thanos. Our Thanos Receive persistent volume keeps growing: in one week it grew by 80 GB. We see the same growth in our MinIO S3 backend.

What you expected to happen: The Thanos Receive local storage not to grow this large.

How to reproduce it (as minimally and precisely as possible): Not sure, but we installed Thanos via the Helm chart, version 10.5.5:

thanos: 
  existingObjstoreSecret: thanos-objstore-secret

  global:
    imageRegistry: ourharbor

  image:
    tag: v0.27.0
    repository: monitoring/thanos 

  query:
    enabled: true
    ingress:
      enabled: false
      hostname: querier.example.com
      ingressClassName: nginx 

  queryFrontend:
    enabled: true
    ingress:
      enabled: true
      hostname: thanos-query-frontend.example.com
      ingressClassName: nginx 

  bucketweb:
    enabled: false
    ingress:
      enabled: false
      hostname: thanos-bucketweb.example.com
      ingressClassName: nginx 

  storegateway:
    enabled: true
    ingress:
      enabled: false
      hostname: thanos-store.example.com
      ingressClassName: nginx 

  receive:
    enabled: true
    ingress:
      enabled: true
      hostname: thanos-receive.example.com
      ingressClassName: nginx

    persistence:
      enabled: true
      size: 150Gi

  compactor:
    enabled: true
    ingress:
      enabled: true
      hostname: thanos-compactor.example.com
      ingressClassName: nginx
    persistence:
      enabled: true
      size: 100Gi

Full logs to relevant components: no errors are logged in the Thanos Receive component.

Anything else we need to know: We run this on a Kubernetes Rancher environment.
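
Note: none of the values above set a local retention for the Receive component, so the default applies (upstream Thanos Receive defaults --tsdb.retention to 15d), meaning roughly two weeks of 2-hour blocks accumulate on the persistent volume even when uploads to MinIO succeed. A minimal sketch of shortening that window via Helm, assuming the Bitnami chart exposes a receive.tsdbRetention value that maps to the receiver's --tsdb.retention flag (the parameter, release and repo names here are assumptions; verify them against your chart version's values reference):

  # Keep only ~2 days of blocks on the Receive PV; older, already-uploaded
  # blocks are then pruned locally while the full history stays in MinIO.
  helm upgrade thanos bitnami/thanos \
    --reuse-values \
    --set receive.tsdbRetention=2d

Lowering local retention does not affect data already shipped to object storage; queries for older ranges are served by the Store Gateway.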

matej-g commented 2 years ago

Hi, is this a new setup or has the receiver started to grow unexpectedly on an existing setup?

I'm not very familiar with the Helm chart, but could you maybe provide all the parameters you are running your receiver with?

Ramshield commented 2 years ago

> Hi, is this a new setup or has the receiver started to grow unexpectedly on an existing setup?
>
> I'm not very familiar with the Helm chart, but could you maybe provide all the parameters you are running your receiver with?

This is a new setup.

All the parameters I deployed the Helm chart with are given above, so nothing special at all.

Ramshield commented 2 years ago

Any idea @matej-g ? Thanks!
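
One way to narrow this down is to check the receiver's shipper metrics: if uploads are succeeding, the growth is local TSDB retention rather than stuck blocks. A quick sketch against the receiver's HTTP port (10902 is the default --http-address; the namespace and service name below are placeholders for your deployment):

  # Expose the Receive metrics endpoint locally, then check the upload counters.
  # thanos_shipper_uploads_total should increase roughly every 2 hours;
  # thanos_shipper_upload_failures_total should stay at 0.
  kubectl -n monitoring port-forward svc/thanos-receive 10902:10902 &
  curl -s http://localhost:10902/metrics | grep -E 'thanos_shipper_upload(s|_failures)_total'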

stale[bot] commented 1 year ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

dfcastro1 commented 1 year ago

@matej-g I've got the same issue regarding the receiver: it does upload blocks to object storage but doesn't delete them from local disk. Here are the params:

receive:
    Container ID:  containerd://f3df1a026dea56c4621fd59814db3c1bef3ea6339988bf6844af6dd8993807e8
    Image:         docker.io/bitnami/thanos:0.29.0-scratch-r0
    Image ID:      docker.io/bitnami/thanos@sha256:e239696f575f201cd7f801e80945964fda3f731cd62be70772f88480bb428fcd
    Ports:         10901/TCP, 10902/TCP, 19291/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      --log.level=info
      --log.format=logfmt
      --grpc-address=0.0.0.0:10901
      --http-address=0.0.0.0:10902
      --remote-write.address=0.0.0.0:19291
      --objstore.config=$(OBJSTORE_CONFIG)
      --tsdb.path=/var/thanos/receive
      --label=prometheus_replica="$(NAME)"
      --label=receive="true"
      --tsdb.retention=15d
      --receive.local-endpoint=$(NAME).thanos-receive-headless.$(NAMESPACE).svc.cluster.local:10901

As you can see, we have blocks being created every 2 hours:

drwxr-sr-x 3 1001 1001 4.0K Dec  7 16:42 01GKPQFF5AW3A7CM7GH2Z35H07
drwxr-sr-x 3 1001 1001 4.0K Dec  7 19:42 01GKQ1RWN1DW55EZG53X7HKE9N
drwxr-sr-x 3 1001 1001 4.0K Dec  7 21:00 01GKQ66R0WJVBYFNV3Z0640XQ4
drwxr-sr-x 3 1001 1001 4.0K Dec  7 23:00 01GKQD2FAHNGX862B7P3MDS8MF
drwxr-sr-x 3 1001 1001 4.0K Dec  8 01:00 01GKQKY6GAAW9CRDXB7J312CE5
drwxr-sr-x 3 1001 1001 4.0K Dec  8 03:00 01GKQTSXRGB1X9T98YBMYEVQS5
drwxr-sr-x 3 1001 1001 4.0K Dec  8 05:00 01GKR1NN04XFAMR4SZVTGQ3KVJ
drwxr-sr-x 3 1001 1001 4.0K Dec  8 07:00 01GKR8HC83PN5HV7Z9MZ5FGW0H
drwxr-sr-x 3 1001 1001 4.0K Dec  8 09:00 01GKRFD3G4BHADPY5FVNH5XE2J
drwxr-sr-x 3 1001 1001 4.0K Dec  8 11:00 01GKRP8TRAR34XPEM007NCAQV9
drwxr-sr-x 3 1001 1001 4.0K Dec  8 13:00 01GKRX4J15JDJGHWPSYDW7NWJ6
drwxr-sr-x 3 1001 1001 4.0K Dec  8 15:00 01GKS4098HPF81TXGTPDQ056J5
drwxr-sr-x 3 1001 1001 4.0K Dec  8 17:00 01GKSAW0GYN903AAP03GF6V1S2
drwxr-sr-x 3 1001 1001 4.0K Dec  8 19:00 01GKSHQQRJES8F45K9BDC69XEZ
drwxr-sr-x 3 1001 1001 4.0K Dec  8 21:00 01GKSRKF00034QP9BDH1HATWDR
drwxr-sr-x 3 1001 1001 4.0K Dec  8 23:00 01GKSZF6BES1GD60DE6AS8KWN2
drwxr-sr-x 3 1001 1001 4.0K Dec  9 01:00 01GKT6AXH9TXVEK0Z60JGMR2XX
drwxr-sr-x 3 1001 1001 4.0K Dec  9 03:00 01GKTD6MSNXGR4F3PK35WVS0YT
drwxr-sr-x 3 1001 1001 4.0K Dec  9 05:00 01GKTM2C05ST0XWT050VMHZR6B
drwxr-sr-x 3 1001 1001 4.0K Dec  9 07:00 01GKTTY38P6NJYZ27P70FVXASC
drwxr-sr-x 3 1001 1001 4.0K Dec  9 09:00 01GKV1STGRCT4TP856N9EP8CF6
drwxr-sr-x 3 1001 1001 4.0K Dec  9 11:00 01GKV8NHTZP49AGDR9W87R053G
drwxr-sr-x 2 1001 1001 4.0K Dec  9 11:00 chunks_head
-rw-r--r-- 1 1001 1001    0 Dec  7 16:31 lock
drwxrws--- 2 root 1001  16K Dec  7 16:31 lost+found
drwxr-s--- 3 1001 1001 4.0K Dec  7 16:42 thanos
-rw-r--r-- 1 1001 1001  740 Dec  9 12:44 thanos.shipper.json
drwxr-sr-x 3 1001 1001 4.0K Dec  9 11:00 wal
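
Given --tsdb.retention=15d in the args above, this block list is consistent with expected behaviour: the shipper uploads each 2-hour block to object storage, but the local copy is only pruned once it ages out of the retention window, so up to ~15 days of blocks sit on disk. To check which local blocks have already been shipped, the shipper state file can be compared with the local block directories. A sketch, assuming a shell with jq that has the receiver's data volume mounted (the scratch image itself has no shell, so run this from a debug container or a host with access to the PV):

  # ULIDs the shipper has already uploaded (thanos.shipper.json keeps an "uploaded" list).
  jq -r '.uploaded[]' /var/thanos/receive/thanos.shipper.json | sort > uploaded.txt

  # ULIDs of blocks still present on the local volume.
  ls -d /var/thanos/receive/01* | xargs -n1 basename | sort > local.txt

  # Blocks on disk that have not been uploaded yet; normally only the newest one or two.
  comm -23 local.txt uploaded.txt

If every old block shows up as uploaded, the disk usage is retention rather than an upload problem, and shortening --tsdb.retention (or sizing the PV for ~15 days of data) is the fix.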