thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Thanos compactor causing huge memory spikes when compacting raw blocks #7350

Open mohaabduvisa opened 1 month ago

mohaabduvisa commented 1 month ago

What happened: We are planning to use Thanos for long-term storage, and during the evaluation we have run into a few setbacks. As shown in the attached screenshots, the Thanos compactor spikes to 15GB of RAM for 350,000 (3.5 lakh) time series. We plan to run compaction and downsampling for 8M time series, which, extrapolating, would result in the figures below:

15GB RAM - 350,000 time series
360GB RAM - 8M time series
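The extrapolation assumes that the compactor's peak memory scales linearly with the number of time series. This is an assumption: peak usage likely also depends on block size, chunk count, and label cardinality. Under that assumption, the projection works out as:

```latex
M_{\text{8M}} \approx 15\,\text{GB} \times \frac{8\,000\,000}{350\,000} \approx 15\,\text{GB} \times 22.86 \approx 343\,\text{GB}
```

That is roughly 343GB, which is in the same range as the 360GB estimate.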

360GB of RAM is far too much, even for short spikes. The configuration we are using is below. Despite setting the concurrency arguments, we still see memory spikes. Could you let us know whether this will be fixed in a future version?

What you expected to happen: We expected RAM utilization to be much lower.

How to reproduce it (as minimally and precisely as possible): We are running two Prometheus replicas, each with an embedded Thanos sidecar that writes to MinIO S3 object storage in the same cluster. The compactor runs with the configuration below:

  containers:
  - args:
    - compact
    - --wait
    - --log.level=info
    - --log.format=logfmt
    - --objstore.config=$(OBJSTORE_CONFIG)
    - --data-dir=/var/thanos/compact
    - --retention.resolution-raw=2d
    - --retention.resolution-5m=15d
    - --retention.resolution-1h=30d
    - --deduplication.func=penalty
    - --compact.enable-vertical-compaction
    - --delete-delay=2h
    - --block-files-concurrency=20
    - --block-meta-fetch-concurrency=32
    - --compact.blocks-fetch-concurrency=20
    - --consistency-delay=30m
    - --deduplication.replica-label=prometheus_replica

Thanos, Prometheus, and Go versions used: Thanos 0.34.1, Prometheus 2.49.2, Go 1.21

[Attached screenshots: compactor-details, compactor-spike]
jkroepke commented 1 month ago

@mohaabduvisa What happens if you set a memory limit on the compactor container and use --enable-auto-gomemlimit? (The flag is available from v0.35.)

I guess that without a limit, the Go runtime simply uses whatever memory it can get.
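Here is a minimal sketch of that suggestion. It assumes Thanos v0.35.0 or later and a Kubernetes container spec like the one in the issue. The 16Gi limit is a hypothetical placeholder, not a recommendation. As we understand it, --enable-auto-gomemlimit derives the Go runtime's soft memory limit (GOMEMLIMIT) from the container's cgroup memory limit, so the garbage collector runs more aggressively as usage approaches that limit.

```yaml
  containers:
  - args:
    - compact
    - --wait
    # ...the remaining flags from the configuration above...
    - --enable-auto-gomemlimit  # requires Thanos v0.35.0 or later
    resources:
      limits:
        memory: 16Gi  # hypothetical value; size it to your workload
```

GOMEMLIMIT is a soft limit. If the live heap genuinely needs more than the container limit, the pod is still OOM-killed. On 0.34.1, which is built with Go 1.21, setting the GOMEMLIMIT environment variable directly (for example GOMEMLIMIT=14GiB) should have a similar effect, because the Go runtime has honored that variable since Go 1.19.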

douglascamata commented 1 month ago

Reducing those concurrency parameters (--block-files-concurrency, --block-meta-fetch-concurrency, --compact.blocks-fetch-concurrency) could help.
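Here is a sketch of that suggestion, applied to the args list from the issue. The values are assumed starting points, not tuned recommendations. As far as we know, --block-files-concurrency and --compact.blocks-fetch-concurrency both default to 1, so setting them to 20 makes the compactor transfer many more block files in parallel.

```yaml
    # Illustrative values only; replace the corresponding lines in the args list
    # and tune against observed memory usage.
    - --block-files-concurrency=1           # was 20
    - --compact.blocks-fetch-concurrency=1  # was 20
    - --block-meta-fetch-concurrency=32     # unchanged from the issue
```

These flags bound parallel downloads and uploads. If most of the peak comes from merging series during compaction itself, lowering them may help less. A heap profile taken from the compactor's HTTP /debug/pprof endpoint would show where the memory actually goes.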