thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

receive: memory spikes during tenant WAL truncation #7255

Open Nashluffy opened 2 months ago

Nashluffy commented 2 months ago

Thanos, Prometheus and Golang version used:

thanos, version 0.34.1 (branch: HEAD, revision: 4cf1559998bf6d8db3f9ca0fde2a00d217d4e23e)
  build user:       root@61db75277a55
  build date:       20240219-17:13:48
  go version:       go1.21.7
  platform:         linux/amd64
  tags:             netgo

Object Storage Provider: GCS

What happened: We have several Prometheus instances remote-writing to a set of 30 receivers. The receivers normally hover around 8GiB of memory, but once every 2 hours memory spikes across all receivers at the same time by roughly 20-25%.

[screenshot: receiver memory usage across all receivers, 2024-04-02 19:48]

And the corresponding WAL truncations across all receivers.

[graph: WAL truncation timings across all receivers]

There are other memory spikes whose root cause I'm not certain of, like the ones at 6:30 and 9:07. But looking at receiver memory usage over the past 2 weeks, there are consistent spikes whenever tenant WAL truncations happen.

What you expected to happen: No memory spikes during WAL truncation, or the ability to stagger when truncation happens.

How to reproduce it (as minimally and precisely as possible): Unsure; I'm running a fairly standard remote-write + receiver setup. I've raised this in the CNCF Slack, and at least one other person has observed the same memory spikes.

Full logs to relevant components:

Anything else we need to know:

fpetkovski commented 2 months ago

This seems to coincide with intervals when head compaction happens. I think this process acquires a write lock and pending samples pile up in memory. @yeya24 do you see something similar in Cortex?
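
To make that mechanism concrete, here is a simplified, self-contained Go sketch of the effect being described (this is not the actual Prometheus head code, and the `head`/`truncate`/`append` names are invented for illustration): while one goroutine holds the exclusive lock for a long-running truncation, concurrent appends block, and the payloads they have already decoded stay pinned in memory until the lock is released.

```go
// Simplified illustration only; not the Prometheus TSDB implementation.
package main

import (
	"fmt"
	"sync"
	"time"
)

type head struct {
	mtx sync.RWMutex
}

// truncate stands in for WAL/head truncation holding the exclusive lock.
func (h *head) truncate(d time.Duration) {
	h.mtx.Lock()
	defer h.mtx.Unlock()
	time.Sleep(d) // long critical section
}

// append stands in for an appender that needs the (read) lock.
func (h *head) append(samples []float64) {
	h.mtx.RLock()
	defer h.mtx.RUnlock()
	_ = samples // the decoded payload was held in memory the whole time we waited
}

func main() {
	h := &head{}
	go h.truncate(2 * time.Second)
	time.Sleep(100 * time.Millisecond) // let the truncation grab the lock first

	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			h.append(make([]float64, 1e6)) // payload stays pinned while blocked
		}()
	}
	wg.Wait()
	fmt.Printf("appends finished after %s (blocked behind the truncation lock)\n", time.Since(start))
}
```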

Nashluffy commented 2 months ago

> This seems to coincide with intervals when head compaction happens. I think this process acquires a write lock and pending samples pile up in memory. @yeya24 do you see something similar in Cortex?

Just confirming that the compactions happen at the same time as the memory spikes:

[graph: head compaction timings coinciding with the memory spikes]

jnyi commented 2 months ago

Did you get context deadline exceeded (500) errors from ingestors during the WAL compaction?

GiedriusS commented 2 months ago

Yeah, this optimization is something that needs to be done on the Prometheus side :/ I think this is the hot path: https://github.com/prometheus/prometheus/blob/main/tsdb/head.go#L1543-L1554

Some improvements that could be made IMHO: https://github.com/prometheus/prometheus/pull/13642 https://github.com/prometheus/prometheus/pull/13632

fpetkovski commented 2 months ago

Cortex and Mimir solve this by adding jitter between compactions for different tenants. We can disable automatic compaction in the TSDB and manage it ourselves.
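
A minimal sketch of that idea, assuming the receiver can switch off automatic head compaction on each tenant's TSDB (for example via Prometheus' `(*tsdb.DB).DisableCompactions`) and drive it from its own loop. The `compact` callback below is a stand-in for whatever the receiver would actually call on the tenant's TSDB; this is not the real Thanos implementation.

```go
// Not the Thanos implementation; a sketch of self-managed, jittered compaction.
package main

import (
	"context"
	"hash/fnv"
	"log"
	"time"
)

// jitterFor derives a stable offset in [0, maxJitter) from the tenant name,
// so the same tenant always compacts at the same point within the window.
func jitterFor(tenant string, maxJitter time.Duration) time.Duration {
	h := fnv.New64a()
	h.Write([]byte(tenant))
	return time.Duration(h.Sum64() % uint64(maxJitter))
}

// runCompactionLoop triggers compact once per interval, shifted by the
// tenant's jitter, until the context is cancelled. It assumes the TSDB's own
// automatic compaction has already been disabled for this tenant.
func runCompactionLoop(ctx context.Context, tenant string, interval, maxJitter time.Duration, compact func(context.Context) error) {
	// Wait out this tenant's jitter before the first compaction.
	select {
	case <-time.After(jitterFor(tenant, maxJitter)):
	case <-ctx.Done():
		return
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := compact(ctx); err != nil {
			log.Printf("tenant %s: compaction failed: %v", tenant, err)
		}
		select {
		case <-ticker.C:
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Two hypothetical tenants whose compactions land at different offsets.
	for _, tenant := range []string{"tenant-a", "tenant-b"} {
		tenant := tenant
		go runCompactionLoop(ctx, tenant, 2*time.Second, time.Second, func(context.Context) error {
			log.Printf("compacting head for %s", tenant) // stand-in for the real compaction call
			return nil
		})
	}
	<-ctx.Done()
}
```

Deriving the jitter from a hash of the tenant name keeps each tenant's compaction slot stable across restarts while still spreading tenants across the window.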

alanprot commented 2 weeks ago

IDK if this is applicable to Thanos as well, but we recently added jitter by AZ so that only one AZ performs head compaction at a time. Since we replicate the data with quorum, the overall latency is not affected: https://github.com/cortexproject/cortex/pull/5928
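
A rough sketch of the zone-staggering arithmetic only (not the code from the Cortex PR linked above, and the function name is made up here): split the compaction interval into per-AZ windows so that only one availability zone is compacting at any given time, while quorum replication lets the other zones keep serving.

```go
// Rough sketch of AZ-staggered compaction scheduling; not the Cortex code.
package main

import (
	"fmt"
	"time"
)

// azCompactionOffset returns how far into each compaction interval the given
// zone should start compacting, assuming zones are indexed 0..numZones-1.
func azCompactionOffset(zoneIndex, numZones int, interval time.Duration) time.Duration {
	return time.Duration(zoneIndex) * interval / time.Duration(numZones)
}

func main() {
	const interval = 2 * time.Hour
	const zones = 3
	for zone := 0; zone < zones; zone++ {
		fmt.Printf("zone-%d compacts at +%s into every %s window\n",
			zone, azCompactionOffset(zone, zones, interval), interval)
	}
}
```

With three zones and a two-hour interval, each zone gets a 40-minute slot, so at any moment at most one of the quorum-replicated copies of a series is paying the compaction cost.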