grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Tempo Compactor Restarts #4111

Open snyaik opened 1 month ago

snyaik commented 1 month ago

Hi Team,

We have been encountering Tempo compaction alerts daily. After going through the logs, we found the following log message:
msg="error during compaction cycle" err="error iterating input blocks: error iterating through block XXXXX error in range read from s3 backend: unexpected EOF"

There isn't much info available on how to resolve this intermittent issue.

Our settings: compaction window is 1h, max compaction objects is 6000000, compaction cycle is 30s, and block retention is 72h.

javiermolinar commented 1 month ago

Hi, what version of Tempo are you using? How often do these errors occur? Does the error happen for the same blocks more than once? As you mentioned, this is an intermittent error; compactors should retry the compaction on the next run.
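To illustrate what "retry on the next run" means, here is a toy model. It is not Tempo's compactor code: compactor, compactOnce, and run are hypothetical, and the model assumes that a failed cycle leaves its input blocks in place, so the next cycle selects them again. Cycles run back to back here; in a real deployment they are spaced by the compaction cycle interval (30s in the reported setup).

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// compactor is a hypothetical model: a list of pending input blocks and a
// count of simulated read failures still to come.
type compactor struct {
	pending      []string
	readFailures int
}

// compactOnce tries to merge all pending blocks. On a read error the inputs
// are left untouched, so they are still pending for the next cycle.
func (c *compactor) compactOnce() error {
	if len(c.pending) < 2 {
		return nil // nothing to compact
	}
	if c.readFailures > 0 {
		c.readFailures--
		return fmt.Errorf("error iterating input blocks: %w", io.ErrUnexpectedEOF)
	}
	c.pending = []string{"merged(" + strings.Join(c.pending, ",") + ")"}
	return nil
}

// run executes a fixed number of cycles and counts the ones that failed
// with a short read; such a failure is logged and retried next cycle.
func (c *compactor) run(cycles int) (transientFailures int) {
	for i := 0; i < cycles; i++ {
		err := c.compactOnce()
		if errors.Is(err, io.ErrUnexpectedEOF) {
			transientFailures++
			fmt.Printf("cycle %d: error during compaction cycle: %v (will retry)\n", i, err)
			continue
		}
		if err != nil {
			fmt.Printf("cycle %d: unexpected error: %v\n", i, err)
		}
	}
	return transientFailures
}

func main() {
	c := &compactor{pending: []string{"b1", "b2"}, readFailures: 1}
	failures := c.run(3)
	fmt.Println(failures, c.pending) // 1 [merged(b1,b2)]
}
```

The point of the model is that a single failed cycle costs one interval of delay rather than data: the same inputs are still there for the next attempt. Whether an alert fires on one such failure depends on the alert's threshold, not on the compactor.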

snyaik commented 1 month ago

@javiermolinar We're on Tempo version 2.4.1. The alert fires roughly once every 3 days. I know the alert says the compaction should succeed on the next retry, but I want to be sure this does not become a permanent, annoying issue as more applications are onboarded. I have already updated max_bytes_per_trace from 30MB to 35MB. Is there anything that can be done to resolve the alert permanently?

javiermolinar commented 1 month ago

There's been some work recently in that area:

https://github.com/grafana/tempo/issues/3750

although upgrading across two Tempo versions can be tricky.

You can always try tuning some compactor configs to reduce the number of compactions:

https://github.com/grafana/tempo/blob/main/operations/tempo-mixin/runbook.md#tempocompactionsfailing