Hi! Is it possible that the GCS retention is lower than or equal to the Tempo retention? That could explain it.
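For context, Tempo's own retention is governed by the compactor's `block_retention` setting, so a bucket lifecycle rule that deletes objects sooner than that would remove blocks the compactor still expects to find. A minimal sketch with an illustrative value:

```yaml
compactor:
  compaction:
    # Blocks older than this are deleted by Tempo itself; a GCS lifecycle
    # rule shorter than this value would race with the compactor.
    # 336h (14 days) is only an illustrative value, not taken from this issue.
    block_retention: 336h
```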
Thanks for the suggestion. We have no lifecycle rules on the bucket at the moment (we removed them to check whether they were the issue).
TBH, I'm a bit lost. I'm now wondering about the address the compactors are advertising in the ring (`instance_addr: "0.0.0.0"`); that might be messing something up.
I have also seen occurrences where 2 nodes were trying to compact the same block, so that checks out.
This heavily points in the ring's direction.
We set `instance_addr: "0.0.0.0"` specifically because we'd get `error initializing module: compactor: failed to create compactor: no useable address found for interfaces [eth0 en0]` errors on startup when we didn't set anything.
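For reference, that startup error comes from the ring's interface-based address auto-detection (the default interface list is `[eth0 en0]`). An alternative to hard-coding an address is to point the ring at an interface that actually exists on the host; a minimal sketch, where the interface name is only an assumption about this environment:

```yaml
compactor:
  ring:
    # Let Tempo derive the advertised address from a real host interface
    # instead of hard-coding one. "ens4" is an illustrative interface name.
    instance_interface_names: ["ens4"]
```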
Setting the `instance_addr` to the machine's IP instead of `0.0.0.0` seems to have resolved the issue. Thanks @mapno for pointing me in the right direction.
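For anyone hitting the same thing, a minimal sketch of that fix; how the per-node IP gets injected is an assumption here (e.g. an environment variable from the scheduler, expanded with Tempo's `-config.expand-env=true` flag):

```yaml
compactor:
  ring:
    # Advertise the node's routable address instead of 0.0.0.0.
    # NODE_IP is an assumed environment variable holding the machine's IP.
    instance_addr: ${NODE_IP}
```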
Describe the bug
We have set up a small Tempo cluster, and it's running mostly fine. However, we are getting compactor errors almost every compaction cycle:
The error message (`msg` field) is sometimes slightly different, but the inner error (`err` field) is always the above. Based on what we can find about these errors (https://github.com/grafana/tempo/issues/2560, https://github.com/grafana/tempo/issues/1270, https://community.grafana.com/t/cannot-find-traceids-in-s3-blocks/45423/10), it would appear that the compactor ring is not (correctly) formed. I have also seen occurrences where 2 nodes were trying to compact the same block, so that checks out. However, the ring status page looks fine (all three nodes show as active). The ingester ring has formed and we haven't seen any problems with that.
To Reproduce
I haven't been able to reproduce this in a different environment yet.
We're using Tempo version 2.5.0, with Consul as the kv store and GCS as block storage. We're running 3 nodes in scalable-single-binary mode on a Nomad cluster.
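In that mode each node runs every component in a single process; a minimal sketch of the relevant top-level setting (only the target value is taken from this issue):

```yaml
# Each of the three nodes runs all components in one scalable process.
target: scalable-single-binary
```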
Expected behavior
No compaction errors
Environment:
Additional Context
Our configuration, insofar as it seems relevant to the issue:
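A minimal sketch of the sections in question (illustrative values only, not the literal configuration):

```yaml
compactor:
  ring:
    kvstore:
      store: consul
      consul:
        host: consul.service.consul:8500   # illustrative Consul address
    instance_addr: "0.0.0.0"               # the setting that turned out to be the problem

storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: tempo-blocks            # illustrative bucket name
```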
The compaction summary looks fine to me: