thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

compactor: series not 16-byte aligned error #7288

Open vincent-olivert-riera opened 2 months ago

vincent-olivert-riera commented 2 months ago

Thanos, Prometheus and Golang version used:
Thanos: 0.32.4
Golang: go1.20.8

Object Storage Provider: Openstack S3 compatible

What happened: Relatively often (once every day or two), I see an error like this which halts Thanos Compactor:

{"caller":"compact.go:491","err":"compaction: group 0@12554084359366753854: compact blocks [/var/thanos/compact/compact/0@12554084359366753854/01HVNYRWFE7HEXWYJVZ0JQGMSK /var/thanos/compact/compact/0@12554084359366753854/01HVPT37AQV855B3V92PEQRQN4 /var/thanos/compact/compact/0@12554084359366753854/01HVQNGM0Z4QQF9JH36KJ2Q5BE /var/thanos/compact/compact/0@12554084359366753854/01HVRH3V38WABWE2AFCGMEZKS6 /var/thanos/compact/compact/0@12554084359366753854/01HVSCKP4D89TME69TH2CCDNW2 /var/thanos/compact/compact/0@12554084359366753854/01HVT811GWBJFRW63B0MTRQ08P]: series not 16-byte aligned at 5596754042","level":"error","msg":"critical error detected; halting","ts":"2024-04-19T06:47:36.06551946Z"}
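For readers unfamiliar with the message: the Prometheus TSDB index format aligns each series entry to 16 bytes, so an entry's byte offset must be a multiple of 16. The sketch below only illustrates that arithmetic for the offset reported in the log; the function names are hypothetical and this is not the Thanos or Prometheus implementation.

```python
ALIGNMENT = 16  # series entries in the TSDB index are aligned to 16 bytes


def misalignment(offset: int, alignment: int = ALIGNMENT) -> int:
    """Return how many bytes past the previous aligned boundary the offset sits."""
    return offset % alignment


def padding_needed(offset: int, alignment: int = ALIGNMENT) -> int:
    """Return how many padding bytes would bring the offset to the next boundary."""
    return (-offset) % alignment


if __name__ == "__main__":
    offset = 5596754042  # the offset from the error message above
    print(misalignment(offset))    # 10, so the offset is not a multiple of 16
    print(padding_needed(offset))  # 6 bytes short of the next aligned boundary
```

An aligned offset would give 0 for both values.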

I did not include the Prometheus version because I do not think it is relevant: those blocks are all level-2 (8-hour) blocks created by Thanos Compactor.

My usual workaround is to mark those blocks as no-compaction and then restart Thanos Compactor. But I think this should not be happening, or at least not this often.
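Since the workaround needs the ID of every block named in the halting log line, a small helper can pull them out. This is a minimal sketch, assuming the log line is JSON with an `err` field shaped like the one above and that block IDs are 26-character ULIDs; the function name `block_ids_from_halt_log` is hypothetical.

```python
import json
import re
from typing import List

# A ULID is 26 characters of Crockford base32 (digits and upper-case
# letters, excluding I, L, O and U).
ULID_RE = re.compile(r"\b[0-9A-HJKMNP-TV-Z]{26}\b")


def block_ids_from_halt_log(line: str) -> List[str]:
    """Extract the unique block ULIDs, in order, from a compactor halt log line."""
    record = json.loads(line)
    err = record.get("err", "")
    ids: List[str] = []
    for ulid in ULID_RE.findall(err):
        if ulid not in ids:  # keep first occurrence only
            ids.append(ulid)
    return ids
```

Each returned ID can then be marked for no-compaction, for example with `thanos tools bucket mark`; check `--help` on your Thanos version for the exact flags.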

yeya24 commented 2 months ago

That looks like a bug. I have never seen this issue myself.

vincent-olivert-riera commented 2 months ago

> That looks like a bug. I have never seen this issue myself.

In the past, it happened to other people as well, but in Prometheus: https://github.com/prometheus/prometheus/issues/7373

It seems that @bwplotka fixed it at that time, but perhaps Thanos still has the same issue?