thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
13.01k stars 2.09k forks

Thanos Compactor Fails to Delete Downsampling Data & Thanos Store failed to block #7117

Open swamipeddineni opened 8 months ago

swamipeddineni commented 8 months ago

We observed this issue after updating Thanos from 0.30.2 to 0.33.0.

Compactor issue:

ts=2024-02-02T06:27:40.034146085Z caller=compact.go:1466 level=info msg="start of GC"
ts=2024-02-02T06:27:40.03659691Z caller=compact.go:1486 level=warn msg="failed deleting non-compaction group directories/files, some disk space usage might have leaked. Continuing" err="delete file/dir: read dir: open /data/compact/0@15098824445636559513: permission denied" dir=/data/compact
ts=2024-02-02T06:27:40.03663071Z caller=compact.go:1489 level=info msg="start of compactions"
ts=2024-02-02T06:27:40.037110915Z caller=compact.go:1109 level=info group="0@{region=\"EastUS\", tier=\"stg\"}" groupKey=0@15098824445636559513 msg="compaction available and planned" plan="[01HNG0HFB60N575CXSX85G2ZEK (min time: 1706702400004, max time: 1706709600000) 01HNG0HFTSWD1QT1JPH1C07EDE (min time: 1706702400004, max time: 1706709600000)]"
ts=2024-02-02T06:27:40.037140915Z caller=compact.go:1118 level=info group="0@{region=\"EastUS\", tier=\"stg\"}" groupKey=0@15098824445636559513 msg="finished running pre compaction callback; downloading blocks" plan="[01HNG0HFB60N575CXSX85G2ZEK (min time: 1706702400004, max time: 1706709600000) 01HNG0HFTSWD1QT1JPH1C07EDE (min time: 1706702400004, max time: 1706709600000)]" duration=4.4µs duration_ms=0
ts=2024-02-02T06:27:40.037275216Z caller=compact.go:502 level=error msg="retriable error" err="compaction: group 0@15098824445636559513: download block 01HNG0HFB60N575CXSX85G2ZEK: create dir: mkdir /data/compact/0@15098824445636559513/01HNG0HFB60N575CXSX85G2ZEK: permission denied"

Store issue:

index header reader: write index header: new index reader: get TOC from object storage of 01HEW2MNP8MX87XBPWWZYQPBPD/index: open temp file: open /data/01HEW2MNP8MX87XBPWWZYQPBPD/a2a4f6c4-50ad-48ff-9856-7fdd92489827.part-0: permission denied"
ts=2024-02-02T09:57:18.918848599Z caller=bucket.go:742 level=warn msg="failed to remove block we cannot load" err="unlinkat /data/01H9BHW6QC4JX80CSBT3RR8EHW/index-header: permission denied"
ts=2024-02-02T09:57:18.918866499Z caller=bucket.go:745 level=warn msg="loading block failed" elapsed=4.910635ms id=01H9BHW6QC4JX80CSBT3RR8EHW err="create index header reader: write index header: new index reader: get TOC from object storage of 01H9BHW6QC4JX80CSBT3RR8EHW/index: open temp file: open /data/01H9BHW6QC4JX80CSBT3RR8EHW/64291988-90ef-47a6-bfba-942790b379fd.part-0: permission denied"
ts=2024-02-02T09:57:18.919117506Z caller=bucket.go:742 level=warn msg="failed to remove block we cannot load" err="unlinkat /data/01HNJ0WGDR3TH397Q5VRTE1XKM/index-header: permission denied"

Please help in solving this issue.

vroldanbet commented 4 months ago

We hit this after an upgrade to 0.35.0. It turns out that in version 0.32.0 the Thanos Docker image started building with USER 1001, which meant the process no longer had permission to access the local block cache files created by previous versions that ran as root. I wiped the persistent volumes and let Thanos rebuild the cache, which worked. Also, make sure your securityContext is adjusted accordingly.
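If it helps anyone confirm the same root cause before wiping volumes, here is a rough diagnostic sketch (not part of Thanos). It walks a data directory and lists entries whose owner UID differs from the one the container runs as. The function name `find_foreign_owned` is made up for illustration; the UID 1001 comes from the comment above and the `/data` path from the logs in this issue, so adjust both for your deployment.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return paths under root (including root) not owned by expected_uid.

    Hypothetical helper for illustration only; it reads ownership
    metadata and does not change anything on disk.
    """
    mismatched = []
    # Check the top-level directory itself first.
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    # os.walk silently skips directories it cannot list, but each such
    # directory still appears in its parent's dirnames, so it is reported.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself rather than a symlink target.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return mismatched
```

Running something like `find_foreign_owned("/data", 1001)` inside the compactor or store container (or against the mounted volume) should list the leftover root-owned directories, if the situation matches what is described above. An empty result would suggest the permission errors have a different cause.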