thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Store, Compactor: meta.json file exists: stat s3 object: 400 Bad Request #5416

ahmadbabaeimoghadam opened this issue 2 years ago (status: open)

ahmadbabaeimoghadam commented 2 years ago

Thanos, Prometheus and Golang version used:

Thanos version:

```
thanos, version 0.26.0 (branch: HEAD, revision: 17c576472d80972bfd3705e1e0a08e6f8da8e04b)
  build user:       root@95d2db822102
  build date:       20220505-12:53:36
  go version:       go1.17.9
  platform:         linux/amd64
```

Prometheus version:

```
prometheus, version 2.35.0 (branch: HEAD, revision: 6656cd29fe6ac92bab91ecec0fe162ef0f187654)
  build user:       root@cf6852b14d68
  build date:       20220421-09:53:42
  go version:       go1.18.1
  platform:         linux/amd64
```

Object Storage Provider: S3

What happened: The Thanos Store and Compactor instances repeatedly log the following error:

```
thanos-store[22925]: level=warn ts=2022-06-09T17:01:51.152159168Z caller=store.go:375 msg="syncing blocks failed" err="incomplete view: 874 errors: meta.json file exists: 01FZKWJH5VMGS3ZJEKKRAPGMBW/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJGPVCMFNH9G7E5PEDDT2/meta.json: stat s3 object: 400 Bad Request;
...
```

My S3 costs have increased drastically since I applied the new configuration, and the cause is the HEAD requests.
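To see how failing syncs can turn into a large number of requests, here is a rough back-of-the-envelope sketch in Go. It assumes, rather than confirms, that each component re-checks every failing meta.json with one HEAD (stat) request on every sync. The 874 comes from the error count in the log; the 3-minute interval and the instance count are placeholders to replace with your own settings (for example the Store's --sync-block-duration or the Compactor's --wait-interval) and deployment size.

```go
package main

import "fmt"

// headRequestsPerDay estimates the HEAD requests one component issues per day.
// Assumption: each failing block costs exactly one stat (HEAD) call per sync.
func headRequestsPerDay(failingBlocks, syncIntervalMinutes int) int {
	syncsPerDay := 24 * 60 / syncIntervalMinutes
	return failingBlocks * syncsPerDay
}

func main() {
	const failing = 874       // error count reported in the log
	const intervalMinutes = 3 // placeholder sync interval
	const instances = 4       // hypothetical number of Stores and Compactors

	perInstance := headRequestsPerDay(failing, intervalMinutes)
	fmt.Printf("per instance: %d HEAD requests/day\n", perInstance)
	fmt.Printf("all instances: %d HEAD requests/day\n", perInstance*instances)
}
```

With these placeholder values the per-instance estimate is 419520 requests per day (874 blocks × 480 syncs), so even a modest number of instances multiplies quickly.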

What you expected to happen: The components should not log this error at all, and S3 costs should stay reasonable.

How to reproduce it (as minimally and precisely as possible): Use one S3 bucket that holds TSDB blocks from several systems, each system with its own external_labels, and run multiple Thanos Store and Compactor instances, one per system, each configured to keep only the blocks that carry that system's external labels. If I am not mistaken, https://github.com/thanos-io/thanos/blob/main/docs/sharding.md says such a configuration should be possible.
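The sharding idea described here can be checked offline with a small Go sketch: given each block's external labels and one keep-selector per compactor, every block should be selected by exactly one compactor. The block type, the owners function, and the label values are hypothetical, and an exact string comparison stands in for a keep rule whose regex is a plain literal.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical stand-in for a TSDB block: its ID plus the
// external labels recorded in its meta.json.
type block struct {
	ID             string
	ExternalLabels map[string]string
}

// owners maps each block ID to the compactors (identified by the environment
// value they keep) that would select it. An exact string comparison stands in
// for a `keep` action whose regex is a plain literal.
func owners(blocks []block, compactorEnvs []string) map[string][]string {
	out := make(map[string][]string, len(blocks))
	for _, b := range blocks {
		out[b.ID] = nil // record the block even if no compactor selects it
		for _, env := range compactorEnvs {
			// A missing label reads as "", as in Prometheus relabeling.
			if b.ExternalLabels["environment"] == env {
				out[b.ID] = append(out[b.ID], env)
			}
		}
	}
	return out
}

func main() {
	blocks := []block{
		{ID: "block-a", ExternalLabels: map[string]string{"environment": "prod"}},
		{ID: "block-b", ExternalLabels: map[string]string{"environment": "staging"}},
		{ID: "block-c", ExternalLabels: map[string]string{"cluster": "x"}},
	}
	result := owners(blocks, []string{"prod", "staging"})

	// Sort IDs so the report is printed in a stable order.
	ids := make([]string, 0, len(result))
	for id := range result {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	for _, id := range ids {
		switch envs := result[id]; len(envs) {
		case 0:
			fmt.Printf("%s: selected by no compactor\n", id)
		case 1:
			fmt.Printf("%s: selected by the %q compactor\n", id, envs[0])
		default:
			fmt.Printf("%s: selected by several compactors %v\n", id, envs)
		}
	}
}
```

A block reported as selected by several compactors would violate the one-compactor-per-stream rule; a block selected by none is simply never compacted.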

Full logs to relevant components:


```
thanos-store[22925]: level=warn ts=2022-06-09T17:01:51.152159168Z caller=store.go:375 msg="syncing blocks failed" err="incomplete view: 874 errors: meta.json file exists: 01FZKWJH5VMGS3ZJEKKRAPGMBW/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJGPVCMFNH9G7E5PEDDT2/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJGPWZ6D8PDXVZPA3NHFB/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJH8Q4Y0D31122GY2NASR/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJHBA8HP4VDJ8YY6FEGT0/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJJ43BWAFAHAT8BG7VTFV/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E6X6X8JBAP3JQZPP8A0Y/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E731TNSDAT69P5Z060EW/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJHE9YCNFP8H4YBJR9E57/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E736B09F4DTJBFT2HRB0/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E77ZDE7222XDZJR0M59D/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7Z4C6XN9KE72JTP2T2K/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7S21XFMGPMYR4AVPYK3/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7GZ6KF5JWX51NYSSEC0/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7HF4Y789DTHV24W40SN/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7DS39BW04EM25PJH4WP/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7A5FDWFKEK88XE5R0H6/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YB17J8R3RQPEED8HMPV/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E8MRZSRFXRGXJY0Y62B9/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E8HASA4B99W41DBNASJC/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E8HE25YZNWGZP6SY3586/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E8J6ZNXEFV5T8ZVP7DGV/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E8BSEWGS48N8DH413HKZ/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7ZB3FZCY078PWEHVADV/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E7J0ZXCA5YQ1M586B72W/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YSV7XTXKZMQDAKW8C7V/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9Y56WQKSRD5R1VXHSB2B/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YRX1TDRCG1G2ZB0Y98P/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YFYNTK17BAXD7C89J2V/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZM3E9C339QW63JKWW8FJ5CJ/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YB1CNZCPA6PFJ36K197/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9Z7HZB7TV10A3T1XCZYP/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YV2KRMG35M0BHY596NC/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9Z78NRF2JBYA2FF6ZPGS/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9Z0TJWMS4MVHQWJX71GD/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YNPCBCS96VPDHZN1032/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9YJ8GWRPZ9H4R9P5YCE1/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9ZJ0S0AYHCH4NXFQD142/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9ZWQ5AJRTT3JYQWV3GT8/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMAA0M20CADS2WVE85AFXA4/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9ZRQ87EJM5QTCZHF9GC0/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMH5NDHEEPSM05H9Q5M0KTK/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9ZP8AJZX4ZF2J8BZV82K/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMH5NK151B829XDZH57JGH9/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMA9ZSBC1W13FV6XAEBJ644/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMH5NK2PH34G0C1SC01S3JW/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMH5NQZ5G5HAPZAX66V1FDQ/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMR1D5N2WHMX31G3R34SZRT/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMH5NTQF9P7D1ZAR6VKZGNA/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMR1DAFK3D4Z4TWVMFZHNC8/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMR1D2E4B0PRPPZ5KQ1F7W0/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMR1DPQF2AQV50FCPS6SG2X/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMR1D9E5TE2VAWHRZXSGERY/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMH5PZXT3WZ2HRYW18QFXV2/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZMR1CV17AEPY8D4GTMFMXBA/meta.json: stat s3 object: 400 Bad Request;
```
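To check which systems the failing blocks belong to, one option is to pull the block IDs out of the error message and then inspect the external labels in their meta.json files. The Go sketch below does only the extraction step, with a regular expression; it assumes block IDs are 26-character ULIDs in Crockford base32 (digits and uppercase letters except I, L, O and U), and failingBlocks is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// ulidMetaRe matches "<ULID>/meta.json", where a ULID is 26 Crockford base32
// characters: digits plus uppercase letters excluding I, L, O and U.
var ulidMetaRe = regexp.MustCompile(`([0-9A-HJKMNP-TV-Z]{26})/meta\.json`)

// failingBlocks returns the block IDs mentioned in a Thanos sync error line.
func failingBlocks(logLine string) []string {
	var ids []string
	for _, m := range ulidMetaRe.FindAllStringSubmatch(logLine, -1) {
		ids = append(ids, m[1]) // m[1] is the captured ULID
	}
	return ids
}

func main() {
	// Shortened excerpt of the log above.
	line := `err="incomplete view: 874 errors: meta.json file exists: 01FZKWJH5VMGS3ZJEKKRAPGMBW/meta.json: stat s3 object: 400 Bad Request; meta.json file exists: 01FZKWJGPVCMFNH9G7E5PEDDT2/meta.json: stat s3 object: 400 Bad Request;`
	for _, id := range failingBlocks(line) {
		fmt.Println(id)
	}
}
```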

Anything else we need to know:

Environment:

moadz commented 2 years ago

Are you writing all of your blocks to the same bucket?

You cannot run multiple Thanos compactors against the same block stream (https://thanos.io/tip/components/compact.md/#warning-only-one-instance-of-compactor-may-run-against-a-single-stream-of-blocks-in-a-single-object-storage)

The error you're getting is because multiple compactors are attempting to compact the same block (fetch/edit the meta.json file), and your object storage provider is locking the blob whilst a single compactor is already looking at it.
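In Thanos' terminology, a stream is, roughly, the set of blocks that share the same external labels. The Go sketch below groups blocks into streams by a canonical form of their external labels, which makes it easier to see whether two compactors' selectors could ever pick blocks from the same stream. The key format, helper names, and label values are hypothetical, and the real compactor additionally splits its compaction groups by downsampling resolution.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// streamKey returns a canonical string for a block's external labels.
// Blocks with the same key belong to the same stream. The format is
// hypothetical and only for illustration.
func streamKey(extLabels map[string]string) string {
	pairs := make([]string, 0, len(extLabels))
	for k, v := range extLabels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs) // map iteration order is random; sort for a stable key
	return "{" + strings.Join(pairs, ",") + "}"
}

// groupByStream maps each stream key to the sorted IDs of the blocks in it.
func groupByStream(blocks map[string]map[string]string) map[string][]string {
	streams := make(map[string][]string)
	for id, labels := range blocks {
		k := streamKey(labels)
		streams[k] = append(streams[k], id)
	}
	for k := range streams {
		sort.Strings(streams[k])
	}
	return streams
}

func main() {
	// Hypothetical block IDs and external labels.
	blocks := map[string]map[string]string{
		"block-1": {"environment": "prod", "region": "eu"},
		"block-2": {"region": "eu", "environment": "prod"},
		"block-3": {"environment": "staging", "region": "eu"},
	}
	streams := groupByStream(blocks)

	keys := make([]string, 0, len(streams))
	for k := range streams {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, streams[k])
	}
}
```

Here block-1 and block-2 land in the same stream because label order does not matter, so only one compactor should ever select them.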

ahmadbabaeimoghadam commented 2 years ago

Thanks, @moadz, for your reply!

We are running multiple Thanos compactors against one S3 bucket, but, I think, against different block streams: each compactor's --selector.relabel-config keeps only the blocks of one system. Shouldn't that work?

moadz commented 2 years ago

What does your --selector.relabel-config look like?

ahmadbabaeimoghadam commented 2 years ago

```yaml
- action: keep
  regex: "environment-name"
  source_labels:
    - environment
```

Here, environment is the external label that identifies each system.
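Assuming Thanos applies Prometheus relabeling semantics to each block's external labels, the keep rule above can be modeled as follows: the source label values are joined with ";" (the default separator), and the regex must match the whole string, because Prometheus anchors relabel regexes at both ends. The keepMatches function is a hypothetical stand-in written for illustration, not the Prometheus relabel API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keepMatches mimics the `keep` relabel action: the values of sourceLabels are
// joined with ";" (Prometheus' default separator) and matched against regex
// anchored at both ends. A missing label contributes an empty string.
func keepMatches(labels map[string]string, sourceLabels []string, regex string) (bool, error) {
	re, err := regexp.Compile("^(?:" + regex + ")$")
	if err != nil {
		return false, err
	}
	vals := make([]string, len(sourceLabels))
	for i, name := range sourceLabels {
		vals[i] = labels[name]
	}
	return re.MatchString(strings.Join(vals, ";")), nil
}

func main() {
	src := []string{"environment"}
	for _, env := range []string{"environment-name", "environment-name-2", ""} {
		ok, err := keepMatches(map[string]string{"environment": env}, src, "environment-name")
		if err != nil {
			panic(err)
		}
		fmt.Printf("environment=%q kept=%v\n", env, ok)
	}
}
```

Under these assumptions, only a block whose environment label is exactly environment-name is kept; environment-name-2 and blocks without the label are dropped, so compactors with distinct literal regexes select disjoint sets of blocks.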

ahmadbabaeimoghadam commented 2 years ago

@moadz, any updates?

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the remind command if you wish to be reminded at some point in the future.