thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Thanos store crashes if denied viewing bucket/deletion-mark.json #6170

Open rmartinez3 opened 1 year ago

rmartinez3 commented 1 year ago

Thanos store version: thanos:0.28.0

logs level=error ts=2023-02-27T21:16:24.365661978Z caller=main.go:158 err="Access Denied.\nget file: 01FH2F7JDEQ6VM3JK6GCWKFS1M/deletion-mark.json\ngithub.com/thanos-io/thanos/pkg/block/metadata.ReadMarker\n\t/app/pkg/block/metadata/markers.go:99\ngithub.com/thanos-io/thanos/pkg/block.(*IgnoreDeletionMarkFilter).Filter.func1\n\t/app/pkg/block/fetcher.go:834\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nfilter blocks marked for deletion\ngithub.com/thanos-io/thanos/pkg/block.(*IgnoreDeletionMarkFilter).Filter\n\t/app/pkg/block/fetcher.go:879\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetch\n\t/app/pkg/block/fetcher.go:458\ngithub.com/thanos-io/thanos/pkg/block.(*MetaFetcher).Fetch\n\t/app/pkg/block/fetcher.go:497\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).SyncBlocks\n\t/app/pkg/store/bucket.go:462\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/app/pkg/store/bucket.go:531\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:366\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nfilter metas\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetch\n\t/app/pkg/block/fetcher.go:459\ngithub.com/thanos-io/thanos/pkg/block.(*MetaFetcher).Fetch\n\t/app/pkg/block/fetcher.go:497\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).SyncBlocks\n\t/app/pkg/store/bucket.go:462\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/app/pkg/store/bucket.go:531\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:366\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nsync 
block\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/app/pkg/store/bucket.go:532\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:366\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nbucket store initial sync\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:368\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nstore command failed\nmain.main\n\t/app/cmd/thanos/main.go:158\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"

The Thanos tools command was run to do cleanup, marking specific blocks in the bucket for removal. The user it runs as differs from the user that Thanos uses for sidecar/store.

I am using Ceph RGW and Rook to provision the bucket/S3 store. I added a separate user and attached a policy to the bucket that grants that user access, so it can run the tools for bucket mark and bucket retention.

I am guessing the policy I applied made it so that the store user can't read what the other (tool-cleanup) user wrote.

Policy used:

`
Location: rook-ceph-store
Payer: BucketOwner
Expiration Rule: none
Policy: {
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/thanos-cleaner-test"]},
    "Action": ["s3:GetObject","s3:ListBucket","s3:PutObject","s3:DeleteObject","s3:DeleteObjectVersion"],
    "Resource": [
      "arn:aws:s3:::thanos-bkt-UID*",
      "arn:aws:s3:::thanos-bkt-UID"
    ]
  }]
}
CORS: none
ACL: ceph-user-hdeuG8Tw: FULL_CONTROL
`

It may simply be that I need to update the policy, but I was wondering what the general thoughts are on whether store should crash when it is denied reading deletion-mark.json.
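To make the question concrete, here is a minimal Go sketch of the two behaviours being weighed: aborting the whole filter step on the first unreadable marker, versus skipping that block with a warning. This is not Thanos's actual implementation; `ErrAccessDenied`, `markerReader`, and `FilterMarked` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAccessDenied is a hypothetical stand-in for the object store's
// "Access Denied" error.
var ErrAccessDenied = errors.New("access denied")

// markerReader is a hypothetical function type that reports whether a
// block has a deletion mark.
type markerReader func(blockID string) (marked bool, err error)

// FilterMarked returns the blocks that are not marked for deletion.
// In tolerant mode, a block whose marker cannot be read because access
// was denied is kept and also reported in skipped. In strict mode the
// first read error aborts the whole filter.
func FilterMarked(blocks []string, read markerReader, tolerant bool) (kept, skipped []string, err error) {
	for _, id := range blocks {
		marked, rerr := read(id)
		switch {
		case rerr == nil:
			if !marked {
				kept = append(kept, id)
			}
		case tolerant && errors.Is(rerr, ErrAccessDenied):
			// Treat the block as unmarked, but remember it so the
			// caller can log a warning.
			skipped = append(skipped, id)
			kept = append(kept, id)
		default:
			return nil, nil, fmt.Errorf("filter blocks marked for deletion: %w", rerr)
		}
	}
	return kept, skipped, nil
}

func main() {
	read := func(id string) (bool, error) {
		if id == "01FH2F7JDEQ6VM3JK6GCWKFS1M" {
			return false, ErrAccessDenied
		}
		return false, nil
	}
	blocks := []string{"01FH2F7JDEQ6VM3JK6GCWKFS1M", "another-block"}

	kept, skipped, err := FilterMarked(blocks, read, true)
	fmt.Println("tolerant:", kept, skipped, err)

	_, _, err = FilterMarked(blocks, read, false)
	fmt.Println("strict:", err)
}
```

The trade-off in tolerant mode is that a block that really is marked for deletion would keep being served until its marker becomes readable or the block is removed, which may be why failing loudly is the safer default.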

I will experiment with the policies on my end and update to the latest Thanos.

An extra note: I had two thanos-store instances running. One was OOM-killed and then failed to start up again due to the error posted above, while the other was logging a warning but not crashing. (The OOM happened because I need to bump resources on my end, but I thought it was worth noting.)
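One possible reading of the difference between the two instances, consistent with the trace above (where the failure is wrapped as "bucket store initial sync" and then "store command failed"), is that an error during the initial sync is treated as fatal, while an error during a later periodic sync is only logged. The Go sketch below illustrates that pattern under this assumption; `syncFunc`, `RunStore`, and `errDenied` are hypothetical names and this is not the Thanos source.

```go
package main

import (
	"errors"
	"fmt"
)

// errDenied is a hypothetical stand-in for the bucket's "Access Denied" error.
var errDenied = errors.New("Access Denied")

// syncFunc is a hypothetical stand-in for one block sync against the bucket.
type syncFunc func() error

// RunStore performs one initial sync followed by periodicRuns later syncs.
// A failed initial sync is returned as an error (the process would exit).
// Failures in later syncs are only collected as warnings.
func RunStore(sync syncFunc, periodicRuns int) ([]string, error) {
	if err := sync(); err != nil {
		return nil, fmt.Errorf("bucket store initial sync: %w", err)
	}
	var warnings []string
	for i := 0; i < periodicRuns; i++ {
		if err := sync(); err != nil {
			warnings = append(warnings, fmt.Sprintf("syncing blocks failed: %v", err))
		}
	}
	return warnings, nil
}

func main() {
	// Instance that was already running when the unreadable marker appeared:
	// the first sync succeeds, later ones fail.
	calls := 0
	alreadyRunning := func() error {
		calls++
		if calls > 1 {
			return errDenied
		}
		return nil
	}
	warnings, err := RunStore(alreadyRunning, 2)
	fmt.Println("running instance:", warnings, err)

	// Instance restarted (for example after an OOM kill) once the marker exists:
	// the very first sync fails.
	_, err = RunStore(func() error { return errDenied }, 2)
	fmt.Println("restarted instance:", err)
}
```

Under this reading, the surviving instance would also fail to start after its next restart, so the warning it logs is worth treating as the same problem.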

ghost commented 1 year ago

Hello,

We are also seeing a similar "Access Denied" error, but with a different error message:

level=error ts=2023-02-28T11:30:25.378634943Z caller=main.go:161 err="Access Denied\nBaseFetcher: iter bucket\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetchMetadata\n\t/app/pkg/block/fetcher.go:383\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetch.func2\n\t/app/pkg/block/fetcher.go:447\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/go/pkg/mod/github.com/golang/groupcache@v0.0.0-20210331224755-41bb18bfe9da/singleflight/singleflight.go:56\ngithub.com/thanos-io/thanos/pkg/block.(*BaseFetcher).fetch\n\t/app/pkg/block/fetcher.go:445\ngithub.com/thanos-io/thanos/pkg/block.(*MetaFetcher).Fetch\n\t/app/pkg/block/fetcher.go:505\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).SyncBlocks\n\t/app/pkg/store/bucket.go:521\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/app/pkg/store/bucket.go:590\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:384\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172\nsync block\ngithub.com/thanos-io/thanos/pkg/store.(*BucketStore).InitialSync\n\t/app/pkg/store/bucket.go:591\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:384\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172\nbucket store initial sync\nmain.runStore.func3\n\t/app/cmd/thanos/store.go:386\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172\nstore command failed\nmain.main\n\t/app/cmd/thanos/main.go:161\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_arm64.s:1172"

We started seeing this after updating thanos-store from v0.27.0 to 0.30.2!

Any help from the maintainers in resolving this issue would be great.

ghost commented 1 year ago

This has been resolved now, after setting aws_sdk_auth: true in the S3 configuration described here: https://thanos.io/tip/thanos/storage.md/#s3