thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

Crash in 0.31.0 in compactor: panic: cannot call histogramIterator. #6983

Open JWThorne opened 10 months ago

JWThorne commented 10 months ago

Thanos, Prometheus and Golang version used:

thanos --version
thanos, version 0.32.0-dev (branch: main, revision: 6d838e7142d97cce81bcaada932992a4c40a5d80)
  build user: root@buildkitsandbox
  build date: 20230421-19:15:42
  go version: go1.20.3
  platform: linux/amd64
  tags: netgo

Prometheus:

prometheus, version 2.41.0 (branch: HEAD, revision: c0d8a56c69014279464c0e15d8bfb0e153af0dab)
  build user: root@d20a03e77067
  build date: 20221220-10:40:45
  go version: go1.19.4
  platform: linux/amd64

Object Storage Provider: Filesystem

What happened: Crash in compactor at the end of a downsample run.

What you expected to happen: No crash

How to reproduce it (as minimally and precisely as possible): run a downsample

Full logs to relevant components:

level=info ts=2023-12-15T19:37:19.907809472Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=11.963259ms duration_ms=11 cached=8 returned=8 partial=0
level=info ts=2023-12-15T19:37:20.186688717Z caller=streamed_block_writer.go:178 msg="finalized downsampled block" mint=1702425600000 maxt=1702598400000 ulid=01HHQFH4BT2G108491WMY9FV0J resolution=300000
panic: cannot call histogramIterator.At

goroutine 799 [running]:
github.com/prometheus/prometheus/tsdb/chunkenc.(*histogramIterator).At(0xc000ccdb80?)
    /go/pkg/mod/github.com/prometheus/prometheus@v0.43.1-0.20230414053501-7309ac272195/tsdb/chunkenc/histogram.go:662 +0x27
github.com/thanos-io/thanos/pkg/compact/downsample.expandChunkIterator({0x2b7d690, 0xc000ccdb80}, 0xc0048ef7f8)
    /go/src/github.com/thanos-io/thanos/pkg/compact/downsample/downsample.go:473 +0x62
github.com/thanos-io/thanos/pkg/compact/downsample.Downsample({0x2b562a0, 0xc0008d2190}, 0xc0002450e0, {0x2b72da0, 0xc000ab5b80}, {0xc00050a880, 0x1e}, 0x493e0)
    /go/src/github.com/thanos-io/thanos/pkg/compact/downsample/downsample.go:153 +0x11b8
main.processDownsampling({0x2b6f060, 0xc000aebd60}, {0x2b562a0, 0xc0008d2190}, {0x2b83968, 0xc000a48c20}, 0xc0002450e0, {0xc00050a880, 0x1e}, 0x493e0, ...)
    /go/src/github.com/thanos-io/thanos/cmd/thanos/downsample.go:377 +0x707
main.downsampleBucket.func3()
    /go/src/github.com/thanos-io/thanos/cmd/thanos/downsample.go:258 +0x207
created by main.downsampleBucket
    /go/src/github.com/thanos-io/thanos/cmd/thanos/downsample.go:249 +0xa5c

Anything else we need to know:

yeya24 commented 10 months ago

Downsampling native histograms is not supported, if I understand correctly. @rabenhorst @fpetkovski
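
For context, the trace above ends in histogramIterator.At, the float accessor, being called from expandChunkIterator on a histogram chunk iterator. The following is a minimal, self-contained sketch of that failure mode and of a guarded loop that returns an error instead of panicking. It is not Thanos or Prometheus code: valueType, sample, iterator, and expandFloats are hypothetical stand-ins that only imitate the idea of an iterator reporting the type of the sample it is positioned on.

```go
package main

import "fmt"

// valueType is a hypothetical stand-in for the idea of a per-sample type tag.
type valueType int

const (
	valNone      valueType = iota // iterator exhausted
	valFloat                      // plain float sample
	valHistogram                  // native histogram sample
)

// sample is a hypothetical stand-in for one entry of a chunk.
type sample struct {
	kind valueType
	t    int64
	v    float64
}

// iterator is a hypothetical, simplified chunk iterator.
type iterator struct {
	samples []sample
	pos     int
}

func newIterator(s []sample) *iterator { return &iterator{samples: s, pos: -1} }

// Next advances the iterator and reports the type of the current sample.
func (it *iterator) Next() valueType {
	it.pos++
	if it.pos >= len(it.samples) {
		return valNone
	}
	return it.samples[it.pos].kind
}

// At returns a float sample and panics on anything else, imitating the
// accessor that panics in the trace.
func (it *iterator) At() (int64, float64) {
	s := it.samples[it.pos]
	if s.kind != valFloat {
		panic("cannot call At on a non-float sample")
	}
	return s.t, s.v
}

// expandFloats collects float samples. It checks the sample type before
// calling At, so an unsupported type becomes an error rather than a panic.
func expandFloats(it *iterator) ([]sample, error) {
	var out []sample
	for {
		switch it.Next() {
		case valNone:
			return out, nil
		case valFloat:
			t, v := it.At()
			out = append(out, sample{kind: valFloat, t: t, v: v})
		default:
			return out, fmt.Errorf("unsupported sample type at index %d", it.pos)
		}
	}
}

func main() {
	floats := newIterator([]sample{{valFloat, 1, 1.5}, {valFloat, 2, 2.5}})
	got, err := expandFloats(floats)
	fmt.Println(len(got), err)

	mixed := newIterator([]sample{{valFloat, 1, 1.5}, {valHistogram, 2, 0}})
	got, err = expandFloats(mixed)
	fmt.Println(len(got), err)
}
```

The sketch only shows why an unchecked At call is fatal when histogram samples are present. Whether the compactor should skip, reject, or properly downsample such blocks is the open question in this thread.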

JWThorne commented 10 months ago

How do I disable it, then?

yeya24 commented 10 months ago

You have to upgrade your Thanos Compactor to the latest main to include https://github.com/thanos-io/thanos/pull/6893 and upload a no-downsample marker.

Or maybe it is easier to disable downsampling for now and wait for the release.

JWThorne commented 10 months ago

Oh. I'll have to switch to a MinIO bucket. We can't retain enough data without downsampling. Thanks for the update.

rabenhorst commented 10 months ago

> Downsampling native histograms is not supported, if I understand correctly. @rabenhorst @fpetkovski

That's right. There is a PR, #6350, but it's not ready yet, and we still have to see whether that is the right approach.