thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

compactor: add series: symbol table size exceeds X bytes: X #7503

Open pelzerim opened 2 days ago

pelzerim commented 2 days ago

Thanos, Prometheus and Golang version used:

docker.io/bitnami/thanos:0.35.1-debian-12-r1

Object Storage Provider:

AWS S3

What happened:

Compaction fails with the error: add series: symbol table size exceeds 4294967295 bytes: 6242209614; symbol table size exceeds 4294967295 bytes: 6242209614.

We need to mark the blocks as no-compact manually for the compactor to be able to finish.

What you expected to happen:

Compactor to not crash.

Full logs to relevant components:

Logs

```
ts=2024-07-02T08:30:13.307354531Z caller=compact.go:1203 level=info group="0@{receive=\"true\", replica=\"thanos-receive-0\", tenant_id=\"default-tenant\"}" groupKey=0@13639240377235578737 msg="downloaded and verified blocks; compacting blocks" duration=31m41.884871153s duration_ms=1901884 plan="[/data/compact/0@13639240377235578737/01J1MGHSRPJDY4ECGAWF2CBZ66 /data/compact/0@13639240377235578737/01J1NSR0ENBNPK7VTTCSWF7WT5 /data/compact/0@13639240377235578737/01J1PEAPGVDZ57133V1MYJD6AX /data/compact/0@13639240377235578737/01J1QQK9RRV6SYA1G60F9BPRWW /data/compact/0@13639240377235578737/01J1RC57KK3RFF2TJE35JQAAGB /data/compact/0@13639240377235578737/01J1S0P68KKN2KS9DCQKQ1SERM]"
ts=2024-07-02T08:33:39.923139996Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason="compaction: 2 errors: group 0@7410473478044673189: compact blocks [/data/compact/0@7410473478044673189/01J1N548DVFDZ89QVKQWAJYHA1 /data/compact/0@7410473478044673189/01J1NSQZEAYNPGY27S3TYK05G7 /data/compact/0@7410473478044673189/01J1PEAY8AWR5AV97TGC2JJ0M8 /data/compact/0@7410473478044673189/01J1QQKB7RTGPM7CC0FNKZGFM9 /data/compact/0@7410473478044673189/01J1RC53D9JBGR5C48CJ3NZR35 /data/compact/0@7410473478044673189/01J1S0P9F1B0N7NQY5X1MSQ22Q]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6246367939; symbol table size exceeds 4294967295 bytes: 6246367939; group 0@13639240377235578737: compact blocks [/data/compact/0@13639240377235578737/01J1MGHSRPJDY4ECGAWF2CBZ66 /data/compact/0@13639240377235578737/01J1NSR0ENBNPK7VTTCSWF7WT5 /data/compact/0@13639240377235578737/01J1PEAPGVDZ57133V1MYJD6AX /data/compact/0@13639240377235578737/01J1QQK9RRV6SYA1G60F9BPRWW /data/compact/0@13639240377235578737/01J1RC57KK3RFF2TJE35JQAAGB /data/compact/0@13639240377235578737/01J1S0P68KKN2KS9DCQKQ1SERM]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6242209614; symbol table size exceeds 4294967295 bytes: 6242209614"
ts=2024-07-02T08:33:39.923195406Z caller=http.go:91 level=info service=http/server component=compact msg="internal server is shutting down" err="compaction: 2 errors: group 0@7410473478044673189: compact blocks [/data/compact/0@7410473478044673189/01J1N548DVFDZ89QVKQWAJYHA1 /data/compact/0@7410473478044673189/01J1NSQZEAYNPGY27S3TYK05G7 /data/compact/0@7410473478044673189/01J1PEAY8AWR5AV97TGC2JJ0M8 /data/compact/0@7410473478044673189/01J1QQKB7RTGPM7CC0FNKZGFM9 /data/compact/0@7410473478044673189/01J1RC53D9JBGR5C48CJ3NZR35 /data/compact/0@7410473478044673189/01J1S0P9F1B0N7NQY5X1MSQ22Q]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6246367939; symbol table size exceeds 4294967295 bytes: 6246367939; group 0@13639240377235578737: compact blocks [/data/compact/0@13639240377235578737/01J1MGHSRPJDY4ECGAWF2CBZ66 /data/compact/0@13639240377235578737/01J1NSR0ENBNPK7VTTCSWF7WT5 /data/compact/0@13639240377235578737/01J1PEAPGVDZ57133V1MYJD6AX /data/compact/0@13639240377235578737/01J1QQK9RRV6SYA1G60F9BPRWW /data/compact/0@13639240377235578737/01J1RC57KK3RFF2TJE35JQAAGB /data/compact/0@13639240377235578737/01J1S0P68KKN2KS9DCQKQ1SERM]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6242209614; symbol table size exceeds 4294967295 bytes: 6242209614"
ts=2024-07-02T08:33:39.923373417Z caller=http.go:110 level=info service=http/server component=compact msg="internal server is shutdown gracefully" err="compaction: 2 errors: group 0@7410473478044673189: compact blocks [/data/compact/0@7410473478044673189/01J1N548DVFDZ89QVKQWAJYHA1 /data/compact/0@7410473478044673189/01J1NSQZEAYNPGY27S3TYK05G7 /data/compact/0@7410473478044673189/01J1PEAY8AWR5AV97TGC2JJ0M8 /data/compact/0@7410473478044673189/01J1QQKB7RTGPM7CC0FNKZGFM9 /data/compact/0@7410473478044673189/01J1RC53D9JBGR5C48CJ3NZR35 /data/compact/0@7410473478044673189/01J1S0P9F1B0N7NQY5X1MSQ22Q]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6246367939; symbol table size exceeds 4294967295 bytes: 6246367939; group 0@13639240377235578737: compact blocks [/data/compact/0@13639240377235578737/01J1MGHSRPJDY4ECGAWF2CBZ66 /data/compact/0@13639240377235578737/01J1NSR0ENBNPK7VTTCSWF7WT5 /data/compact/0@13639240377235578737/01J1PEAPGVDZ57133V1MYJD6AX /data/compact/0@13639240377235578737/01J1QQK9RRV6SYA1G60F9BPRWW /data/compact/0@13639240377235578737/01J1RC57KK3RFF2TJE35JQAAGB /data/compact/0@13639240377235578737/01J1S0P68KKN2KS9DCQKQ1SERM]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6242209614; symbol table size exceeds 4294967295 bytes: 6242209614"
ts=2024-07-02T08:33:39.923413385Z caller=intrumentation.go:81 level=info msg="changing probe status" status=not-healthy reason="compaction: 2 errors: group 0@7410473478044673189: compact blocks [/data/compact/0@7410473478044673189/01J1N548DVFDZ89QVKQWAJYHA1 /data/compact/0@7410473478044673189/01J1NSQZEAYNPGY27S3TYK05G7 /data/compact/0@7410473478044673189/01J1PEAY8AWR5AV97TGC2JJ0M8 /data/compact/0@7410473478044673189/01J1QQKB7RTGPM7CC0FNKZGFM9 /data/compact/0@7410473478044673189/01J1RC53D9JBGR5C48CJ3NZR35 /data/compact/0@7410473478044673189/01J1S0P9F1B0N7NQY5X1MSQ22Q]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6246367939; symbol table size exceeds 4294967295 bytes: 6246367939; group 0@13639240377235578737: compact blocks [/data/compact/0@13639240377235578737/01J1MGHSRPJDY4ECGAWF2CBZ66 /data/compact/0@13639240377235578737/01J1NSR0ENBNPK7VTTCSWF7WT5 /data/compact/0@13639240377235578737/01J1PEAPGVDZ57133V1MYJD6AX /data/compact/0@13639240377235578737/01J1QQK9RRV6SYA1G60F9BPRWW /data/compact/0@13639240377235578737/01J1RC57KK3RFF2TJE35JQAAGB /data/compact/0@13639240377235578737/01J1S0P68KKN2KS9DCQKQ1SERM]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6242209614; symbol table size exceeds 4294967295 bytes: 6242209614"
ts=2024-07-02T08:33:39.923530079Z caller=main.go:171 level=error err="2 errors: group 0@7410473478044673189: compact blocks [/data/compact/0@7410473478044673189/01J1N548DVFDZ89QVKQWAJYHA1 /data/compact/0@7410473478044673189/01J1NSQZEAYNPGY27S3TYK05G7 /data/compact/0@7410473478044673189/01J1PEAY8AWR5AV97TGC2JJ0M8 /data/compact/0@7410473478044673189/01J1QQKB7RTGPM7CC0FNKZGFM9 /data/compact/0@7410473478044673189/01J1RC53D9JBGR5C48CJ3NZR35 /data/compact/0@7410473478044673189/01J1S0P9F1B0N7NQY5X1MSQ22Q]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6246367939; symbol table size exceeds 4294967295 bytes: 6246367939; group 0@13639240377235578737: compact blocks [/data/compact/0@13639240377235578737/01J1MGHSRPJDY4ECGAWF2CBZ66 /data/compact/0@13639240377235578737/01J1NSR0ENBNPK7VTTCSWF7WT5 /data/compact/0@13639240377235578737/01J1PEAPGVDZ57133V1MYJD6AX /data/compact/0@13639240377235578737/01J1QQK9RRV6SYA1G60F9BPRWW /data/compact/0@13639240377235578737/01J1RC57KK3RFF2TJE35JQAAGB /data/compact/0@13639240377235578737/01J1S0P68KKN2KS9DCQKQ1SERM]: 2 errors: add series: symbol table size exceeds 4294967295 bytes: 6242209614; symbol table size exceeds 4294967295 bytes: 6242209614\ncompaction\nmain.runCompact.func7\n\t/bitnami/blacksmith-sandox/thanos-0.35.1/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:440\nmain.runCompact.func8\n\t/bitnami/blacksmith-sandox/thanos-0.35.1/src/github.com/thanos-io/thanos/cmd/thanos/compact.go:520\ngithub.com/oklog/run.(*Group).Run.func1\n\t/bitnami/blacksmith-sandox/thanos-0.35.1/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/opt/bitnami/go/src/runtime/asm_arm64.s:1197\ncompact command failed\nmain.main\n\t/bitnami/blacksmith-sandox/thanos-0.35.1/src/github.com/thanos-io/thanos/cmd/thanos/main.go:171\nruntime.main\n\t/opt/bitnami/go/src/runtime/proc.go:267\nruntime.goexit\n\t/opt/bitnami/go/src/runtime/asm_arm64.s:1197"
```
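Since the failing blocks have to be marked no-compact by hand, it can help to pull their ULIDs out of the error message first. The following is a minimal sketch, assuming the `/data/compact/<group key>/<ULID>` path layout seen in the logs above; `failing_blocks`, `BLOCK_PATH` and the shortened `ERR` excerpt are illustrative names and not part of Thanos.

```python
import re

# Excerpt of the error message from the logs (shortened to two blocks of one group).
ERR = (
    "group 0@7410473478044673189: compact blocks "
    "[/data/compact/0@7410473478044673189/01J1N548DVFDZ89QVKQWAJYHA1 "
    "/data/compact/0@7410473478044673189/01J1NSQZEAYNPGY27S3TYK05G7]: "
    "2 errors: add series: symbol table size exceeds 4294967295 bytes: 6246367939"
)

# A ULID is 26 characters of Crockford base32 (digits and upper-case letters
# without I, L, O and U). The directory name before it is the group key.
BLOCK_PATH = re.compile(
    r"/data/compact/(?P<group>\d+@\d+)/(?P<ulid>[0-9A-HJKMNP-TV-Z]{26})"
)


def failing_blocks(message: str) -> dict[str, list[str]]:
    """Map each compaction group key to the block ULIDs named in the message."""
    groups: dict[str, list[str]] = {}
    for match in BLOCK_PATH.finditer(message):
        ulids = groups.setdefault(match.group("group"), [])
        # The same block list is repeated in several log fields; keep each ULID once.
        if match.group("ulid") not in ulids:
            ulids.append(match.group("ulid"))
    return groups


if __name__ == "__main__":
    for group, ulids in failing_blocks(ERR).items():
        print(group, ulids)
```

The resulting ULIDs are the ones that would need a no-compact marker before the compactor can proceed.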

Anything else we need to know:

The Thanos cluster is pretty large, and the receiver components report 9M active series each (12h retention). The receiver component has a replicationFactor of 2, so the compactor was supposed to do de-duplication.

Retention on the receiver was reduced to 12h to deal with cardinality as we have extremely high pod churn.

Is there any way to deal with this? I am considering reducing the block size in the receiver by setting --tsdb.min-block-duration=1h and --tsdb.max-block-duration=1h, but I am afraid of side effects.

harry671003 commented 2 days ago

This is a hard limit in the Prometheus TSDB: the symbol table cannot be larger than about 4GB. See: https://github.com/prometheus/prometheus/blob/main/tsdb/docs/format/index.md#symbol-table. The length field of the symbol table is 4 bytes wide, which caps the table at 2^32 - 1 bytes (the 4294967295 in the error message).
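To put the numbers from the error message in perspective, here is a small sketch of the arithmetic. The limit follows from the 4-byte length field; the two sizes are copied from the logs in this issue; `overshoot` is just an illustrative helper name.

```python
# Largest value a 4-byte unsigned length field can hold.
SYMBOL_TABLE_LIMIT = 2**32 - 1

# Symbol table sizes reported by the compactor, keyed by compaction group.
REPORTED = {
    "0@7410473478044673189": 6246367939,
    "0@13639240377235578737": 6242209614,
}


def overshoot(size_bytes: int, limit: int = SYMBOL_TABLE_LIMIT) -> float:
    """Return how many times larger than the limit the symbol table would be."""
    return size_bytes / limit


if __name__ == "__main__":
    for group, size in REPORTED.items():
        print(f"{group}: {size} bytes, {overshoot(size):.2f}x the limit")
```

Both groups come out at a bit under one and a half times the limit, so the merged symbol table would have to shrink by roughly a third before these compactions could succeed.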

Is it possible to implement some kind of sharding in your cluster? Thanos supports sharding using re-labelling: https://thanos.io/tip/thanos/sharding.md/#relabelling With sharding, compactors can create sharded smaller blocks and you'll likely not run into this hard limit.
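As a rough illustration of how a `hashmod`-style relabel rule would assign blocks to compactor shards, the sketch below mimics the hashing used by Prometheus relabelling (MD5 of the label value, last 8 bytes read as a big-endian integer, then modulo). It assumes, as the sharding documentation describes, that the selector works on block external labels rather than on series labels. The tenant names other than `default-tenant` are hypothetical.

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    """Mimic relabel `hashmod`: MD5 the value, take the last 8 bytes as a
    big-endian unsigned integer, and reduce it modulo `modulus`."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def shard_for(external_labels: dict[str, str], source_label: str, shards: int) -> int:
    """Pick the compactor shard for a block from one of its external labels."""
    return hashmod(external_labels.get(source_label, ""), shards)


if __name__ == "__main__":
    # "default-tenant" appears in the logs; the other tenants are made up.
    blocks = [
        {"receive": "true", "replica": "thanos-receive-0", "tenant_id": "default-tenant"},
        {"receive": "true", "replica": "thanos-receive-0", "tenant_id": "tenant-a"},
        {"receive": "true", "replica": "thanos-receive-0", "tenant_id": "tenant-b"},
    ]
    for labels in blocks:
        print(labels["tenant_id"], "-> shard", shard_for(labels, "tenant_id", 2))
```

Under that assumption, every block with the same `tenant_id` lands on the same shard, so sharding only splits the work if the data is already separated by an external label (for example, one tenant per group of clusters). A label that exists only on the series inside a block would not be visible to such a selector.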

pelzerim commented 1 day ago

> Is it possible to implement some kind of sharding in your cluster? Thanos supports sharding using re-labelling: https://thanos.io/tip/thanos/sharding.md/#relabelling With sharding, compactors can create sharded smaller blocks and you'll likely not run into this hard limit.

@harry671003 Thanks for the prompt response. I have looked into this, but it is unclear to me how I would apply it.

The metrics come from 26 clusters and are all tagged with a globally unique label per cluster. We have 3 large receivers and all clusters write to a single remote write endpoint.

So, for example, is it possible to run 2 compactor instances, each responsible for half the clusters? Would they not drop all blocks, as we have a single remote write for all clusters?

> See: https://github.com/prometheus/prometheus/blob/main/tsdb/docs/format/index.md#symbol-table.

Do I understand this correctly that dropping labels (for example uid) will reduce the size of this table?

harry671003 commented 8 hours ago

> I have checked this out but it is super unclear to me how i would apply this.

I'll admit that I haven't used this myself. Full disclosure: I work on Cortex, another project that shares some code with Thanos. Maybe you can reach out on the Thanos Slack channel and ask for help.

> Do i understand this correctly that dropping labels (for example uid) will reduce the size of this table?

Yes. The symbol table contains all the unique label strings. Dropping high cardinality labels will reduce the size of the symbol table.
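To get a feel for the effect, the sketch below estimates how many bytes the unique label names and values of a set of series would take up. It assumes each symbol is stored once as a varint length prefix followed by its UTF-8 bytes, and it ignores the table's fixed header and checksum. The series, the `uid` format and the helper names are made up for illustration.

```python
def uvarint_len(n: int) -> int:
    """Number of bytes needed to encode n as an unsigned varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def symbol_bytes(series: list[dict[str, str]]) -> int:
    """Approximate bytes taken by the unique label names and values."""
    symbols: set[str] = set()
    for labels in series:
        for name, value in labels.items():
            symbols.add(name)
            symbols.add(value)
    total = 0
    for symbol in symbols:
        encoded = symbol.encode("utf-8")
        total += uvarint_len(len(encoded)) + len(encoded)
    return total


def drop_label(series: list[dict[str, str]], name: str) -> list[dict[str, str]]:
    """Return the series with one label removed, as a drop rule would."""
    return [{k: v for k, v in labels.items() if k != name} for labels in series]


if __name__ == "__main__":
    # Hypothetical churn: every pod gets a fresh, 36-character `uid` value.
    series = [
        {"__name__": "up", "pod": f"api-{i}", "uid": f"00000000-0000-0000-0000-{i:012d}"}
        for i in range(1000)
    ]
    print("with uid:   ", symbol_bytes(series))
    print("without uid:", symbol_bytes(drop_label(series, "uid")))
```

Because every distinct value is a separate symbol, a label whose value changes with each pod adds one new string per pod, while a label with a handful of values adds almost nothing.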