thanos-io / thanos


Thanos receive fails "no space left on device" #7391

Open · kbajy opened this issue 3 months ago

kbajy commented 3 months ago

Thanos, Prometheus and Golang version used: v0.35.0 and Prometheus v2.48.0

Object Storage Provider: Azure Blob

What happened: The receive pod ran for a couple of days without errors, then it started to crash loop (CrashLoopBackOff). The receive is running on one cluster, while the compactor is running on a different cluster.

All the Thanos store components are using the same storage config (Azure Blob Storage).
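For reference, a minimal sketch of what the shared objstore config file looks like for Azure Blob Storage (field names follow the Thanos Azure client documentation; the account, key, and container values below are placeholders, not the actual values used in this setup):

type: AZURE
config:
  storage_account: "<storage-account-name>"      # placeholder
  storage_account_key: "<storage-account-key>"   # placeholder
  container: "<container-name>"                  # placeholder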

What you expected to happen: The receive in cluster #2 keeps running the same way as the receives in clusters #1 and #3 do.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Logs

16062400000 ulid=01HY6R27XEJASRDRJZPCQFH4MM
ts=2024-05-26T03:08:38.106234246Z caller=repair.go:56 level=info component=receive component=multi-tsdb tenant=default-tenant msg="Found healthy block" mint=1716062400012 maxt=1716069600000 ulid=01HY6YXZHGXYNHB0V23SR7HTFR
ts=2024-05-26T03:08:38.106255535Z caller=repair.go:56 level=info component=receive component=multi-tsdb tenant=default-tenant msg="Found healthy block" mint=1716069600029 maxt=1716076800000 ulid=01HY75SPSN64JGPNEVMPH9JY5H
ts=2024-05-26T03:08:38.10627417Z caller=repair.go:56 level=info component=receive component=multi-tsdb tenant=default-tenant msg="Found healthy block" mint=1716076800032 maxt=1716084000000 ulid=01HY7CNDQXW6RZ9RA39029ZX1G
ts=2024-05-26T03:08:38.106915603Z caller=receive.go:601 level=info component=receive msg="shutting down storage"
ts=2024-05-26T03:08:38.106926284Z caller=receive.go:605 level=info component=receive msg="storage is flushed successfully"
ts=2024-05-26T03:08:38.1069309Z caller=receive.go:611 level=info component=receive msg="storage is closed"
ts=2024-05-26T03:08:38.106943423Z caller=http.go:91 level=info component=receive service=http/server component=receive msg="internal server is shutting down" err="opening storage: open /var/thanos/receive/default-tenant/wal/00001125: no space left on device"
ts=2024-05-26T03:08:38.106963196Z caller=receive.go:693 level=info component=receive component=uploader msg="uploading the final cut block before exiting"
ts=2024-05-26T03:08:38.106983989Z caller=receive.go:702 level=info component=receive component=uploader msg="the final cut block was uploaded" uploaded=0
ts=2024-05-26T03:08:38.107007441Z caller=http.go:110 level=info component=receive service=http/server component=receive msg="internal server is shutdown gracefully" err="opening storage: open /var/thanos/receive/default-tenant/wal/00001125: no space left on device"
ts=2024-05-26T03:08:38.107022125Z caller=intrumentation.go:81 level=info component=receive msg="changing probe status" status=not-healthy reason="opening storage: open /var/thanos/receive/default-tenant/wal/00001125: no space left on device"
ts=2024-05-26T03:08:38.107064152Z caller=grpc.go:138 level=info component=receive service=gRPC/server component=receive msg="internal server is shutting down" err="opening storage: open /var/thanos/receive/default-tenant/wal/00001125: no space left on device"
ts=2024-05-26T03:08:38.10708308Z caller=grpc.go:151 level=info component=receive service=gRPC/server component=receive msg="gracefully stopping internal server"
ts=2024-05-26T03:08:38.107113074Z caller=grpc.go:164 level=info component=receive service=gRPC/server component=receive msg="internal server is shutdown gracefully" err="opening storage: open /var/thanos/receive/default-tenant/wal/00001125: no space left on device"
ts=2024-05-26T03:08:38.107129198Z caller=intrumentation.go:81 level=info component=receive msg="changing probe status" status=not-healthy reason="opening storage: open /var/thanos/receive/default-tenant/wal/00001125: no space left on device"
ts=2024-05-26T03:08:38.107211886Z caller=main.go:171 level=error err="open /var/thanos/receive/default-tenant/wal/00001125: no space left on device\nopening storage\nmain.startTSDBAndUpload.func1\n\t/bitnami/blacksmith-sandox/thanos-0.35.0/src/github.com/thanos-io/thanos/cmd/thanos/receive.go:643\ngithub.com/oklog/run.(*Group).Run.func1\n\t/bitnami/blacksmith-sandox/thanos-0.35.0/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/opt/bitnami/go/src/runtime/asm_amd64.s:1650\nreceive command failed\nmain.main\n\t/bitnami/blacksmith-sandox/thanos-0.35.0/src/github.com/thanos-io/thanos/cmd/thanos/main.go:171\nruntime.main\n\t/opt/bitnami/go/src/runtime/proc.go:267\nruntime.goexit\n\t/opt/bitnami/go/src/runtime/asm_amd64.s:1650"

Anything else we need to know:

atayfour commented 3 weeks ago

We are facing the same issue with Thanos receivers. It's not yet clear what the cause is.

ts=2024-08-21T17:00:04.446859956Z caller=db.go:1014 level=error component=receive component=multi-tsdb tenant=XXXX msg="compaction failed" err="preallocate: no space left on device"

- receive
    - --log.level=warn
    - --log.format=logfmt
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902
    - --remote-write.address=0.0.0.0:19291
    - --receive.replication-factor=2
    - --tsdb.retention=1d
    - --label=receive="true"
    - --objstore.config-file=/config/thanos-store.yml
    - --tsdb.path=/var/thanos/receive
    - --receive.default-tenant-id=default
    - --label=receive_replica="$(NAME)"
    - --receive.local-endpoint=$(NAME).thanos-receive.$(NAMESPACE).svc.cluster.local:10901
    - --receive.hashrings-file=/var/lib/thanos-receive/hashrings.json

We run 3 replicas and mount a 10GB volume. I checked the PVC, and only about 50% of it is used.
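
For context, the receivers' disk is provisioned roughly like this (a sketch of a StatefulSet volumeClaimTemplate, assuming the receivers run as a StatefulSet with one claim per replica; the claim name is illustrative):

volumeClaimTemplates:
  - metadata:
      name: data                      # illustrative name; backs --tsdb.path=/var/thanos/receive
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi               # the 10GB volume mentioned above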

Will decreasing the retention to 12h fix the issue?
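
Spelled out against the receive args above, the change being asked about is just this flag (a sketch; 12h is the proposed value):

    - --tsdb.retention=12h            # proposed; currently --tsdb.retention=1d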