alextricity25 opened 1 week ago
This seems to be an incompatibility between R2 and S3. Here is a similar discussion on Cloudflare's forums:
https://community.cloudflare.com/t/all-non-trailing-parts-must-have-the-same-length/552190
It seems that when pushing a multipart upload to R2, all parts must have the same length except for the final part. When uploading a block, Tempo currently flushes one row group at a time, and row groups have a variable number of bytes. This is not easily corrected in Tempo.
Unless you find some very clean way to resolve this in Tempo, we would likely not take a PR to "fix" this issue. To use Tempo with R2, we will likely need R2 to be S3 compatible.
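To make the constraint concrete, here is a small sketch (not actual R2 or Tempo code) of the check R2 effectively applies when completing a multipart upload: every part except the last must have the same length, whereas S3 only enforces a 5 MiB minimum on non-trailing parts. The `validateParts` helper and the example sizes are illustrative assumptions.

```go
package main

import "fmt"

// validateParts mimics the check R2 applies at CompleteMultipartUpload:
// every part except the last must have the same length. (S3 instead only
// requires a 5 MiB minimum for non-trailing parts.) Illustrative sketch,
// not actual R2 or Tempo code.
func validateParts(sizes []int64) error {
	for i := 0; i < len(sizes)-1; i++ {
		if sizes[i] != sizes[0] {
			return fmt.Errorf("all non-trailing parts must have the same length: part %d is %d bytes, want %d",
				i+1, sizes[i], sizes[0])
		}
	}
	return nil
}

func main() {
	// Row-group-sized flushes produce variable parts, which R2 rejects...
	fmt.Println(validateParts([]int64{68231216, 68339859, 12345}))
	// ...while fixed-size parts plus a short trailing part pass.
	fmt.Println(validateParts([]int64{67108864, 67108864, 12345}))
}
```

A single-part upload passes trivially, which is consistent with small, one-row-group blocks succeeding later in this thread.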
Thanks for your reply @joe-elliott. Yes, I ran into that post too, and a few others, but I figured there might be a way to configure Tempo to send chunks of the same length. Wouldn't setting `parquet_row_group_size_bytes` effectively do this?
Another interesting thing to note is that some compaction cycles do complete successfully; not all of them fail with the "All non-trailing parts must have the same length" error. For example, here are some more logs from the compactor, which show that blocks are indeed being flushed successfully (does "flushed" in this context mean sent to the backend storage bucket via the S3 API?).
level=info ts=2024-09-20T13:49:05.555318241Z caller=compactor.go:186 msg="beginning compaction" traceID=42f3b0ba265a0113
level=info ts=2024-09-20T13:49:05.555355933Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:ab121e03-8931-49e0-b144-11e908cf6d37 TenantID:single-tenant StartTime:2024-09-16 16:48:10 +0000 UTC EndTime:2024-09-16 17:19:30 +0000 UTC TotalObjects:40030 Size:197946447 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:3 DataEncoding: BloomShardCount:1 FooterSize:50483 DedicatedColumns:[] ReplicationFactor:0}"
level=info ts=2024-09-20T13:49:05.653482732Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:8a64f55c-362e-4e10-bba4-f8832e6b2d36 TenantID:single-tenant StartTime:2024-09-16 16:55:20 +0000 UTC EndTime:2024-09-16 17:26:35 +0000 UTC TotalObjects:40253 Size:197983074 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:3 DataEncoding: BloomShardCount:1 FooterSize:50483 DedicatedColumns:[] ReplicationFactor:0}"
level=info ts=2024-09-20T13:49:05.791970019Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:69f50bc3-df70-4256-a622-5d3297170408 TenantID:single-tenant StartTime:2024-09-16 17:18:04 +0000 UTC EndTime:2024-09-16 17:56:45 +0000 UTC TotalObjects:4927 Size:3881037 CompactionLevel:1 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:18783 DedicatedColumns:[] ReplicationFactor:0}"
level=info ts=2024-09-20T13:49:53.774626347Z caller=compactor.go:250 msg="flushed to block" bytes=68231216 objects=14956 values=99996196
level=info ts=2024-09-20T13:50:37.246234046Z caller=compactor.go:250 msg="flushed to block" bytes=68339859 objects=15104 values=99993405
level=error ts=2024-09-20T13:51:19.25979008Z caller=compactor.go:162 msg="error during compaction cycle" err="error shipping block to backend, blockID 607efda4-c183-40b7-89e5-90c19ca6c463: error completing block: error completing multipart upload, object: single-tenant/607efda4-c183-40b7-89e5-90c19ca6c463/data.parquet, obj etag: : All non-trailing parts must have the same length."
level=info ts=2024-09-20T13:51:19.2598735Z caller=compactor.go:155 msg="Compacting hash" hashString=single-tenant-479584-0
level=info ts=2024-09-20T13:51:19.259919754Z caller=compactor.go:186 msg="beginning compaction" traceID=36d004b60c5e7aa1
level=info ts=2024-09-20T13:51:19.259962307Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:c0c857ff-d006-4f33-ad48-ea7a7bbe1031 TenantID:single-tenant StartTime:2024-09-16 16:25:10 +0000 UTC EndTime:2024-09-16 16:56:25 +0000 UTC TotalObjects:43818 Size:218888946 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:4 DataEncoding: BloomShardCount:1 FooterSize:66804 DedicatedColumns:[] ReplicationFactor:0}"
level=info ts=2024-09-20T13:51:19.395246194Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:90f85458-f556-49b2-b78c-41ace5cc89fc TenantID:single-tenant StartTime:2024-09-16 16:17:51 +0000 UTC EndTime:2024-09-16 16:49:20 +0000 UTC TotalObjects:45199 Size:224820188 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:4 DataEncoding: BloomShardCount:1 FooterSize:67197 DedicatedColumns:[] ReplicationFactor:0}"
level=info ts=2024-09-20T13:51:19.539027714Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:d36b0a24-644a-4698-9588-07db2b0f9188 TenantID:single-tenant StartTime:2024-09-16 15:47:40 +0000 UTC EndTime:2024-09-16 16:26:15 +0000 UTC TotalObjects:10961 Size:11425383 CompactionLevel:1 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:19148 DedicatedColumns:[] ReplicationFactor:0}"
level=info ts=2024-09-20T13:52:06.777694147Z caller=compactor.go:250 msg="flushed to block" bytes=68436112 objects=16185 values=99991189
I wonder if there are some "sweet spot" compactor settings that would give me fewer failed compaction flushes. Maybe setting a low `max_compaction_objects` or `max_block_bytes` would ensure that the multipart upload sends chunks of the same length? I'll play around more with these settings.
`parquet_row_group_size_bytes` is a target value. There's no way I'm aware of to enforce an exact row group size in parquet. Each row is added atomically and can't be split to force the row group to hit an exact byte count.
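The point above can be sketched in a few lines (a hypothetical `flushSizes` helper, not Tempo's actual writer): rows are appended whole, and the group is flushed once the running byte count crosses the target, so each flushed group overshoots the target by up to one row and parts end up with different sizes.

```go
package main

import "fmt"

// flushSizes models a writer that appends whole rows and flushes a row group
// once the running byte count reaches the target. Hypothetical helper for
// illustration only; not Tempo's actual implementation.
func flushSizes(rowSizes []int64, target int64) []int64 {
	var groups []int64
	var cur int64
	for _, r := range rowSizes {
		cur += r // rows are atomic: added whole, never split
		if cur >= target {
			groups = append(groups, cur)
			cur = 0
		}
	}
	if cur > 0 {
		groups = append(groups, cur) // trailing, partially filled group
	}
	return groups
}

func main() {
	// Same target, different row mixes -> different group sizes,
	// and therefore different multipart part sizes.
	fmt.Println(flushSizes([]int64{40, 70, 30, 90}, 100)) // [110 120]
	fmt.Println(flushSizes([]int64{60, 60, 60, 60}, 100)) // [120 120]
}
```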
> I wonder if there are some "sweet spot" settings with the compactor that would get me fewer failed compaction flushes.
Technically you can raise your row group size so that compactors/ingesters always flush parquet files with exactly one row group, but this will only work for the smallest Tempo installs.
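For reference, these are the knobs being discussed in this thread, gathered into one fragment. This is an illustrative sketch only: the key paths are assumed from Tempo's configuration documentation, and the values are examples from this thread, not recommendations.

```yaml
# Illustrative fragment; key paths assumed, values are examples only.
compactor:
  compaction:
    compaction_window: 30m
    max_compaction_objects: 10000   # small blocks often end up with one row group
    max_block_bytes: 5368709120     # 5 GiB

storage:
  trace:
    block:
      # Target (not exact) size of each parquet row group.
      parquet_row_group_size_bytes: 104857600
```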
Got it. Interestingly enough, I set `max_compaction_objects` to 10000, and that seems to set things in the right direction...
> Got it. Interestingly enough, I set `max_compaction_objects` to 10000, and that seems to set things in the right direction...
You are probably restricting the compactors to creating only very small blocks. These blocks each have one row group, so the multipart upload succeeds.
Describe the bug
About a week ago, our Tempo alerts for compaction failures started firing.
This is the query that we use. If it's above 1, the alerts fire.
I checked the compactor logs (we only have one compactor running), and the only suspicious thing I've noticed is messages like this:
On top of that, I also noticed that the R2 bucket we use for traces is ever-increasing in size, which didn't happen before the compaction errors appeared.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect that there would be no compaction errors.
Environment:
Additional Context
I've looked at this runbook, and have tried increasing the memory of the compactor pod significantly and adjusting `compaction_window` to 30min. I've also tried adjusting `max_block_bytes` to 5G, but the problem still persists. Some other issues that I've come across are: https://github.com/grafana/tempo/issues/3529 and https://github.com/grafana/tempo/issues/1774
After looking at those issues, I've since adjusted my bucket policies to delete objects one day after my Tempo block retention period, per @joe-elliott's suggestion here.