grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0

Tempo 2.6 - dedicated column issues #4087

Open edgarkz opened 6 days ago

edgarkz commented 6 days ago

Describe the bug

Since setting up Tempo 2.6 and migrating to vParquet4, the output related to dedicated columns is unclear, and I suspect dedicated columns are not working properly: some blocks report dedicated columns, but others do not.

Below is my storage related configuration:


storage:
  trace:
    backend: s3
    block:
      parquet_dedicated_columns:
      - name: service.instance.id
        scope: resource
        type: string
      - name: service.version
        scope: resource
        type: string
      - name: telemetry.sdk.language
        scope: resource
        type: string
      - name: telemetry.sdk.version
        scope: resource
        type: string
      - name: telemetry.sdk.name
        scope: resource
        type: string
      - name: k8s.pod.ip
        scope: resource
        type: string
      - name: service.name
        scope: resource
        type: string
      - name: messaging.client_id
        scope: span
        type: string
      - name: xxxxxx.flow_id
        scope: span
        type: string
      - name: messaging.destination.name
        scope: span
        type: string
      - name: thread.name
        scope: span
        type: string
      - name: correlation_id
        scope: span
        type: string
      - name: xxxxxx.message.type
        scope: span
        type: string
      - name: xxxxxx.event.id
        scope: span
        type: string
      - name: xxxxxx.event.name
        scope: span
        type: string
      version: vParquet4
    blocklist_poll: 5m
    local:
      path: /var/tempo/traces
    pool:
      max_workers: 400
      queue_depth: 20000
    s3:
      bucket: xxxxxxxxxx
      endpoint: s3.us-east-1.amazonaws.com
    wal:
      path: /var/tempo/wal

The Tempo CLI no longer shows whether the most-used attributes are in dedicated columns (this was definitely shown in Tempo 2.5).

Compactor logs:

`

level=info ts=2024-09-15T13:34:34.320309485Z caller=compactor.go:155 msg="Compacting hash" hashString=single-tenant-0-1918230-0
level=info ts=2024-09-15T13:34:34.32035673Z caller=compactor.go:186 msg="beginning compaction" traceID=199f4a7fc3e272f8
level=info ts=2024-09-15T13:34:34.320431619Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:85a4d9de-dd73-4cda-88a6-1e5c5e91dd53 TenantID:single-tenant StartTime:2024-09-15 13:18:13 +0000 UTC EndTime:2024-09-15 13:33:23 +0000 UTC TotalObjects:13103 Size:3204647 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:23331 DedicatedColumns:[{Scope:resource Name:service.instance.id Type:string} {Scope:resource Name:service.version Type:string} {Scope:resource Name:telemetry.sdk.language Type:string} {Scope:resource Name:telemetry.sdk.version Type:string} {Scope:resource Name:telemetry.sdk.name Type:string} {Scope:resource Name:k8s.pod.ip Type:string} {Scope:resource Name:service.name Type:string} {Scope:span Name:messaging.client_id Type:string} {Scope:span Name:xxxxxx.flow_id Type:string} {Scope:span Name:messaging.destination.name Type:string} {Scope:span Name:thread.name Type:string} {Scope:span Name:correlation_id Type:string} {Scope:span Name:xxxxxx.message.type Type:string} {Scope:span Name:xxxxxx.event.id Type:string} {Scope:span Name:xxxxxx.event.name Type:string}] ReplicationFactor:0}"
level=info ts=2024-09-15T13:34:34.373248301Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:f53aaec7-f229-4193-b4ee-7fe107aaddc0 TenantID:single-tenant StartTime:2024-09-15 13:16:02 +0000 UTC EndTime:2024-09-15 13:31:18 +0000 UTC TotalObjects:13456 Size:3321374 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:23344 DedicatedColumns:[{Scope:resource Name:service.instance.id Type:string} {Scope:resource Name:service.version Type:string} {Scope:resource Name:telemetry.sdk.language Type:string} {Scope:resource Name:telemetry.sdk.version Type:string} {Scope:resource Name:telemetry.sdk.name Type:string} {Scope:resource Name:k8s.pod.ip Type:string} {Scope:resource Name:service.name Type:string} {Scope:span Name:messaging.client_id Type:string} {Scope:span Name:xxxxxx.flow_id Type:string} {Scope:span Name:messaging.destination.name Type:string} {Scope:span Name:thread.name Type:string} {Scope:span Name:correlation_id Type:string} {Scope:span Name:xxxxxx.message.type Type:string} {Scope:span Name:xxxxxx.event.id Type:string} {Scope:span Name:xxxxxx.event.name Type:string}] ReplicationFactor:0}"
level=info ts=2024-09-15T13:34:39.60206792Z caller=compactor.go:264 msg="wrote compacted block" meta="&{Version:vParquet4 BlockID:d2106508-6ec3-46db-8674-7d47bba7fdb1 TenantID:single-tenant StartTime:2024-09-15 13:16:02 +0000 UTC EndTime:2024-09-15 13:33:23 +0000 UTC TotalObjects:15311 Size:3743444 CompactionLevel:1 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:23362 DedicatedColumns:[{Scope:resource Name:service.instance.id Type:string} {Scope:resource Name:service.version Type:string} {Scope:resource Name:telemetry.sdk.language Type:string} {Scope:resource Name:telemetry.sdk.version Type:string} {Scope:resource Name:telemetry.sdk.name Type:string} {Scope:resource Name:k8s.pod.ip Type:string} {Scope:resource Name:service.name Type:string} {Scope:span Name:messaging.client_id Type:string} {Scope:span Name:xxxxxx.flow_id Type:string} {Scope:span Name:messaging.destination.name Type:string} {Scope:span Name:thread.name Type:string} {Scope:span Name:correlation_id Type:string} {Scope:span Name:xxxxxx.message.type Type:string} {Scope:span Name:xxxxxx.event.id Type:string} {Scope:span Name:xxxxxx.event.name Type:string}] ReplicationFactor:0}"
level=info ts=2024-09-15T13:34:39.700161691Z caller=compactor.go:274 msg="compaction complete" elapsed=5.379742003s block="&{Version:vParquet4 BlockID:d2106508-6ec3-46db-8674-7d47bba7fdb1 TenantID:single-tenant StartTime:2024-09-15 13:16:02 +0000 UTC EndTime:2024-09-15 13:33:23 +0000 UTC TotalObjects:15311 Size:3743444 CompactionLevel:1 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:23362 DedicatedColumns:[{Scope:resource Name:service.instance.id Type:string} {Scope:resource Name:service.version Type:string} {Scope:resource Name:telemetry.sdk.language Type:string} {Scope:resource Name:telemetry.sdk.version Type:string} {Scope:resource Name:telemetry.sdk.name Type:string} {Scope:resource Name:k8s.pod.ip Type:string} {Scope:resource Name:service.name Type:string} {Scope:span Name:messaging.client_id Type:string} {Scope:span Name:xxxxxx.flow_id Type:string} {Scope:span Name:messaging.destination.name Type:string} {Scope:span Name:thread.name Type:string} {Scope:span Name:correlation_id Type:string} {Scope:span Name:xxxxxx.message.type Type:string} {Scope:span Name:xxxxxx.event.id Type:string} {Scope:span Name:xxxxxx.event.name Type:string}] ReplicationFactor:0}"
level=info ts=2024-09-15T13:34:39.700274971Z caller=compactor.go:155 msg="Compacting hash" hashString=single-tenant-0-1918230-1
level=info ts=2024-09-15T13:34:39.700314397Z caller=compactor.go:186 msg="beginning compaction" traceID=296bee8e0c2d66b4
level=info ts=2024-09-15T13:34:39.700367853Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:b1b648b2-48fc-46b4-8bac-d5b618f960eb TenantID:single-tenant StartTime:2024-09-15 13:29:38 +0000 UTC EndTime:2024-09-15 13:30:52 +0000 UTC TotalObjects:408 Size:134851 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:17923 DedicatedColumns:[] ReplicationFactor:1}"
level=info ts=2024-09-15T13:34:39.721291431Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:922ef914-d636-4394-86a6-a9381c8eb7ad TenantID:single-tenant StartTime:2024-09-15 13:32:20 +0000 UTC EndTime:2024-09-15 13:33:28 +0000 UTC TotalObjects:434 Size:157229 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:18033 DedicatedColumns:[] ReplicationFactor:1}"
level=info ts=2024-09-15T13:34:39.754336651Z caller=compactor.go:198 msg="compacting block" block="&{Version:vParquet4 BlockID:465420d8-9c4f-47f3-a688-ba3909ec61c1 TenantID:single-tenant StartTime:2024-09-15 13:27:39 +0000 UTC EndTime:2024-09-15 13:30:09 +0000 UTC TotalObjects:460 Size:166137 CompactionLevel:0 Encoding:none IndexPageSize:0 TotalRecords:1 DataEncoding: BloomShardCount:1 FooterSize:17955 DedicatedColumns:[] ReplicationFactor:1}
`

mapno commented 5 days ago

Hi! How are you deploying Tempo? In distributed mode? Is it possible that the different blocks were flushed by different ingesters? They also differ considerably in size (TotalObjects and Size). It seems that the config is not being applied to the new ingesters.

edgarkz commented 5 days ago

Hi Mario,

1. Yes, it is distributed mode. Is there any reason the ingesters would ignore the config? It should not be possible that only some of them picked it up: the chart restarts all pods on a version upgrade, and I also did a few rolling restarts to be on the safe side.

2. All of the rogue blocks have ReplicationFactor:1, while the blocks with dedicated columns have ReplicationFactor:0. What does the replication factor mean here?
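To check whether this pattern holds across a longer log, a small script can tally blocks by replication factor and by whether the dedicated-column list is empty. This is a minimal sketch, assuming Python 3 and compactor log lines in the same format as the excerpt above; `BLOCK_RE` and `summarize` are illustrative names, not part of Tempo.

```python
import re
from collections import Counter

# Pulls the block ID, the DedicatedColumns list, and the ReplicationFactor
# out of a block="&{...}" or meta="&{...}" dump in a compactor log line.
# Assumes the field order seen in the log excerpt above.
BLOCK_RE = re.compile(
    r"BlockID:(?P<id>[0-9a-f-]{36}).*?"
    r"DedicatedColumns:\[(?P<cols>.*?)\]\s+"
    r"ReplicationFactor:(?P<rf>\d+)"
)


def summarize(lines):
    """Count distinct blocks by (replication factor, has dedicated columns)."""
    seen = {}
    for line in lines:
        m = BLOCK_RE.search(line)
        if m:
            # Key by block ID so a block logged more than once is counted once.
            seen[m["id"]] = (int(m["rf"]), bool(m["cols"].strip()))
    return Counter(seen.values())
```

Applied to the excerpt above, this would report three blocks with ReplicationFactor 0 and dedicated columns (two inputs plus the compacted output) and three blocks with ReplicationFactor 1 and an empty list.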