grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki

Loki 3 Bloom compactor failed due to EntityTooLarge #13444

Open weiwu-sre opened 3 months ago

weiwu-sre commented 3 months ago

Describe the bug

Loki bloom compactor failed due to EntityTooLarge.

The error message looks like this:

level=error ts=2024-07-04T09:27:29.117147298Z caller=controller.go:439 component=bloom-compactor org_id=api table=index_19906 ownership=0000000000000000-ffffffffffffffff gap=0000000000000000-ffffffffffffffff tsdb=1719973319-compactor-1719871157645-1719972448000-77abf360.tsdb msg="failed to write block" err="failed to put block file bloom/index_19906/api/blocks/00034667ad4d933a-00465ddb95f8955a/1719876551566-1719965392944-ce3c63c0.tar.gz: EntityTooLarge: Your proposed upload exceeds the maximum allowed size\n\tstatus code: 400

I am using S3 as the storage backend. With the following compactor configuration, the compactor fails and does not recover.

structuredConfig:
  bloom_compactor:
    enabled: true
    # Interval at which to re-run the compaction operation.
    # default = 10m
    compaction_interval: 10m
    # Number of workers to run in parallel for compaction.
    # default = 1
    worker_parallelism: 3
    retention:
      enabled: true
      max_lookback_days: 30
  bloom_gateway:
    enabled: true
    worker_concurrency: 4
    block_query_concurrency: 8
    client:
      addresses: dnssrvnoa+_grpc._tcp.loki-scalable-bloom-gateway-headless.grafana-loki.svc.cluster.local

After inspecting the files on the compactor, I can see the bloom file is over 600MB:

/ $ ls -alh /var/loki/blooms/bloom/index_19905/api/blocks/194ca84e71dbbb52-1a572b2786c37730/1719786877154-1719885609211-9a5fe5fd/
total 642M
drwxrwsr-x    2 loki     loki        4.0K Jul  3 20:14 .
drwxrwsr-x    3 loki     loki        4.0K Jul  3 20:14 ..
-rw-rw----    1 loki     loki      641.7M Jul  3 20:14 bloom
-rw-rw----    1 loki     loki       34.2K Jul  3 20:14 series

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki (SHA or version)
  2. Started Promtail (SHA or version) to tail '...'
  3. Query: {} term

Expected behavior

The compactor should be able to upload large block files to the S3 backend.

Environment:

Loki 3.1, EKS v1.27


chaudum commented 2 months ago

Hi @weiwu-sre

The EntityTooLarge error seems like a limitation of your S3 storage backend, see https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList

However, you can build smaller blocks by specifying these options in the limits_config (can also be set per-tenant):

# Experimental. The maximum bloom block size. A value of 0 sets an unlimited
# size. Default is 200MB. The actual block size might exceed this limit since
# blooms will be added to blocks until the block exceeds the maximum block size.
# CLI flag: -bloom-compactor.max-block-size
[bloom_compactor_max_block_size: <int> | default = 200MB]

# Experimental. The maximum bloom size per log stream. A log stream whose
# generated bloom filter exceeds this size will be discarded. A value of 0 sets
# an unlimited size. Default is 128MB.
# CLI flag: -bloom-compactor.max-bloom-size
[bloom_compactor_max_bloom_size: <int> | default = 128MB]

128MB for blooms and 200MB for blocks are the default values. They were introduced at some point after 3.0.0, though. If you upgrade to the latest version, you should not see such big blocks any more.
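
For illustration, a minimal limits_config sketch using those options (the 100MB / 64MB values below are arbitrary examples, not recommendations):

limits_config:
  # Cap the size of bloom blocks written to object storage (default 200MB).
  bloom_compactor_max_block_size: 100MB
  # Discard per-stream blooms larger than this (default 128MB).
  bloom_compactor_max_bloom_size: 64MB

Keeping the block size comfortably below your S3 backend's single-upload limit should avoid EntityTooLarge, though note the block limit is soft, so a block can still exceed it somewhat (see the flag description above).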

weiwu-sre commented 2 months ago

I am running grafana/loki:3.1.0 BTW.

And I can see those default values in the help output:

  -bloom-compactor.max-block-size value
        Experimental. The maximum bloom block size. A value of 0 sets an unlimited size. Default is 200MB. The actual block size might exceed this limit since blooms will be added to blocks until the block exceeds the maximum block size. (default 200MB)
  -bloom-compactor.max-bloom-size value
        Experimental. The maximum bloom size per log stream. A log stream whose generated bloom filter exceeds this size will be discarded. A value of 0 sets an unlimited size. Default is 128MB. (default 128MB)

Should I set those values explicitly to enforce the limits?

emadolsky commented 1 month ago

Same issue

emadolsky commented 1 month ago

To provide more context: we observe that many of the bloom blocks for our big tenants are larger than the max block size (200MB by default). Most of those blooms are still built and uploaded anyway, since they stay below S3's 5GB single-upload limit, but a few are larger, and that stops the whole compaction process.
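
One possible workaround for those tenants, since the limits can also be set per tenant, would be Loki's runtime overrides file. A minimal sketch, assuming the standard overrides format, a hypothetical tenant ID, and arbitrary example values:

overrides:
  "big-tenant":  # hypothetical tenant ID
    # Cap bloom block size for this tenant only (default 200MB).
    bloom_compactor_max_block_size: 100MB
    # Discard per-stream blooms larger than this (default 128MB).
    bloom_compactor_max_bloom_size: 64MB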