grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Cardinality limit exceeded; 13 unique label values #2863

Open gcotone opened 4 years ago

gcotone commented 4 years ago

Describe the bug
We're getting the following error while querying for series older than a certain amount of time, possibly spanning midnight, despite having a very limited number of (unique) labels:

Error doing request: Error response from server: cardinality limit exceeded for {}; 141435 entries, more than limit of 100000

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki 2.0.0
  2. Started Promtail 2.0.0-amd64
  3. Query: {job="fw-log"}

Expected behavior
If the number of unique labels remains constant over time, series should be returned.

Environment:

Screenshots, Promtail config, or terminal output

Grafana query:

{
  "request": {
    "url": "api/datasources/proxy/176/loki/api/v1/query_range?direction=BACKWARD&limit=1860&query=%7Bjob%3D%22fw-log%22%7D&start=1604275059000000000&end=1604278660000000000&step=2",
    "hideFromInspector": false
  },
  "response": {
    "error": "",
    "response": "cardinality limit exceeded for {}; 141435 entries, more than limit of 100000\n",
    "message": "cardinality limit exceeded for {}; 141435 entries, more than limit of 100000\n"
  }
}

logcli

LALALA#logcli series '{job="fw-log"}' --analyze-labels --since=6h                                                                                                                      [0] 20-11-02 9:41:11
https://localhost/loki/api/v1/series?end=1604306474033385197&match=%7Bjob%3D%22fw-log%22%7D&start=1604284874033385197
Total Streams:  10
Unique Labels:  5

Label Name   Unique Values  Found In Streams
lvl          5              10
application  4              10
host         2              10
facility     1              10
job          1              10
LALALA#logcli series '{job="fw-log"}' --analyze-labels --since=7h                                                                                                                      [0] 20-11-02 9:41:14
https://localhost/loki/api/v1/series?end=1604306477218995589&match=%7Bjob%3D%22fw-log%22%7D&start=1604281277218995589
Total Streams:  11
Unique Labels:  5

Label Name   Unique Values  Found In Streams
lvl          5              11
application  4              11
host         2              11
facility     1              11
job          1              11
LALALA#logcli series '{job="fw-log"}' --analyze-labels --since=8h                                                                                                                      [0] 20-11-02 9:41:17
https://localhost/loki/api/v1/series?end=1604306480489553537&match=%7Bjob%3D%22fw-log%22%7D&start=1604277680489553537
Total Streams:  11
Unique Labels:  5

Label Name   Unique Values  Found In Streams
lvl          5              11
application  4              11
host         2              11
facility     1              11
job          1              11
LALALA#logcli series '{job="fw-log"}' --analyze-labels --since=9h                                                                                                                      [0] 20-11-02 9:41:21
https://localhost/loki/api/v1/series?end=1604306485172362222&match=%7Bjob%3D%22fw-log%22%7D&start=1604274085172362222
Error doing request: Error response from server: cardinality limit exceeded for {}; 141435 entries, more than limit of 100000
 (<nil>)

LALALA#logcli series '{job="fw-log"}' --analyze-labels --since=9h                                                                                                                            [0] 20-11-02 10:00:38
https://localhost/loki/api/v1/series?end=1604307698329025099&match=%7Bjob%3D%22fw-log%22%7D&start=1604275298329025099
Total Streams:  11
Unique Labels:  5

Label Name   Unique Values  Found In Streams
lvl          5              11
application  4              11
host         2              11
facility     1              11
job          1              11
LALALA#logcli series '{job="fw-log"}' --analyze-labels --since=10h                                                                                                                           [0] 20-11-02 10:01:39
https://localhost/loki/api/v1/series?end=1604307701929571537&match=%7Bjob%3D%22fw-log%22%7D&start=1604271701929571537
Error doing request: Error response from server: cardinality limit exceeded for {}; 141435 entries, more than limit of 100000
 (<nil>)
LALALA#  
cyriltovena commented 4 years ago

Most likely a vendor update caused this. Adding to 2.1.

slim-bean commented 4 years ago

@gcotone can you run the analyze labels query over all series?

logcli series '{}' --analyze-labels --since=6h

Also can you include your Loki config?

Thanks!

gcotone commented 4 years ago

@slim-bean now I get the error for all queries:

#logcli series '{}' --analyze-labels --since=6h                                                                                                                                       [0] 20-11-02 14:23:55
https://localhost/loki/api/v1/series?end=1604323442475853687&match=%7B%7D&start=1604301842475853687
Error doing request: Error response from server: cardinality limit exceeded for {}; 101515 entries, more than limit of 100000
 (<nil>)
#logcli series '{}' --analyze-labels --since=1m                                                                                                                                       [1] 20-11-02 14:24:03
https://localhost/loki/api/v1/series?end=1604323447135243178&match=%7B%7D&start=1604323387135243178
Error doing request: Error response from server: cardinality limit exceeded for {}; 101515 entries, more than limit of 100000
 (<nil>)
#logcli series '{job="fw-log"}' --analyze-labels --since=1m                                                                                                                         [130] 20-11-02 14:26:09
https://localhost/loki/api/v1/series?end=1604323576661875929&match=%7Bjob%3D%22fw-log%22%7D&start=1604323516661875929
Error doing request: Error response from server: cardinality limit exceeded for {}; 101515 entries, more than limit of 100000
 (<nil>)
#logcli series '{job="fw-log"}' --analyze-labels --since=6h                                                                                                                           [1] 20-11-02 14:26:17
https://localhost/loki/api/v1/series?end=1604323581387056050&match=%7Bjob%3D%22fw-log%22%7D&start=1604301981387056050
Error doing request: Error response from server: cardinality limit exceeded for {}; 101515 entries, more than limit of 100000
 (<nil>)

Here's my config:

kind: ConfigMap
metadata:
  name: loki-config
  namespace: default
apiVersion: v1
data:
  loki.yaml: |-
    # Disable multi-tenancy
    auth_enabled: false

    # Storage config
    storage_config:
      aws:
        s3: s3://eu-central-1/app-loki-s3
        dynamodb:
          dynamodb_url: dynamodb://eu-central-1
      boltdb_shipper:
        active_index_directory: /loki/index
        cache_location: /loki/boltdb-cache
        cache_ttl: 4h         # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: s3

    # Schema Config
    schema_config:
      configs:
      - from: 2020-05-15
        store: aws
        object_store: s3
        schema: v11
        index:
          prefix: loki_index_
          period: 24h
          tags:
            application: app
            component: loki
      - from: 2020-10-29
        store: boltdb-shipper
        object_store: s3
        schema: v11
        index:
          prefix: loki_index_
          period: 24h
    # The module to run Loki with. Supported values
    # all, distributor, ingester, querier, query-frontend, table-manager.
    server:
      http_listen_port: 8080
      grpc_listen_port: 9095
      graceful_shutdown_timeout: 5s
      grpc_server_max_recv_msg_size: 67108864
      http_server_idle_timeout: 120s

    # Configures how the lifecycle of the ingester will operate
    # and where it will register for discovery
    ingester:
      lifecycler:
        #address: 0.0.0.0
        ring:
          kvstore:
            store: memberlist
          replication_factor: 2
        final_sleep: 0s
      chunk_idle_period: 5m
      chunk_retain_period: 30s

    # Table Manager configuration
    table_manager:
      retention_period: 48h
      retention_deletes_enabled: true
      index_tables_provisioning:
        enable_ondemand_throughput_mode: true

    chunk_store_config:
      max_look_back_period: 0

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    memberlist:
      abort_if_cluster_join_fails: false

      # Expose this port on all distributor, ingester
      # and querier replicas.
      bind_port: 7946

      # You can use a headless k8s service for all distributor,
      # ingester and querier components.
      join_members:
      - loki-gossip-ring.default.svc.cluster.local:7946

      max_join_backoff: 1m
      max_join_retries: 10
      min_join_backoff: 1s

    compactor:
      working_directory: /loki/compactor
      shared_store: aws
slim-bean commented 4 years ago

You can increase this limit with cardinality_limit in the limits_config section, which should at least allow you to run these queries.
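For example (the value here is only an illustration):

limits_config:
  # cardinality limit for index queries; the default is 100000
  cardinality_limit: 200000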

gcotone commented 4 years ago

After bumping the limit to 500000, it now returns without error. However, it's not clear to me why cardinality increases if the set of labels/values remains constant over time. Here are the values for all series in the last 24-48h:

#logcli series '{}' --analyze-labels --since=24h                                                                                                                                      [0] 20-11-02 17:22:06
https://localhost/loki/api/v1/series?end=1604334127937644445&match=%7B%7D&start=1604247727937644445
Total Streams:  11
Unique Labels:  5

Label Name   Unique Values  Found In Streams
lvl          5              11
application  4              11
host         2              11
facility     1              11
job          1              11
#logcli series '{}' --analyze-labels --since=48h                                                                                                                                      [0] 20-11-02 17:22:10
https://localhost/loki/api/v1/series?end=1604334405699267409&match=%7B%7D&start=1604161605699267409
Total Streams:  11
Unique Labels:  5

Label Name   Unique Values  Found In Streams
lvl          5              11
application  4              11
host         2              11
facility     1              11
job          1              11
#                                               
sandeepsukhani commented 3 years ago

The limit here is related to the number of index entries a query fetches from the index store; considering the number of labels and streams you have, it looks strange that this number is so high. If you are scraping Loki metrics, can you please share the graph for the last 24h for the following metric query: sum(rate(loki_ingester_chunks_flushed_total[1m]))

gcotone commented 3 years ago

@sandeepsukhani I only have values for the 8AM - 8PM time range, and this is how it looks for sum(rate(loki_ingester_chunks_flushed_total[1m]))

[Screenshot from 2020-11-23: graph of sum(rate(loki_ingester_chunks_flushed_total[1m]))]

glebsam commented 3 years ago

Looks like I've stepped into the same issue. Below is my graph for sum(rate(loki_ingester_chunks_flushed_total[1m])). The trouble seems to start once the query reaches back beyond a specific amount of time (with precision down to the second).

❯ while true; do logcli --addr http://127.0.0.1:3100 series '{ecs_cluster="XXX", ecs_container_name="YYY"}' --stats --since 10h1m10s --analyze-labels; sleep 1; done
http://127.0.0.1:3100/loki/api/v1/series?end=1607940065214758000&match=%7Becs_cluster%3D%22XXX%22%2C+ecs_container_name%3D%22YYY%22%7D&start=1607903995214758000
Error doing request: Error response from server: cardinality limit exceeded for {}; 131253 entries, more than limit of 100000
 (<nil>)
http://127.0.0.1:3100/loki/api/v1/series?end=1607940070964981000&match=%7Becs_cluster%3D%22XXX%22%2C+ecs_container_name%3D%22YYY%22%7D&start=1607904000964981000
Total Streams:  1
Unique Labels:  8

Label Name                   Unique Values  Found In Streams
ecs_task_definition_version  1              1
host                         1              1
image_id                     1              1
image_name                   1              1
source                       1              1
ecs_cluster                  1              1
ecs_container_name           1              1
ecs_task_definition_family   1              1

Increasing the limit to 135000 did not help, and the error message now seems incorrect (or the limit was not applied).

131253 entries, more than limit of 100000

Increasing the limit to 500000 did not help either; the message is the same. The limit was set in the Loki config:

limits_config:
  cardinality_limit: 500000

[Screenshots: graphs of sum(rate(loki_ingester_chunks_flushed_total[1m]))]

slim-bean commented 3 years ago

@glebsam huh, thanks for reporting, and all the info you provided is great.

This is very peculiar; we haven't been able to reproduce it and are still not sure what's happening here.

Can you describe your deployment a little more? I see the two labels you list, but how many clusters and containers do you have?

Are you using boltdb-shipper? And if your log data isn't sensitive, would you be willing to send us some files so we could try to run this locally?

@gcotone same question for you, would you be able to send us some files so we can try to recreate this locally?

glebsam commented 3 years ago

@slim-bean it is about 84 containers (26 hosts) sending their logs to Loki. The senders are Loki logging drivers. Loki is deployed in monolithic mode in Docker. Index storage is AWS DynamoDB, chunk storage is AWS S3. The Loki server instance has 4 GB RAM, 3200 MB of which is available to the Loki container. Unfortunately, it's impossible to send the log files themselves, but you can ask me questions and I will do my best to answer them.

Also, I am sorry: in my previous message I made a mistake while trying to increase the cardinality limit (I touched the wrong config). Now, with the limit increased in the proper way (500k), I can run the series command for up to the last 24h (previously it was 10h) without any RAM or CPU penalty:

❯ logcli --addr http://127.0.0.1:3100 series '{ecs_cluster="XXX", ecs_container_name="YYY"}' --stats --since 10h --analyze-labels
http://127.0.0.1:3100/loki/api/v1/series?end=1607949539128521000&match=%7Becs_cluster%3D%22XXX%22%2C+ecs_container_name%3D%22YYY%22%7D&start=1607913539128521000
Total Streams:  4
Unique Labels:  8

Label Name                   Unique Values  Found In Streams
source                       2              4
ecs_task_definition_version  2              4
image_id                     2              4
image_name                   2              4
ecs_task_definition_family   1              4
host                         1              4
ecs_container_name           1              4
ecs_cluster                  1              4

Also (it may or may not be related): a couple of days ago the load on the Loki application increased dramatically (see the graph below), which caused Loki restarts due to OOM (we've seen DynamoDB throttling and increased RAM consumption from Loki).

[Screenshot: graph of the load increase]

Cluster-wide cardinality for last 10 minutes:

❯ logcli --addr http://127.0.0.1:3100 series '{ecs_cluster=~".+"}' --stats --since 10m --analyze-labels
http://127.0.0.1:3100/loki/api/v1/series?end=1607947526420783000&match=%7Becs_cluster%3D~%22.%2B%22%7D&start=1607946926420783000
Total Streams:  115
Unique Labels:  8

Label Name                   Unique Values  Found In Streams
ecs_task_definition_family   84             115
ecs_container_name           66             115
ecs_task_definition_version  57             115
host                         26             115
image_name                   19             115
image_id                     18             115
ecs_cluster                  11             115
source                       2              115

Loki config:

auth_enabled: false

server:
  http_listen_port: 3100
  http_server_read_timeout: 4m
  http_server_write_timeout: 4m
  http_server_idle_timeout: 4m

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
    final_sleep: 10s
  chunk_idle_period: 3m
  chunk_retain_period: 3m
  chunk_encoding: lz4

distributor:
    ring:
      kvstore:
        store: memberlist

querier:
  query_timeout: 5m
  engine:
    timeout: 4m

schema_config:
  configs:
  - from: 2020-09-28
    store: aws
    object_store: aws
    schema: v11
    index:
      prefix: loki-index_
      period: 7d

query_range:
  split_queries_by_interval: 24h

storage_config:
  aws:
    s3: s3://eu-central-1
    bucketnames: loki-chunks-
    dynamodb:
      dynamodb_url: dynamodb://eu-central-1

limits_config:
  ingestion_rate_strategy: local  # per-replica limits, not overall cluster
  enforce_metric_name: false  #  "metrics" are always logs, so do not need any names, only tags
  reject_old_samples: true  # reject samples older than X age
  reject_old_samples_max_age: 168h  # reject samples older than 7 days
  max_entries_limit_per_query: 20_000  # Maximum number of log entries that will be returned for a query. 0 to disable
  ingestion_rate_mb: 20  # Per-user ingestion rate limit in sample size per second
  ingestion_burst_size_mb: 30  # Per-user allowed ingestion burst size (in sample size)
  cardinality_limit: 500000

table_manager:
  retention_deletes_enabled: true
  retention_period: 35d
  index_tables_provisioning:
    provisioned_write_throughput: 150
    provisioned_read_throughput: 150
    inactive_write_throughput: 5
    inactive_read_throughput: 30
slim-bean commented 3 years ago

Great, thanks for the extra info! Just to rule out a bug in our analyze-labels script, what do you get if you run:

logcli --addr http://127.0.0.1:3100 series '{}' --since 10m | wc -l
glebsam commented 3 years ago
❯ logcli --addr http://127.0.0.1:3100 series '{}' --since 10m | wc -l
http://127.0.0.1:3100/loki/api/v1/series?end=1607956971903130000&match=%7B%7D&start=1607956371903130000
     108
slim-bean commented 3 years ago

huh, nothing crazy there.

what about for a longer period?

logcli --addr http://127.0.0.1:3100 series '{}' --since 24h | wc -l
glebsam commented 3 years ago
❯ logcli --addr http://127.0.0.1:3100 series '{}' --since 24h | wc -l
http://127.0.0.1:3100/loki/api/v1/series?end=1607963319020218000&match=%7B%7D&start=1607876919020218000
     326
glebsam commented 3 years ago

JFYI, still having the issue with version 2.1.0. The workaround 🌟 is also the same: set a higher cardinality limit (500k):

limits_config:
  cardinality_limit: 500_000
stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

glebsam commented 3 years ago

Dear Stale Bot, let's keep this issue open :) I believe that even with the existing workaround, this may confuse a new user of such a great product.

owen-d commented 3 years ago

Credit: @cyriltovena. Could this be due to __name__="logs" which is forced via our usage of the Cortex index?

dannykopping commented 3 years ago

@cyriltovena do you have an answer to the above question?

cyriltovena commented 3 years ago

I don't think so.

pgassmann commented 2 years ago

Is there a way to get the cardinality (number of streams) as a metric from Loki?

I would like to visualize this and define alerts based on the number of streams (e.g. how many containers send logs per host, and alert if that number changes significantly); a rough sketch of what I have in mind follows the example below.

❯ logcli --addr http://127.0.0.1:3100 series '{ecs_cluster=~".+"}' --stats --since 10m --analyze-labels
http://127.0.0.1:3100/loki/api/v1/series?end=1607947526420783000&match=%7Becs_cluster%3D~%22.%2B%22%7D&start=1607946926420783000
Total Streams:  115
Unique Labels:  8

Label Name                   Unique Values  Found In Streams
ecs_task_definition_family   84             115
ecs_container_name           66             115
ecs_task_definition_version  57             115
host                         26             115
image_name                   19             115
image_id                     18             115
ecs_cluster                  11             115
source                       2              115
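Something along these lines is what I have in mind (just a sketch: it assumes Prometheus scrapes Loki's own metrics and that the loki_ingester_memory_streams gauge reflects the number of active streams; the alert name and threshold are made up):

groups:
  - name: loki-stream-cardinality
    rules:
      - alert: LokiStreamCountChanged
        # compare the current number of active streams with the count one hour ago;
        # assumes loki_ingester_memory_streams is available via Prometheus
        expr: |
          abs(
            sum(loki_ingester_memory_streams)
            - sum(loki_ingester_memory_streams offset 1h)
          ) > 100
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Number of active Loki streams changed significantly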
kd7lxl commented 1 year ago

We see a similar (same?) issue. It complains about cardinality limit exceeded, but label cardinality is quite low. Volume is high. We ingest around 30MB/s, with at least 10MB/s coming from a single stream (like I said, label cardinality is very low).

This was the original error:

% logcli series --analyze-labels '{namespace_name="ib-system"}' --since 1m --stats
2023/03/17 15:30:36 http://localhost:3100/loki/api/v1/series?end=1679092236439105000&match=%7Bnamespace_name%3D%22ib-system%22%7D&start=1679092176439105000
2023/03/17 15:30:37 Error response from server: cardinality limit exceeded for {}; 168950 entries, more than limit of 100000
 (<nil>) attempts remaining: 0
2023/03/17 15:30:37 Error doing request: Run out of attempts while querying the server; response: cardinality limit exceeded for {}; 168950 entries, more than limit of 100000

After increasing cardinality_limit, the same query returns successfully. 3 streams.

% logcli series --analyze-labels '{namespace_name="ib-system"}' --since 1m --stats
2023/03/17 16:08:19 http://localhost:3100/loki/api/v1/series?end=1679094499804626000&match=%7Bnamespace_name%3D%22ib-system%22%7D&start=1679094439804626000
Total Streams:  3
Unique Labels:  4

Label Name      Unique Values  Found In Streams
container_name  3              3
job             1              3
cluster         1              3
namespace_name  1              3

Querying everything in Loki, we find only 459 streams:

% logcli series --analyze-labels '{}' --stats
2023/03/17 16:10:06 http://localhost:3100/loki/api/v1/series?end=1679094606348617000&match=%7B%7D&start=1679091006348617000
Total Streams:  459
Unique Labels:  4

Label Name      Unique Values  Found In Streams
container_name  396            457
namespace_name  118            457
job             1              459
cluster         1              456

Having read the docs, it's not immediately clear to me why measured cardinality and total streams differ. Is it counting chunks toward the cardinality limit?

adapasuresh commented 1 year ago

I have a similar issue in promtail-2.1.0. I can see the labels in http://localhost:9080/targets and in service discovery,

and also in the stages (stage-0, stage-1),

but not in the "wal" files or in the Grafana data source -> Explore.

diranged commented 1 year ago

Hi, we're running into this issue now in our largest cluster (~500-800 nodes at any given time; we boot maybe ~2000-3000 nodes per day). Link to a small Slack thread I started.

In our case, our developers were trying to follow logs on a single host:

Query: {node_name="ip-100-64-164-80.us-west-2.compute.internal", namespace="xxx", container!="xxx"}
Error: cardinality limit exceeded for logs{node_name}; 115329 entries, more than limit of 100000

I don't understand why cardinality applies here, given that we've scoped the query down to a single node.

Loki Config

apiVersion: v1
data:
  config.yaml: |
    analytics:
      reporting_enabled: false
    auth_enabled: false
    common:
      compactor_address: http://loki-compactor:3100
      replication_factor: 3
    compactor:
      retention_enabled: true
      shared_store: s3
    distributor:
      ring:
        heartbeat_timeout: 15s
        kvstore:
          store: memberlist
    frontend:
      compress_responses: true
      log_queries_longer_than: 15s
    frontend_worker:
      frontend_address: 'loki-query-frontend:9095'
      grpc_client_config:
        grpc_compression: gzip
        max_send_msg_size: 134217728
    ingester:
      autoforget_unhealthy: true
      chunk_idle_period: 2h
      chunk_target_size: 1536000
      flush_op_timeout: 600s
      lifecycler:
        join_after: 5s
        ring:
          heartbeat_timeout: 15s
          kvstore:
            store: memberlist
      max_chunk_age: 1h
      max_transfer_retries: 0
      query_store_max_look_back_period: 0
      wal:
        enabled: false
    ingester_client:
      grpc_client_config:
        grpc_compression: gzip
        max_send_msg_size: 134217728
    limits_config:
      ingestion_burst_size_mb: 80
      ingestion_rate_mb: 50
      ingestion_rate_strategy: local
      max_cache_freshness_per_query: 10m
      max_concurrent_tail_requests: 50
      max_entries_limit_per_query: 10000
      max_global_streams_per_user: 0
      max_label_name_length: 128
      max_label_value_length: 1024
      max_line_size: 256kb
      max_line_size_truncate: true
      max_streams_per_user: 0
      per_stream_rate_limit: 20M
      per_stream_rate_limit_burst: 40M
      query_timeout: 30m
      reject_old_samples: true
      reject_old_samples_max_age: 1h
      retention_period: 720h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
      - dnssrv+_tcp._tcp.loki-memberlist.observability.svc.cluster.local.
      left_ingesters_timeout: 30s
    querier:
      engine:
        timeout: 5m
      multi_tenant_queries_enabled: true
      query_ingesters_within: 45m
    query_range:
      align_queries_with_step: true
      cache_results: false
      max_retries: 5
      parallelise_shardable_queries: true
    runtime_config:
      file: /var/loki-runtime/runtime.yaml
    schema_config:
      configs:
      - from: "2021-01-20"
        index:
          period: 24h
          prefix: index_
        object_store: s3
        schema: v11
        store: boltdb-shipper
    server:
      grpc_server_max_recv_msg_size: 134217728
      http_listen_port: 3100
      http_server_idle_timeout: 1800s
      http_server_read_timeout: 1800s
      http_server_write_timeout: 1800s
      http_tls_config:
        cert_file: /tls/tls.crt
        client_auth_type: VerifyClientCertIfGiven
        client_ca_file: /tls/ca.crt
        key_file: /tls/tls.key
      log_level: info
    storage_config:
      aws:
        s3: s3://us-west-2/xxx
        s3forcepathstyle: true
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/boltdb-cache
        cache_ttl: 24h
        index_gateway_client:
          server_address: dns:///loki-index-gateway:9095
        resync_interval: 5m
        shared_store: s3
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

Temporary Fix - scaling up cardinality_limit

Yes, setting cardinality_limit: 500000 does resolve the issue... for now. I don't understand, though, why this is necessary given our query. Can someone explain why we would see a high-cardinality error on node_name when we're specifically searching for logs with only one node_name label?
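For reference, this is the override we applied (cardinality_limit goes in the limits_config block of the config above):

limits_config:
  # raised from the 100000 default
  cardinality_limit: 500000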

Versions

Loki: 2.8.2
Kubernetes: 1.24

Metrics

Here are a few of the metric values from our cluster based on the queries that were asked for by @slim-bean:

$ logcli --tls-skip-verify --addr https://127.0.0.1:3100 series '{}' --since 10m | wc -l
2023/05/26 06:42:22 https://127.0.0.1:3100/loki/api/v1/series?end=1685108542732040000&match=%7B%7D&start=1685107942732040000
   20778
$ logcli --tls-skip-verify --addr https://127.0.0.1:3100 series '{node_name="ip-100-64-164-80.us-west-2.compute.internal"}' --since 24h | wc -l 
2023/05/26 06:44:14 https://127.0.0.1:3100/loki/api/v1/series?end=1685108654241121000&match=%7Bnode_name%3D%22ip-100-64-164-80.us-west-2.compute.internal%22%7D&start=1685022254241121000
    1162
% logcli  --tls-skip-verify --addr https://127.0.0.1:3100 series --analyze-labels '{}' --stats
2023/05/26 06:46:45 https://127.0.0.1:3100/loki/api/v1/series?end=1685108805815421000&match=%7B%7D&start=1685105205815421000
Total Streams:  58058
Unique Labels:  16

Label Name        Unique Values  Found In Streams
pod               12444          49945
node_name         481            49945
scrape_pod        481            58058
hostname          474            8113
container         143            49945
job               127            49945
app               114            49945
instance          81             47627
namespace         48             49945
version           44             43263
component         39             3132
program           36             8113
level             5              7641
stream            2              49945
scrape_job        2              58058
scrape_namespace  1              58058
diranged commented 1 year ago

Here's a snapshot of 24-hrs of our labels:

% logcli  --tls-skip-verify --addr https://127.0.0.1:3100 series --analyze-labels '{}' --stats --since 24h
2023/05/26 09:36:12 https://127.0.0.1:3100/loki/api/v1/series?end=1685118972231879000&match=%7B%7D&start=1685032572231879000
Total Streams:  1594960
Unique Labels:  16

Label Name        Unique Values  Found In Streams
pod               227439         1458796
scrape_pod        3028           1594960
node_name         2930           1458796
hostname          2929           136164
job               254            1458796
app               240            1458796
container         170            1458796
instance          100            1431756
version           73             1383366
namespace         50             1458796
component         42             36996
program           38             136164
level             5              133136
stream            2              1458796
scrape_job        2              1594960
scrape_namespace  1              1594960
github-vincent-miszczak commented 1 year ago

Having the same issue with 2.8.1.

With a query_range request whose start and end parameters span 2 seconds, the query {hostname="xxx"} fails with cardinality limit exceeded for logs{hostname}; 118862 entries, more than limit of 100000.

Requesting the values of this label over the same time range fails (timeout).

After applying the proposed workaround (raising cardinality_limit), the queries return results. The cardinality is in fact ~2000.

feldentm-SAP commented 1 year ago

Is someone working on this issue?