grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Grafana Loki performance is very slow when loading more than 6 hours of log data #8845

Open · Manoharan-NMS opened this issue 1 year ago

Manoharan-NMS commented 1 year ago

Execution steps

Implementation architecture:

Fluentd --> Loki --> Grafana
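
On the Grafana end of this chain, Loki is typically wired in as a data source. The snippet below is only a minimal provisioning sketch, assuming Grafana's standard datasource provisioning format and a Loki instance on localhost:3100; the file path and values are placeholders, not the reporter's actual setup.

# Hypothetical file: /etc/grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    # Placeholder URL; point at the actual Loki server
    url: http://localhost:3100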

Loki Configuration

Path: /etc/loki/config.yml

auth_enabled: false

server:
#  http_listen_address: 127.0.0.1
  http_listen_port: 3100
#  grpc_listen_address: 127.0.0.1
  grpc_listen_port: 9096

# grpc_server_max_recv_msg_size: 200MB = 200*1024*1024 = 209715200
  grpc_server_max_recv_msg_size: 209715200
#  grpc_server_max_send_msg_size: 200MB = 200*1024*1024 = 209715200
  grpc_server_max_send_msg_size: 209715200

  http_server_read_timeout: 3m
  http_server_write_timeout: 3m

common:
  path_prefix: /home/loki_data/loki
  storage:
    filesystem:
      chunks_directory: /home/loki_data/loki/chunks
      rules_directory: /home/loki_data/loki/rules
  #replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

query_range:
  parallelise_shardable_queries: true
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2023-03-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

ruler:
  storage:
    type: local
    local:
      directory: /home/loki_data/loki/rules
  rule_path: /home/loki_data/loki/rules-temp

  alertmanager_url: http://localhost:9093

limits_config:
  ingestion_rate_strategy: global
  #ingestion_rate_mb: 500
  ingestion_rate_mb: 1024
  #ingestion_burst_size_mb: 2000
  ingestion_burst_size_mb: 5000
  max_label_name_length: 1024
  #max_label_value_length: 2048
  max_label_value_length: 4096
  #max_label_names_per_series: 30
  max_label_names_per_series: 100
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 10m
  enforce_metric_name: true
  max_line_size: 0
  max_line_size_truncate: false
  increment_duplicate_timestamp: false
  max_entries_limit_per_query: 50000
  #max_streams_per_user: 0
  max_streams_per_user: 500000
  #max_global_streams_per_user: 50000
  max_global_streams_per_user: 500000
  unordered_writes: true
  #max_chunks_per_query: 2000000
  max_chunks_per_query: 4000000
  max_query_length: 721h
  max_query_parallelism: 3500
  #max_query_series: 500
  max_query_series: 1000
  cardinality_limit: 100000
  max_streams_matchers_per_query: 10000
  #max_concurrent_tail_requests: 100
  max_concurrent_tail_requests: 200
  ruler_evaluation_delay_duration: 0s
  ruler_max_rules_per_rule_group: 0
  ruler_max_rule_groups_per_tenant: 0
  per_stream_rate_limit: 512MB
  per_stream_rate_limit_burst: 1024MB
  max_cache_freshness_per_query: '10m'
  #split_queries_by_interval: 24h
  #split_queries_by_interval: 15m
  #split_queries_by_interval: 120m
  split_queries_by_interval: 2h
  #tsdb_max_query_parallelism: 1024
  #max_queriers_per_tenant: 128
chunk_store_config:
  max_look_back_period: 336h
table_manager:
  retention_deletes_enabled: true
  #retention_period: 2190h
  retention_period: 336h
ingester:
  #chunk_idle_period: 15m
  #chunk_idle_period: 1h
  chunk_idle_period: 30m
  chunk_retain_period: 30s
  chunk_target_size: 1572864

  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        #store: inmemory
        store: memberlist
      replication_factor: 1
    #final_sleep: 0s
    final_sleep: 30s
  chunk_encoding: snappy

query_scheduler:
  max_outstanding_requests_per_tenant: 10000

frontend_worker:
  grpc_client_config:
    grpc_compression: snappy
    max_recv_msg_size: 1048576000
    max_send_msg_size: 1048576000
  parallelism: 24

Steps

A maximum of 1 GB of logs per day is sent from Fluentd.

In Grafana, time ranges of 5 minutes, 15 minutes, 1 hour, 6 hours, and 24 hours are selected.

Actual result

Loading logs for the 5-minute and 15-minute ranges is fine. When selecting 3-hour, 12-hour, or 24-hour ranges, the Grafana GUI takes around 1 to 2 minutes to load the complete logs.

Expected from Loki

How can Loki's query performance be sped up when large volumes of logs are processed?
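
For what it's worth, the settings that usually dominate long-range query latency are on the read path rather than in the ingestion limits. The fragment below is a hedged sketch of the relevant knobs (parameter names are from the Loki 2.x configuration reference; the values are illustrative starting points, not recommendations tuned for this setup):

limits_config:
  # Interval the query frontend uses to split a long time range into sub-queries;
  # very small intervals create many tiny sub-queries and can slow things down
  split_queries_by_interval: 24h
  # Upper bound on how many of those sub-queries run in parallel for one query
  max_query_parallelism: 32

querier:
  # Number of sub-queries a single querier process works on concurrently
  max_concurrent: 8

Raising parallelism only helps if there are enough querier processes (or CPU cores) to actually execute the sub-queries concurrently.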

Manoharan-NMS commented 1 year ago

Any update?

schiorean commented 1 year ago

@Manoharan-NMS I have a similar issue. Standard Loki installed as a Docker container following the official documentation, with only a handful of log entries. The query takes a long time if I use, say, the last 30 days as the range, even though there are only a couple of log lines in the entire database (tested).

Did you find any solution?

rad-pat commented 12 months ago

I also have the same problem: only a handful of logs in the database, but any query spanning more than 10 days becomes unusable. Loki v2.8.4.

rad-pat commented 12 months ago

Actually, after further investigation, I got better performance by setting split_queries_by_interval to 1d.
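
For anyone landing here later, that change corresponds to the following fragment of the limits_config shown at the top of the issue (the 1d value is simply what worked in that commenter's setup; the best interval will depend on log volume and the number of queriers):

limits_config:
  # Splitting the query by whole days produced fewer, larger sub-queries here
  split_queries_by_interval: 1d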