grafana/loki

how to use filesystem as chunk cache? #10257

Open lovesharepc opened 1 year ago

lovesharepc commented 1 year ago

I am using S3 as chunk storage. Question 1: can Loki make the querier cache chunks on the local filesystem?

I see that storage_config -> tsdb_shipper has a cache_location option, but that looks like it is for the index only.

In this document, https://grafana.com/docs/loki/latest/storage/, the last example has a filesystem config. Does this mean the filesystem is used as short-term chunk storage with a TTL of 24h?

[screenshot: the storage_config example from the docs, showing tsdb_shipper with cache_ttl: 24h and a filesystem section]

Here is my full config:

auth_enabled: true
ballast_bytes: 536870912 #512MB

server:
  http_listen_address: 0.0.0.0
  grpc_listen_address: 0.0.0.0
  http_listen_port: 8096
  grpc_listen_port: 9096
  log_level: warn
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600

common:
  path_prefix: /loki
  storage:
    s3:
      endpoint: ${LOKI_ENDPOINT}
      access_key_id: ${LOKI_ACCESS_KEY_ID}
      secret_access_key: ${LOKI_SECRET_ACCESS_KEY}
      insecure: true
      bucketnames: ${LOKI_BUCKETNAMES}
      s3forcepathstyle: true
      region: ${LOKI_REGION}
  compactor_grpc_address: loki-write-1:9096
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members: 
    - loki-write-1:7946
    - loki-write-2:7946
    - loki-write-3:7946
    - loki-read-1:7946
    - loki-read-2:7946
    - loki-read-3:7946  
  dead_node_reclaim_time: 30s
  gossip_to_dead_nodes_time: 15s
  left_ingesters_timeout: 30s
  bind_addr: ['0.0.0.0']
  bind_port: 7946
  gossip_interval: 2s

ingester:
  lifecycler:
    join_after: 10s
    observe_period: 5s
    ring:
      replication_factor: 1
      kvstore:
        store: memberlist
    final_sleep: 0s
  chunk_idle_period: 1m
  wal:
    enabled: true
    dir: /loki/wal
    flush_on_shutdown: true
    replay_memory_ceiling: 104857600 # 100MB
  max_chunk_age: 2h
  chunk_retain_period: 60s
  chunk_encoding: gzip  
  chunk_target_size: 20e+06
  chunk_block_size: 1e+06
  flush_op_timeout: 10s
  concurrent_flushes: 8

schema_config:
  configs:
  - from: 2023-07-11
    store: tsdb
    object_store: s3
    schema: v12
    index:
      prefix: index_
      period: 24h

limits_config:
  enforce_metric_name: true
  reject_old_samples: false
  reject_old_samples_max_age: 4w
  ingestion_rate_strategy: "local"
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 10
  # parallelize queries in 15min intervals
  split_queries_by_interval: 15m
  per_stream_rate_limit: 50MB
  per_stream_rate_limit_burst: 50MB
  retention_period: ${LOKI_RETENTION_PERIOD}
  max_line_size: 256kb
  increment_duplicate_timestamp: true
  max_global_streams_per_user: 0
  shard_streams:
    enabled: false
    logging_enabled: true
    desired_rate: 4MB
  max_query_length: 30d1h
  query_timeout: 5m
  max_query_lookback: 0s
  max_chunks_per_query: 2000000
  max_query_parallelism: 2
  tsdb_max_query_parallelism: 4
  max_cache_freshness_per_query: 10m # because ALB log
  max_entries_limit_per_query: 5000

query_range:
  # make queries more cache-able by aligning them with their step intervals
  align_queries_with_step: true
  max_retries: 1
  parallelise_shardable_queries: true
  cache_results: true
  results_cache:
    compression: snappy
    cache:
      memcached_client:
        addresses: memcached:11211
        timeout: 100ms
        max_item_size: 104857600
        update_interval: 1m
      fifocache:
        max_size_bytes: 100MB
        ttl: 1h0m0s

storage_config:
  index_queries_cache_config:
    memcached_client:
      addresses: memcached:11211
      timeout: 100ms
      max_item_size: 104857600
      update_interval: 1m
    fifocache:
      max_size_bytes: 100MB
      ttl: 1h0m0s
  max_parallel_get_chunk: 20
  max_chunk_batch_size: 5
  tsdb_shipper:
    active_index_directory: "/loki/tsdb-shipper-active"
    shared_store: s3
    cache_location: "/loki/tsdb-shipper-cache"
    cache_ttl: 24h
  filesystem:  # test
    directory: /loki/loki/chunks

chunk_store_config:
  chunk_cache_config:
    memcached_client:
      addresses: memcached:11211
      timeout: 100ms
      max_item_size: 104857600
      update_interval: 1m
    fifocache:
      max_size_bytes: 100MB
      ttl: 1h0m0s
  write_dedupe_cache_config:
    memcached_client:
      addresses: memcached:11211
      timeout: 100ms
      max_item_size: 104857600
      update_interval: 1m
    fifocache:
      max_size_bytes: 100MB
      ttl: 1h0m0s

querier:
  max_concurrent: 4

frontend_worker:
  frontend_address: "loki-front:9096"
  parallelism: 8

# analytics:
#  reporting_enabled: false

Question 2: short-term data lives in the ingesters and long-term data goes to S3, so how is the short-term period defined? Is this right: change the ingester's max_chunk_age and increase the querier's query_ingesters_within to control how long short-term data is queried from the ingesters (local filesystem) instead of from S3?

whiteadam commented 1 year ago
  1. cache_ttl in the screenshot is for the index, not for chunks. The filesystem there is chunk storage, not a cache.

From what I can tell from the code, the only options for the chunk cache are in-memory (embedded), Memcached, and Redis: https://github.com/grafana/loki/tree/main/pkg/storage/chunk/cache

The in-memory cache is enabled by default now, and there is info on adjusting its size here: https://grafana.com/docs/loki/latest/setup/upgrade/#in-memory-fifo-caches-enabled-by-default (see the config sketch after this list).

  2. Yes, I think that is correct: max_chunk_age: 2h and query_ingesters_within: 2h (both are sketched below as well).
  3. I am not an expert, just a person who is also trying to figure this out :)
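
To make that concrete, here is a rough sketch of both suggestions in config form (my own illustration, assuming Loki 2.8-era config keys; the sizes are placeholders, not recommendations):

chunk_store_config:
  chunk_cache_config:
    embedded_cache:             # successor of the deprecated fifocache
      enabled: true
      max_size_mb: 500          # in-memory chunk cache budget
      ttl: 1h

ingester:
  max_chunk_age: 2h             # chunks are flushed to object storage after at most 2h

querier:
  query_ingesters_within: 2h    # query ingesters for data newer than this; keep it >= max_chunk_age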
ashikhman commented 8 months ago

I just found a good article on the Grafana blog about using the filesystem as a cache for chunks: you can set up memcached to spill to the filesystem via extstore. Quote from the article:

extstore is quite simple, conceptually: Items that cannot fit into the LRU (in RAM) are simply transitioned to disk. In essence, extstore keeps all of its keys in RAM, and the values are split between RAM and disk.

I haven't tested it yet, but it looks promising.

https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/
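
If anyone wants to try it, a minimal sketch of the wiring (my own example, not from the article; paths and sizes are placeholders):

# memcached with extstore: keys stay in RAM, cold values spill to local disk.
# -m 4096 caps RAM usage at 4GB, -I 32m raises the max item size for large chunks,
# and ext_path backs the cache with a 100GB file.
memcached -m 4096 -I 32m -o ext_path=/data/extstore:100G

# Loki side: the existing chunk_cache_config simply points at that memcached.
chunk_store_config:
  chunk_cache_config:
    memcached_client:
      addresses: memcached:11211
      timeout: 100ms
      max_item_size: 33554432   # 32MB; must fit under memcached's -I limit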