grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

How to integrate Loki with Cassandra Database for High Availability #9127

Closed: Manoharan-NMS closed this issue 7 months ago

Manoharan-NMS commented 1 year ago

How to integrate Loki with Cassandra Database for High Availability.

Can you suggest the proper configuration for this?

Loki version: 2.8.0; Cassandra version: 4.0.8

File: /etc/loki/config.yml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 209715200
  grpc_server_max_send_msg_size: 209715200

  http_server_read_timeout: 3m
  http_server_write_timeout: 3m

common:
  instance_addr: 127.0.0.1
  path_prefix: /data/loki_data/loki_cassandra
  storage:
    filesystem:
      chunks_directory: /data/loki_data/loki_cassandra/chunks
      rules_directory: /data/loki_data/loki_cassandra/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  parallelise_shardable_queries: true
  align_queries_with_step: true
  max_retries: 5
  cache_results: true
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2023-04-13
      store: cassandra
      object_store: cassandra
      schema: v10
      index:
        prefix: loki_index
        #period: 168h
        period: 360h
      chunks:
        prefix: chunk
        #period: 168h
        period: 360h

storage_config:
  cassandra:
    addresses: 127.0.0.1
    replication_factor: 1
    auth: true
    keyspace: loki
    username: cassandra
    password: cassandra
    timeout: 10s
    connect_timeout: 10s

ruler:
  alertmanager_url: http://localhost:9093

limits_config:
  ingestion_rate_strategy: global
  ingestion_rate_mb: 1024
  ingestion_burst_size_mb: 5000
  max_label_name_length: 1024
  max_label_value_length: 4096
  max_label_names_per_series: 100
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  creation_grace_period: 10m
  enforce_metric_name: true
  max_line_size: 0
  max_line_size_truncate: false
  increment_duplicate_timestamp: false
  max_entries_limit_per_query: 50000
  #max_streams_per_user: 0
  max_streams_per_user: 500000
  #max_global_streams_per_user: 50000
  max_global_streams_per_user: 500000
  unordered_writes: true
  #max_chunks_per_query: 2000000
  max_chunks_per_query: 4000000
  max_query_length: 721h
  max_query_parallelism: 3500
  #max_query_parallelism: 4000
  #max_query_series: 500
  #max_query_series: 1000
  max_query_series: 10000
  cardinality_limit: 100000
  max_streams_matchers_per_query: 10000
  #max_concurrent_tail_requests: 100
  max_concurrent_tail_requests: 200
  ruler_evaluation_delay_duration: 0s
  ruler_max_rules_per_rule_group: 0
  ruler_max_rule_groups_per_tenant: 0
  per_stream_rate_limit: 512MB
  per_stream_rate_limit_burst: 1024MB
  max_cache_freshness_per_query: '10m'
  split_queries_by_interval: 2h
  #tsdb_max_query_parallelism: 1024
  #max_queriers_per_tenant: 128
#chunk_store_config:
  #max_look_back_period: 336h
  #max_query_lookback: 336h
table_manager:
  retention_deletes_enabled: true
  #retention_period: 336h
  retention_period: 24h
ingester:
  chunk_idle_period: 30m
  chunk_retain_period: 30s
  chunk_target_size: 1572864
  max_chunk_age: 12h
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        #store: inmemory
        store: memberlist
      replication_factor: 1
    #final_sleep: 0s
    final_sleep: 30s
  chunk_encoding: snappy

query_scheduler:
  #max_outstanding_requests_per_tenant: 10000
  max_outstanding_requests_per_tenant: 32768

##frontend:
  #max_outstanding_per_tenant: 8192
  #max_outstanding_per_tenant: 10000
querier:
  query_ingesters_within: 24h
  max_concurrent: 4096
  #max_concurrent: 16
#  multi_tenant_queries_enabled: true

frontend_worker:
  grpc_client_config:
    grpc_compression: snappy
    max_recv_msg_size: 1048576000
    max_send_msg_size: 1048576000
  parallelism: 50

distributor:
  rate_store:
    max_request_parallelism: 1000
compactor:
  retention_delete_delay: 1m

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/usagestats/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false

I'm getting the error below:

t=2023-04-13T17:16:00.022981819+05:30 level=error msg="Failed to evaluate rule" error="failed to execute query A: table loki_index1297 does not exist\n" duration=0s
logger=alertmanager org=1 t=2023-04-13T17:16:00.083888833+05:30 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="grafana-default-email/email[0]: notify retry canceled due to unrecoverable error after 1 attempts: failed to send notification to email addresses: <example@email.com>: gomail: could not send email 1: 454 4.7.1 <example@email.com>: Relay access denied"
logger=ngalert.scheduler rule_uid=zc4Aa00Vk org_id=1 version=6 attempt=0 now=2023-04-13T17:16:30+05:30 t=2023-04-13T17:16:30.025327963+05:30 level=error msg="Failed to evaluate rule" error="failed to execute query A: table loki_index1297 does not exist\n" duration=0s
logger=ngalert.sender.router rule_uid=zc4Aa00Vk org_id=1 t=2023-04-13T17:16:30.075109778+05:30 level=info msg="Sending alerts to local notifier" count=1
logger=ngalert.scheduler rule_uid=zc4Aa00Vk org_id=1 version=6 attempt=0 now=2023-04-13T17:17:00+05:30 t=2023-04-13T17:17:00.028541678+05:30 level=error msg="Failed to evaluate rule" error="failed to execute query A: table loki_index1297 does not exist\n" duration=0s
logger=ngalert.scheduler rule_uid=zc4Aa00Vk org_id=1 version=6 attempt=0 now=2023-04-13T17:17:30+05:30 t=2023-04-13T17:17:30.02743812+05:30 level=error msg="Failed to evaluate rule" error="failed to execute query A: table loki_index1297 does not exist\n" duration=0s
logger=ngalert.sender.router rule_uid=zc4Aa00Vk org_id=1 t=2023-04-13T17:17:30.073546491+05:30 level=info msg="Sending alerts to local notifier" count=1
DaveWK commented 1 year ago

I am also running into a similar problem. I want to use Cassandra for both index and object storage, but it seems like it isn't creating the proper tables. In my logs I see:

level=error ts=2023-07-07T17:21:36.118890812Z caller=series_index_store.go:584 org_id=fake msg="error querying storage" err="unconfigured table loki_index2792"
ts=2023-07-07T17:21:36.118943423Z caller=spanlogger.go:85 user=fake level=info org_id=fake latency=fast query_type=labels length=1h0m0s duration=977.931µs status=500 label= query= splits=0 throughput=0B total_bytes=0B total_entries=0

and also during start-up:

level=info ts=2023-07-07T00:01:34.723641187Z caller=modules.go:1184 msg="failed to initialize usage report" err="Unrecognized storage client cassandra, choose one of: aws, s3, gcs, azure, filesystem"

I don't want to use any of these proprietary cloud solutions (aws, gcs, azure, ...), and with the ScyllaDB operator set up it's fairly easy to maintain Scylla/Cassandra clusters, so I would much rather use an "all Cassandra" setup than have to rely on any kind of filesystem storage.
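
As a side note, the "failed to initialize usage report" error is separate and harmless: the anonymous usage reporter only supports the object-store clients listed in the message, so it cannot initialize against Cassandra. It can be silenced by turning off analytics, as the comments at the bottom of the config above already suggest — a minimal sketch:

# Add to /etc/loki/config.yml to disable anonymous usage reporting
analytics:
  reporting_enabled: false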

DaveWK commented 1 year ago

@Manoharan-NMS I found that my problem was that "all" as the "target" in the start command really means "all services except the table-manager".

Ref: https://grafana.com/docs/loki/latest/upgrading/#the-single-binary-no-longer-runs-a-table-manager

I had to add --target=all,table-manager, and then it created the tables.
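
That also explains the table names in the errors above: periodic tables are named with the schema_config index prefix plus a period number (loki_index1297 here), and it is the table-manager that creates them. If you set the target in the config file rather than on the command line, the equivalent should be the top-level target option — a minimal sketch, assuming it accepts the same comma-separated list as the CLI flag:

# Top level of /etc/loki/config.yml; equivalent to passing -target=all,table-manager
target: all,table-manager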

virtualb0x commented 11 months ago

@Manoharan-NMS Hello! Do you still use Cassandra for your Loki cluster? Did you do any tuning of the Cassandra DB for Loki?

JStickler commented 7 months ago

Closing as Cassandra was deprecated in 2.9. We do not plan any further development for Cassandra as a storage backend.
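
For anyone finding this issue later: a minimal sketch of the storage layout that current Loki releases support instead of Cassandra — a TSDB index shipped to an object store (filesystem shown here; the from date, paths, and schema version are illustrative and should be checked against your Loki version):

schema_config:
  configs:
    - from: 2024-01-01            # illustrative cutover date for the new schema
      store: tsdb                 # TSDB index replaces the deprecated Cassandra index store
      object_store: filesystem    # or s3 / gcs / azure
      schema: v12                 # v13 on newer releases
      index:
        prefix: index_
        period: 24h               # TSDB requires a 24h index period

storage_config:
  tsdb_shipper:
    active_index_directory: /data/loki_data/tsdb-index
    cache_location: /data/loki_data/tsdb-cache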