grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

LOKI Query Error BlobNotFound with AzureBlobStorage #13364

Open brunomiguel-teixeira opened 4 days ago


Describe the bug

When performing a normal query, Loki consistently throws the following error.

failed to load chunk 'fake/ffd008357ed3b779/1906e7b75e9:1906e7c1e26:25e4aa99':
-> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /src/loki/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=BlobNotFound) =====
Description=The specified blob does not exist.
RequestId:6cd699a6-301e-0071-70ba-cb9be9000000
Time:2024-07-01T13:31:09.5012950Z, Details:
   Code: BlobNotFound
   GET https://xxxxxxxxxxxxxxxxx/loki/fake/ffd008357ed3b779/1906e7b75e9-1906e7c1e26-25e4aa99?timeout=31
   Authorization: REDACTED
   User-Agent: [Azure-Storage/0.14 (go1.21.9; linux)]
   X-Ms-Client-Request-Id: [684cb973-c13e-4f66-7381-92cf7ab0446f]
   X-Ms-Date: [Mon, 01 Jul 2024 13:31:09 GMT]
   X-Ms-Version: [2020-04-08]
   --------------------------------------------------------------------------------
   RESPONSE Status: 404 The specified blob does not exist.
   Content-Length: [215]
   Content-Type: [application/xml]
   Date: [Mon, 01 Jul 2024 13:31:08 GMT]
   Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
   X-Ms-Client-Request-Id: [684cb973-c13e-4f66-7381-92cf7ab0446f]
   X-Ms-Error-Code: [BlobNotFound]
   X-Ms-Request-Id: [6cd699a6-301e-0071-70ba-cb9be9000000]
   X-Ms-Version: [2020-04-08]
(Trace ID: ff6d7eb483c0250ab3e18bc2bb15bd74)
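
For anyone triaging a similar error, it can help to pull the chunk key apart. The sketch below assumes the key has the shape `<tenant>/<fingerprint>/<from>:<through>:<checksum>`, with `from` and `through` being hex-encoded Unix milliseconds; that reading is an assumption based on how the key looks, not something confirmed in this issue. The colon-to-hyphen substitution in `BlobName` simply mirrors what is visible in the GET URL above. `ChunkRef`, `ParseChunkKey`, and `BlobName` are hypothetical names for illustration, not Loki APIs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// ChunkRef holds the pieces of a chunk key as they appear in the error message.
// The field meanings are assumed from the key's shape (hypothetical type).
type ChunkRef struct {
	Tenant      string
	Fingerprint string
	From        time.Time
	Through     time.Time
	Checksum    string
}

// ParseChunkKey splits "<tenant>/<fingerprint>/<from>:<through>:<checksum>",
// treating from/through as hex-encoded Unix milliseconds (an assumption).
func ParseChunkKey(key string) (ChunkRef, error) {
	parts := strings.Split(key, "/")
	if len(parts) != 3 {
		return ChunkRef{}, fmt.Errorf("unexpected chunk key %q", key)
	}
	fields := strings.Split(parts[2], ":")
	if len(fields) != 3 {
		return ChunkRef{}, fmt.Errorf("unexpected chunk key suffix %q", parts[2])
	}
	from, err := strconv.ParseInt(fields[0], 16, 64)
	if err != nil {
		return ChunkRef{}, fmt.Errorf("parse from: %w", err)
	}
	through, err := strconv.ParseInt(fields[1], 16, 64)
	if err != nil {
		return ChunkRef{}, fmt.Errorf("parse through: %w", err)
	}
	return ChunkRef{
		Tenant:      parts[0],
		Fingerprint: parts[1],
		From:        time.UnixMilli(from).UTC(),
		Through:     time.UnixMilli(through).UTC(),
		Checksum:    fields[2],
	}, nil
}

// BlobName mirrors the colon-to-hyphen substitution visible in the GET URL.
func BlobName(key string) string {
	return strings.ReplaceAll(key, ":", "-")
}

func main() {
	key := "fake/ffd008357ed3b779/1906e7b75e9:1906e7c1e26:25e4aa99"
	ref, err := ParseChunkKey(key)
	if err != nil {
		panic(err)
	}
	fmt.Println("blob:   ", BlobName(key))
	fmt.Println("from:   ", ref.From.Format(time.RFC3339Nano))
	fmt.Println("through:", ref.Through.Format(time.RFC3339Nano))
}
```

Comparing the decoded `from`/`through` bounds with the request time in the error shows how old the chunk was when the query tried to read it, and the blob name can be checked directly against the storage container.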

It looks like Loki has an indexed blob ID that it is trying to fetch, but that blob either no longer exists, or there was an issue during blob upload or index generation.

Either way, this should NOT fail the entire query. It should still return data even if 1 blob is corrupted/missing.

To Reproduce

N/A

Expected behavior

Log/display the missing blocks, but continue and display available data.

Environment:

Screenshots, Promtail config, or terminal output

analytics:
  reporting_enabled: false
auth_enabled: false
ballast_bytes: 1073741824
chunk_store_config:
common:
  compactor_address: http://loki-infrastructure-logging-loki-compactor:3100
compactor:
  compaction_interval: 10m
  delete_request_store: azure
  retention_enabled: true
  working_directory: /var/loki/compactor
distributor:
  otlp_config:
    default_resource_attributes_as_index_labels:
    - tenant
    - k8s_cluster
    - k8s_namespace_name
    - k8s_container_name
    - k8s_deployment_name
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  scheduler_address: _grpclb._tcp.loki-infrastructure-logging-loki-query-scheduler.logging.svc.cluster.local.:9095
frontend_worker:
  scheduler_address: _grpclb._tcp.loki-infrastructure-logging-loki-query-scheduler.logging.svc.cluster.local.:9095
ingester:
  autoforget_unhealthy: true
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 1h
  chunk_target_size: 1572864
  chunk_retain_period: 0s
  concurrent_flushes: 64
  flush_check_period: 30s
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
    heartbeat_period: 10s
  max_chunk_age: 2h
  wal:
    dir: /var/loki/wal
index_gateway:
  mode: simple
limits_config:
  discover_log_levels: false
  discover_service_name: []
  ingestion_burst_size_mb: 10240000
  ingestion_rate_mb: 1024000
  max_cache_freshness_per_query: 10m
  max_global_streams_per_user: 100000
  max_line_size_truncate: true
  max_query_bytes_read: 20024000000
  per_stream_rate_limit: 10GB
  per_stream_rate_limit_burst: 20GB
  query_timeout: 5m
  reject_old_samples: true
  reject_old_samples_max_age: 1d
  retention_period: 7d
  shard_streams:
    enabled: true
  split_queries_by_interval: 5m
querier:
  max_concurrent: 200
memberlist:
  join_members:
  - loki-infrastructure-logging-loki-memberlist.logging.svc.cluster.local:7946
  rejoin_interval: 5m
  dead_node_reclaim_time: 10s
  gossip_to_dead_nodes_time: 10s
query_scheduler:
  use_scheduler_ring: false
query_range:
  align_queries_with_step: false
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      memcached:
        expiration: 1h
        parallelism: 32
      memcached_client:
        consistent_hash: true
        host: loki-infrastructure-logging-loki-memcached-frontend.logging.svc.cluster.local
        max_idle_conns: 16
        service: memcached-client
        timeout: 500ms
        update_interval: 1m
ruler:
  alertmanager_url: https://alertmanager.xx
  external_url: https://alertmanager.xx
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  storage:
    local:
      directory: /etc/loki/rules
    type: local
runtime_config:
  file: /var/infrastructure-logging-loki-runtime/runtime.yaml
schema_config:
  configs:
  - from: "2022-01-01"
    index:
      period: 24h
      prefix: loki_index_
    object_store: azure
    schema: v13
    store: tsdb
server:
  http_listen_port: 3100
  grpc_server_max_recv_msg_size: 8388608
  grpc_server_max_send_msg_size: 8388608
  log_level: warn
storage_config:
  azure:
    account_key: 'xxxxxxxxxxxx'
    account_name: lokiprd
    container_name: loki
  tsdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
    cache_ttl: 24h
    index_gateway_client:
      server_address: dns:///loki-infrastructure-logging-loki-index-gateway:9095
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s