grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

failed to get token ranges for ingester err="zone not set" #13414

Open Assaf1928 opened 3 months ago

Assaf1928 commented 3 months ago

After the last version update (3.1) I get the following error:

level=error ts=2024-07-04T12:19:47.79081202Z caller=recalculate_owned_streams.go:55 msg="failed to get token ranges for ingester" err="zone not set"

This is my config.yml file:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

limits_config:
  retention_period: 720h

schema_config:
  configs:
    - from: 2021-08-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

# # chunk_store_config:
# #   max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  alertmanager_url: http://localhost:9093

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    join_after: 30s
    final_sleep: 0s
  wal:
    enabled: true
    dir: /loki/wal

distributor:
  ring:
    kvstore:
      store: inmemory

compactor:
  working_directory: /loki/compactor
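
None of the ring blocks in this config declare an availability zone, which is what the error refers to. As a minimal sketch (the instance_availability_zone key also appears, commented out, in a config further down this thread; zone-a is purely an illustrative value, and setting it alone is not confirmed to clear the error, see the later comments), a zone could be declared on the common ring like this:

common:
  ring:
    instance_addr: 127.0.0.1
    # illustrative zone name; not confirmed to resolve the error on its own
    instance_availability_zone: zone-a
    kvstore:
      store: inmemory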
instantdreams commented 3 months ago

Had the same issue after a container update yesterday.

I checked my loki.yaml file and tried again today, and the image had been updated. The new version does not have the issue, and my Loki instance is back up again.

I started up my promtail instances and received the following log entries:

promtail-1  | level=warn ts=2024-07-04T00:17:29.947533216Z caller=client.go:419 component=client host=192.168.1.93:3100 msg="error sending batch, will retry" status=-1 tenant=id-edge1 error="Post \"http://192.168.1.93:3100/loki/api/v1/push\": dial tcp 192.168.1.93:3100: connect: connection refused"

promtail-2  | level=warn ts=2024-07-04T13:53:36.198284808Z caller=promtail.go:263 msg="enable watchConfig"

promtail-3  | level=warn ts=2024-07-04T20:46:55.580209956Z caller=client.go:419 component=client host=192.168.1.93:3100 msg="error sending batch, will retry" status=500 tenant=id-services error="server returned HTTP status 500 Internal Server Error (500): empty ring"

promtail-4  | level=warn ts=2024-07-04T13:53:07.153379447Z caller=promtail.go:263 msg="enable watchConfig"
promtail-4  | level=warn ts=2024-07-04T20:46:55.565327215Z caller=client.go:419 component=client host=192.168.1.93:3100 msg="error sending batch, will retry" status=500 tenant=id-security error="server returned HTTP status 500 Internal Server Error (500): empty ring"

promtail-5  | level=warn ts=2024-07-04T00:18:31.464704864Z caller=promtail.go:263 msg="enable watchConfig"
promtail-5  | level=warn ts=2024-07-04T20:47:01.30170839Z caller=client.go:419 component=client host=192.168.1.93:3100 msg="error sending batch, will retry" status=-1 tenant=id-media error="Post \"http://192.168.1.93:3100/loki/api/v1/push\": dial tcp 192.168.1.93:3100: connect: connection refused"

Looks like I am still having issues with the inmemory ring key value store. Loki logs show no issues:

loki  | level=warn ts=2024-07-04T20:48:07.297659369Z caller=loki.go:288 msg="global timeout not configured, using default engine timeout (\"5m0s\"). This behavior will change in the next major to always use the default global timeout (\"5m\")."
loki  | level=warn ts=2024-07-04T20:48:07.312935977Z caller=cache.go:127 msg="fifocache config is deprecated. use embedded-cache instead"
loki  | level=warn ts=2024-07-04T20:48:07.312978812Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache - chunksembedded-cache"

Accessing the Loki metrics page displays current metrics, so the instance is running.
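
One way to confirm whether the ingester actually joined the ring (assuming the default HTTP port 3100 used in the configs above) is to query Loki's readiness and ring status endpoints:

# reports "ready" once the ingester has joined the ring
curl http://127.0.0.1:3100/ready
# status page listing ring members, their state, zone and tokens
curl http://127.0.0.1:3100/ring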

crazyelectron-io commented 2 months ago

A new install of 3.1 gives me the same error:

caller=recalculate_owned_streams.go:55 msg="failed to get token ranges for ingester" err="zone not set"
ondrejmo commented 2 months ago

I had the same issue with Loki 3.1.0 and the following configuration file:

---

auth_enabled: false

server:
  http_listen_port: 3100
  log_level: info

common:
  replication_factor: 1
  path_prefix: /loki

  storage:
    filesystem:
      chunks_directory: /loki/chunks

  ring:
    # instance_availability_zone: zone
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-05-28
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h

compactor:
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_cancel_period: 24h
  delete_request_store: s3

ruler:
  enable_api: true
  enable_alertmanager_v2: true

  alertmanager_url: http://alertmanager.monitoring.svc:9093

  remote_write:
    enabled: true
    client:
      url: http://prometheus.monitoring.svc:9090/api/v1/write

  storage:
    type: s3
    s3:
      endpoint: minio.cloud.svc:9000
      s3forcepathstyle: true
      bucketnames: loki
      insecure: true
      access_key_id: ${S3_USER}
      secret_access_key: ${S3_PASS}

  wal:
    dir: /loki/ruler_wal

limits_config:
  retention_period: 720h
  max_query_length: 721h
  max_entries_limit_per_query: 1000000
  ingestion_rate_mb: 8 # MB, default is 4
  ingestion_burst_size_mb: 12 # MB, default is 6
  split_queries_by_interval: 15m
  unordered_writes: true
  retention_stream: []
  shard_streams:
    enabled: false
  volume_enabled: true
  discover_service_name: [ topic ]
  discover_log_levels: true

query_range:
  align_queries_with_step: true
  cache_results: true

query_scheduler:
  max_outstanding_requests_per_tenant: 2048

frontend:
  max_outstanding_per_tenant: 2048

analytics:
  reporting_enabled: false

ingester:
  max_chunk_age: 8h
  chunk_idle_period: 4h

pattern_ingester:
  enabled: true

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  aws:
    s3: s3://${S3_USER}:${S3_PASS}@minio.cloud.svc:9000/loki
    s3forcepathstyle: true

However, when I uncomment the instance_availability_zone: zone line, the error changes to "can't use ring configuration for computing token ranges", which is defined in https://github.com/joe-elliott/dskit/blob/0e1c99b54ea7ef89ad80fa32cb2751cc0dbf5c32/ring/token_range.go#L77C26-L77C84
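
To make the relationship between the two errors easier to follow, here is a simplified, self-contained Go sketch (not the actual dskit code; the type and function names are illustrative and the exact replication-factor condition is an assumption) of the pre-checks the linked token_range.go appears to run before computing token ranges:

package main

import (
	"errors"
	"fmt"
)

type instance struct {
	Zone string
}

type ringConfig struct {
	ZoneAwarenessEnabled bool
	ReplicationFactor    int
}

// checkTokenRangePreconditions mirrors, in simplified form, the two guards that
// seem to run before per-instance token ranges are computed.
func checkTokenRangePreconditions(inst instance, cfg ringConfig, numZones int) error {
	// Guard 1: the instance must have registered with a non-empty zone.
	if inst.Zone == "" {
		return errors.New("zone not set")
	}
	// Guard 2 (condition assumed): token ranges are only computed for zone-aware
	// rings whose replication factor matches the number of zones.
	if !cfg.ZoneAwarenessEnabled || cfg.ReplicationFactor != numZones {
		return errors.New("can't use ring configuration for computing token ranges")
	}
	return nil
}

func main() {
	// Single-instance configs from this thread: no zone, zone awareness off, RF=1.
	fmt.Println(checkTokenRangePreconditions(instance{}, ringConfig{ReplicationFactor: 1}, 0))
	// Setting only a zone (as tried above) trips the second guard instead.
	fmt.Println(checkTokenRangePreconditions(instance{Zone: "zone"}, ringConfig{ReplicationFactor: 1}, 1))
}

Under those assumptions, a single-instance deployment without zone awareness hits the first error when no zone is set, and the second one as soon as only a zone is added, which matches the behaviour described above.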

KAllan357 commented 2 months ago

I ran into this yesterday, gave up troubleshooting, and went back to 3.0.0. I was using a fresh install of the latest Helm chart.

cboggs commented 2 months ago

I'm hitting this with every 3.1 configuration I can think to try. It seems #13103 introduced recalculate_owned_streams.go. At a glance, it is invoked regardless of configuration (aside from, presumably, single binary), and it doesn't properly pick up the replication factor and/or zone awareness fields.

Rolled back to 3.0.0 to get things online, but would love to be able to upgrade. :-)
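
For anyone else rolling back via the Helm chart, pinning the image tag in the chart values is one option. The value path below is an assumption based on the grafana/loki chart layout and should be checked against the chart's values.yaml:

loki:
  image:
    # assumed values path; pins the Loki image back to 3.0.0 until this is fixed
    tag: 3.0.0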

Hitesh-Agrawal commented 2 months ago

I see the same error on 3.1.0, but the whole Loki stack is working as expected. What is the cause of this error?

zeyneprumeysayorulmaz commented 2 months ago

I'm having the same problem. Has anyone found a solution? Chart version: 6.10.0, chart name: loki, Loki version: 3.1.1.

Starefossen commented 1 month ago

Same error here with the latest version of the official loki chart.

kristeey commented 4 weeks ago

Experiencing the same error without zone-aware replication of ingesters, on chart version 6.12.0 (chart name: loki):

caller=recalculate_owned_streams.go:55 msg="failed to get token ranges for ingester" err="can't use ring configuration for computing token ranges"

Downgraded for now. Are there any updates on this?

pikeas commented 1 hour ago

Same here, running single binary with filesystem storage and replication=1, so IIUC there are no zones in this configuration.