grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Query Frontend Error Messages: caller=frontend_processor.go:145 msg="error processing requests" err=EOF #12299

Open justintaylor9 opened 7 months ago

justintaylor9 commented 7 months ago

We are running a bare-metal instance of Loki with separated read/write paths. The queriers use a DNS name for the query frontend that resolves to the querier IPs.
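One way to double-check what that DNS name actually resolves to from a read node is a small lookup script. This is only a sketch: `resolve_ipv4` is a helper name made up for illustration, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket


def resolve_ipv4(host, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to.

    `resolver` defaults to socket.getaddrinfo; it is injectable so the
    function can be tested offline (an assumption of this sketch).
    """
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("frontend.loki.it.ufl.edu", 9443)` on a read node returns the list of addresses behind the name, which can then be compared with the `address=` values in the error messages.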

When the Loki service starts, the querier connects to the query frontend and reports that it is ready, but error messages keep appearing that suggest it is not working properly.

Loki v2.9.4

loki.yml:

--- start config ---
target: read
auth_enabled: true
common:
  replication_factor: 3
  ring:
    heartbeat_timeout: 10m
    kvstore:
      store: memberlist
  storage:
    s3:
      access_key_id: omitted
      bucketnames: loki
      endpoint: omitted
      http_config:
        insecure_skip_verify: true
      insecure: false
      region: default
      s3forcepathstyle: true
      secret_access_key: omitted
compactor:
  compaction_interval: 1m
  retention_enabled: true
  shared_store: s3
  working_directory: /loki/compactor
frontend:
  compress_responses: true
  log_queries_longer_than: 15s
frontend_worker:
  frontend_address: frontend.loki.it.ufl.edu:9443
  grpc_client_config:
    max_send_msg_size: 104857600.0
  parallelism: 12
ingester:
  chunk_idle_period: 1h
  flush_check_period: 10s
  max_chunk_age: 2h
  wal:
    replay_memory_ceiling: 24042MB
limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 60
  ingestion_rate_mb: 40
  max_cache_freshness_per_query: 10m
  max_entries_limit_per_query: 100000
  max_global_streams_per_user: 20000
  max_query_parallelism: 32
  max_query_series: 10000
  per_stream_rate_limit: 40MB
  per_stream_rate_limit_burst: 60MB
  query_timeout: 3m
  reject_old_samples: true
  retention_period: 2w
  split_queries_by_interval: 15m
memberlist:
  abort_if_cluster_join_fails: false
  bind_port: 7946
  join_members:

The read nodes are behind an nginx reverse proxy for load-balancing.

Error messages:

error: code = Canceled desc = context canceled"
Mar 21 18:03:02 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:02.199245853Z caller=frontend_processor.go:69 msg="error processing requests" address=10.51.31.42:9443 err="rpc error: code = Canceled desc = context canceled"
Mar 21 18:03:02 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:02.199267163Z caller=frontend_processor.go:69 msg="error processing requests" address=10.51.31.42:9443 err="rpc error: code = Canceled desc = context canceled"
Mar 21 18:03:02 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:02.895233268Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
Mar 21 18:03:03 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:03.253163462Z caller=frontend_processor.go:69 msg="error processing requests" address=10.51.156.34:9443 err="rpc error: code = Canceled desc = context canceled"
Mar 21 18:03:03 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:03.253198493Z caller=frontend_processor.go:69 msg="error processing requests" address=10.51.156.34:9443 err="rpc error: code = Canceled desc = context canceled"
Mar 21 18:03:03 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:03.556305536Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
Mar 21 18:03:03 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:03.607206314Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
Mar 21 18:03:03 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:03.786873074Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
Mar 21 18:03:03 az1-irs-o11y-prod-loki-read-01 loki[2280155]: level=error ts=2024-03-21T18:03:03.790565744Z caller=frontend_processor.go:145 msg="error processing requests" err=EOF
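Because the same two messages repeat, tallying them by `caller` and `err` makes the pattern easier to see than reading the raw journal. The sketch below does that for logfmt-style lines like the ones above. The helper names `parse_logfmt` and `tally_errors` are made up for illustration, and the regular expression only covers the simple `key=value` and `key="quoted value"` shapes seen in this excerpt, not every logfmt edge case.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is a double-quoted string
# (with backslash escapes) or a bare token without whitespace.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key=value pairs found in one log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def tally_errors(lines):
    """Count level=error lines, grouped by (caller, err)."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            counts[(fields.get("caller"), fields.get("err"))] += 1
    return counts


# Two lines copied from the excerpt above, without the journald prefix.
sample = [
    'level=error ts=2024-03-21T18:03:02.199245853Z caller=frontend_processor.go:69 '
    'msg="error processing requests" address=10.51.31.42:9443 '
    'err="rpc error: code = Canceled desc = context canceled"',
    'level=error ts=2024-03-21T18:03:02.895233268Z caller=frontend_processor.go:145 '
    'msg="error processing requests" err=EOF',
]

for (caller, err), count in tally_errors(sample).most_common():
    print(count, caller, err)
```

The journald prefix (`Mar 21 18:03:02 ... loki[2280155]:`) contains no `key=value` pairs, so full journal lines can be passed in unchanged.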

Let me know if there are any other logs I should be looking at to determine what is going on in my environment.

Thanks!

JStickler commented 7 months ago

Questions have a better chance of being answered if you ask them on the community forums.