grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Ruler remote_write does not work when auth_enabled: true #14648

Open mishanchus opened 2 weeks ago

mishanchus commented 2 weeks ago

Describe the bug
Ruler remote_write does not work when auth_enabled: true

To Reproduce
Steps to reproduce the behavior: deploy the Loki Distributed Helm chart 0.76.1 with "auth_enabled: true" in loki.config. Ruler config in the loki.config section:

    ruler:
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      storage:
        local:
          directory: /etc/loki/rules
        type: local
      wal:
        dir: /var/loki/ruler-wal
      remote_write:
        enabled: true
        client:
          url: http://prometheus-server.monitoring:9090/api/v1/write

Ruler config in ruler section of values.yaml:

ruler:
  enabled: true
  extraArgs: ["-log.level=debug"]
  wal:
    dir: /var/loki/ruler-wal
  remote_write:
    enabled: true
    client:
      url: http://prometheus-server.monitoring:9090/api/v1/write
  directories:
    test:
      rules.yml: |
          someRulesHere
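
For context, a minimal sketch of how I understand the rules layout under multi-tenancy (an assumption on my part, not something the chart documents): with auth_enabled: true the ruler's local storage loads rules per tenant from <ruler.storage.local.directory>/<tenant-id>/, so the key under directories effectively becomes the tenant ID:

ruler:
  directories:
    test:               # "test" is picked up as the tenant ID (matches user=test in the logs below)
      rules.yml: |
          someRulesHere
# expected mount path inside the ruler pod: /etc/loki/rules/test/rules.yml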

With auth_enabled: false, metrics are written correctly.

Expected behavior
Metrics are written to the specified Prometheus server.

Environment:

Screenshots, Promtail config, or terminal output
Logs from the Ruler pod at startup. After this point Prometheus no longer appears in the logs, while the rule queries themselves clearly execute successfully.

level=info ts=2024-10-29T18:00:43.859827554Z caller=main.go:108 msg="Starting Loki" version="(version=2.9.2, branch=HEAD, revision=a17308db6)"
level=info ts=2024-10-29T18:00:43.861035615Z caller=server.go:322 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=warn ts=2024-10-29T18:00:43.862577348Z caller=cache.go:127 msg="fifocache config is deprecated. use embedded-cache instead"
level=warn ts=2024-10-29T18:00:43.862704389Z caller=experimental.go:20 msg="experimental feature in use" feature="In-memory (FIFO) cache - chunksembedded-cache"
level=info ts=2024-10-29T18:00:43.863013117Z caller=memberlist_client.go:434 msg="Using memberlist cluster label and node name" cluster_label= node=loki-loki-distributed-ruler-7dc998b49f-225bv-test8bc80c
ts=2024-10-29T18:00:43.863142031Z caller=memberlist_logger.go:74 level=debug msg="configured Transport is not a NodeAwareTransport and some features may not work as desired"
level=info ts=2024-10-29T18:00:43.863479765Z caller=mapper.go:47 msg="cleaning up mapped rules directory" path=/tmp/loki/scratch
level=debug ts=2024-10-29T18:00:43.86381407Z caller=tcp_transport.go:402 component="memberlist TCPTransport" msg=FinalAdvertiseAddr advertiseAddr=10.112.129.105 advertisePort=7946
level=debug ts=2024-10-29T18:00:43.864525163Z caller=tcp_transport.go:402 component="memberlist TCPTransport" msg=FinalAdvertiseAddr advertiseAddr=10.112.129.105 advertisePort=7946
level=info ts=2024-10-29T18:00:43.864692054Z caller=memberlist_client.go:540 msg="memberlist fast-join starting" nodes_found=1 to_join=4
level=info ts=2024-10-29T18:00:43.866343251Z caller=module_service.go:82 msg=initialising module=analytics
level=debug ts=2024-10-29T18:00:43.866392846Z caller=module_service.go:72 msg="module waiting for initialization" module=memberlist-kv waiting_for=server
level=debug ts=2024-10-29T18:00:43.86646012Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=ring
level=debug ts=2024-10-29T18:00:43.866476702Z caller=module_service.go:72 msg="module waiting for initialization" module=store waiting_for=ingester-querier
level=info ts=2024-10-29T18:00:43.866491231Z caller=module_service.go:82 msg=initialising module=runtime-config
level=info ts=2024-10-29T18:00:43.866752501Z caller=module_service.go:82 msg=initialising module=server
level=debug ts=2024-10-29T18:00:43.866412771Z caller=module_service.go:72 msg="module waiting for initialization" module=ring waiting_for=memberlist-kv
level=debug ts=2024-10-29T18:00:43.866894388Z caller=module_service.go:72 msg="module waiting for initialization" module=ingester-querier waiting_for=memberlist-kv
level=info ts=2024-10-29T18:00:43.86694077Z caller=module_service.go:82 msg=initialising module=memberlist-kv
level=debug ts=2024-10-29T18:00:43.866967559Z caller=module_service.go:72 msg="module waiting for initialization" module=ingester-querier waiting_for=ring
level=debug ts=2024-10-29T18:00:43.866975931Z caller=module_service.go:72 msg="module waiting for initialization" module=ring waiting_for=runtime-config
level=debug ts=2024-10-29T18:00:43.866982456Z caller=module_service.go:72 msg="module waiting for initialization" module=ring waiting_for=server
level=info ts=2024-10-29T18:00:43.866989434Z caller=module_service.go:82 msg=initialising module=ring
ts=2024-10-29T18:00:43.867574496Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.131.121:7946"
ts=2024-10-29T18:00:43.869391893Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.129.93:7946"
ts=2024-10-29T18:00:43.870571094Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.131.120:7946"
ts=2024-10-29T18:00:43.871949269Z caller=memberlist_logger.go:74 level=debug msg="Failed to join 10.112.129.92:7946: dial tcp 10.112.129.92:7946: connect: connection refused"
ts=2024-10-29T18:00:43.872393557Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.132.254:7946"
level=info ts=2024-10-29T18:00:43.878566926Z caller=memberlist_client.go:560 msg="memberlist fast-join finished" joined_nodes=4 elapsed_time=13.880578ms
level=info ts=2024-10-29T18:00:43.878636692Z caller=memberlist_client.go:573 msg="joining memberlist cluster" join_members=loki-loki-distributed-memberlist
level=debug ts=2024-10-29T18:00:43.879155884Z caller=module_service.go:72 msg="module waiting for initialization" module=ingester-querier waiting_for=runtime-config
level=debug ts=2024-10-29T18:00:43.879176588Z caller=module_service.go:72 msg="module waiting for initialization" module=ingester-querier waiting_for=server
level=info ts=2024-10-29T18:00:43.879185412Z caller=module_service.go:82 msg=initialising module=ingester-querier
level=debug ts=2024-10-29T18:00:43.87920253Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=runtime-config
level=debug ts=2024-10-29T18:00:43.879315351Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=server
level=debug ts=2024-10-29T18:00:43.879323682Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=store
level=debug ts=2024-10-29T18:00:43.87933275Z caller=module_service.go:72 msg="module waiting for initialization" module=store waiting_for=memberlist-kv
level=debug ts=2024-10-29T18:00:43.879340221Z caller=module_service.go:72 msg="module waiting for initialization" module=store waiting_for=ring
level=debug ts=2024-10-29T18:00:43.879347058Z caller=module_service.go:72 msg="module waiting for initialization" module=store waiting_for=runtime-config
level=debug ts=2024-10-29T18:00:43.879353751Z caller=module_service.go:72 msg="module waiting for initialization" module=store waiting_for=server
level=info ts=2024-10-29T18:00:43.879360419Z caller=module_service.go:82 msg=initialising module=store
level=debug ts=2024-10-29T18:00:43.879374336Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=analytics
level=debug ts=2024-10-29T18:00:43.879381682Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=ingester-querier
level=debug ts=2024-10-29T18:00:43.879388133Z caller=module_service.go:72 msg="module waiting for initialization" module=ruler waiting_for=memberlist-kv
level=info ts=2024-10-29T18:00:43.879394486Z caller=module_service.go:82 msg=initialising module=ruler
level=info ts=2024-10-29T18:00:43.879414219Z caller=ruler.go:528 msg="ruler up and running"
level=debug ts=2024-10-29T18:00:43.879423321Z caller=ruler.go:566 msg="syncing rules" reason=initial
level=info ts=2024-10-29T18:00:43.880658306Z caller=mapper.go:160 msg="updating rule file" file=/tmp/loki/scratch/test/rules.yml
level=info ts=2024-10-29T18:00:43.880836647Z caller=loki.go:505 msg="Loki started"
level=debug ts=2024-10-29T18:00:43.88102335Z caller=manager.go:143 msg="updating rules" user=test
level=debug ts=2024-10-29T18:00:43.881043355Z caller=manager.go:146 msg="creating rule manager for user" user=test
level=debug ts=2024-10-29T18:00:43.881301263Z caller=manager.go:289 user=test msg="Starting provider" provider=static/0 subs=map[config-0:{}]
level=debug ts=2024-10-29T18:00:43.881559269Z caller=manager.go:323 user=test msg="Discoverer channel closed" provider=static/0
ts=2024-10-29T18:00:43.881752402Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.131.121:7946"
level=debug ts=2024-10-29T18:00:43.882411013Z caller=instance.go:231 storage=registry manager=tenant-wal instance=test msg="initializing instance" name=test
level=info ts=2024-10-29T18:00:43.882579041Z caller=manager.go:995 user=test msg="Starting rule manager..."
ts=2024-10-29T18:00:43.883572128Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.129.93:7946"
ts=2024-10-29T18:00:43.884989281Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.132.254:7946"
level=info ts=2024-10-29T18:00:43.886366443Z caller=wal.go:127 storage=registry manager=tenant-wal instance=test msg="replaying WAL, this may take a while" dir=/var/loki/ruler-wal/test/wal
level=info ts=2024-10-29T18:00:43.886563227Z caller=wal.go:174 storage=registry manager=tenant-wal instance=test msg="WAL segment loaded" segment=0 maxSegment=0
ts=2024-10-29T18:00:43.8869244Z caller=memberlist_logger.go:74 level=debug msg="Initiating push/pull sync with:  10.112.131.120:7946"
ts=2024-10-29T18:00:43.887550084Z caller=dedupe.go:112 storage=registry manager=tenant-wal instance=test component=remote level=info remote_name=test-rw-default url=http://prometheus-server.monitoring:9090/api/v1/write msg="Starting WAL watcher" queue=test-rw-default
level=debug ts=2024-10-29T18:00:43.887580082Z caller=instance.go:271 storage=registry manager=tenant-wal instance=test msg="running instance" name=test
ts=2024-10-29T18:00:43.887681325Z caller=dedupe.go:112 storage=registry manager=tenant-wal instance=test component=remote level=info remote_name=test-rw-default url=http://prometheus-server.monitoring:9090/api/v1/write msg="Replaying WAL" queue=test-rw-default
ts=2024-10-29T18:00:43.887736159Z caller=dedupe.go:112 storage=registry manager=tenant-wal instance=test component=remote level=debug remote_name=test-rw-default url=http://prometheus-server.monitoring:9090/api/v1/write msg="Tailing WAL" lastCheckpoint= checkpointIndex=0 currentSegment=0 lastSegment=0
ts=2024-10-29T18:00:43.887754316Z caller=dedupe.go:112 storage=registry manager=tenant-wal instance=test component=remote level=debug remote_name=test-rw-default url=http://prometheus-server.monitoring:9090/api/v1/write msg="Processing segment" currentSegment=0
ts=2024-10-29T18:00:43.89079699Z caller=memberlist_logger.go:74 level=debug msg="Failed to join 10.112.129.92:7946: dial tcp 10.112.129.92:7946: connect: connection refused"
level=info ts=2024-10-29T18:00:43.890828041Z caller=memberlist_client.go:592 msg="joining memberlist cluster succeeded" reached_nodes=4 elapsed_time=12.183515ms
ts=2024-10-29T18:00:45.490955961Z caller=memberlist_logger.go:74 level=debug msg="Stream connection from=10.24.12.47:37952"
ts=2024-10-29T18:00:55.246790142Z caller=memberlist_logger.go:74 level=debug msg="Stream connection from=10.24.12.47:59134"
level=info ts=2024-10-29T18:00:59.860483647Z caller=compat.go:66 user=test rule_name=test:path:requests:sum:rate:1m rule_type=recording query="sum(rate({stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"} [1m]))" query_hash=3130213460 msg="evaluating rule"
level=info ts=2024-10-29T18:00:59.86058417Z caller=engine.go:232 component=ruler evaluation_mode=local org_id=test traceID=15b852197f5897f5 msg="executing query" type=instant query="sum(rate({stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"} [1m]))" query_hash=3130213460
ts=2024-10-29T18:00:59.864966409Z caller=spanlogger.go:86 user=test level=debug shortcut=false from=2024-10-29T17:59:59.86Z through=2024-10-29T18:00:59.861Z err=null
ts=2024-10-29T18:00:59.866020979Z caller=spanlogger.go:86 user=test level=debug ingester-chunks-count=0
level=debug ts=2024-10-29T18:00:59.866043152Z caller=async_store.go:93 msg="got chunk ids from ingester" count=0
ts=2024-10-29T18:00:59.866299649Z caller=spanlogger.go:86 user=test level=debug Ingester.TotalReached=1 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=1 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.PostFilterLInes=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.PostFilterLInes=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2024-10-29T18:00:59.866354538Z caller=spanlogger.go:86 user=test level=debug Cache.Chunk.Requests=0 Cache.Chunk.EntriesRequested=0 Cache.Chunk.EntriesFound=0 Cache.Chunk.EntriesStored=0 Cache.Chunk.BytesSent="0 B" Cache.Chunk.BytesReceived="0 B" Cache.Chunk.DownloadTime=0s Cache.Index.Requests=0 Cache.Index.EntriesRequested=0 Cache.Index.EntriesFound=0 Cache.Index.EntriesStored=0 Cache.Index.BytesSent="0 B" Cache.Index.BytesReceived="0 B" Cache.Index.DownloadTime=0s Cache.StatsResult.Requests=0 Cache.StatsResult.EntriesRequested=0 Cache.StatsResult.EntriesFound=0 Cache.StatsResult.EntriesStored=0 Cache.StatsResult.BytesSent="0 B" Cache.StatsResult.BytesReceived="0 B" Cache.Result.DownloadTime=0s Cache.Result.Requests=0 Cache.Result.EntriesRequested=0 Cache.Result.EntriesFound=0 Cache.Result.EntriesStored=0 Cache.Result.BytesSent="0 B" Cache.Result.BytesReceived="0 B" Cache.Result.DownloadTime=0s
ts=2024-10-29T18:00:59.866375885Z caller=spanlogger.go:86 user=test level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.PostFilterLines=0 Summary.ExecTime=5.685435ms Summary.QueueTime=0s
level=info ts=2024-10-29T18:00:59.866417833Z caller=metrics.go:159 component=ruler evaluation_mode=local org_id=test traceID=15b852197f5897f5 latency=fast query="sum(rate({stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"} [1m]))" query_hash=3130213460 query_type=metric range_type=instant length=0s start_delta=6.100723ms end_delta=6.100897ms step=0s duration=5.685435ms status=200 limit=0 returned_lines=0 throughput=0B total_bytes=0B total_bytes_structured_metadata=0B lines_per_second=0 total_lines=0 post_filter_lines=0 total_entries=0 store_chunks_download_time=0s queue_time=0s splits=0 shards=0 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s
level=debug ts=2024-10-29T18:00:59.866762568Z caller=registry.go:152 storage=registry user=test msg="refreshing remote-write configuration"
level=info ts=2024-10-29T18:00:59.867086377Z caller=manager.go:202 storage=registry manager=tenant-wal msg="dynamically updated instance" instance=test
level=info ts=2024-10-29T18:00:59.867147592Z caller=compat.go:66 user=test rule_name=test:path:latency:quantile:1m rule_type=recording query="avg(quantile_over_time(0.95,{stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"}  | __error__=\"\" | unwrap latency[1m]))" query_hash=3710904808 msg="evaluating rule"
level=info ts=2024-10-29T18:00:59.867192663Z caller=engine.go:232 component=ruler evaluation_mode=local org_id=test traceID=4cd7cc62abb51df7 msg="executing query" type=instant query="avg(quantile_over_time(0.95,{stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"}  | __error__=\"\" | unwrap latency[1m]))" query_hash=3710904808
ts=2024-10-29T18:00:59.868276402Z caller=spanlogger.go:86 user=test level=debug shortcut=false from=2024-10-29T17:59:59.86Z through=2024-10-29T18:00:59.861Z err=null
ts=2024-10-29T18:00:59.869052724Z caller=spanlogger.go:86 user=test level=debug ingester-chunks-count=0
level=debug ts=2024-10-29T18:00:59.869074269Z caller=async_store.go:93 msg="got chunk ids from ingester" count=0
ts=2024-10-29T18:00:59.869372104Z caller=spanlogger.go:86 user=test level=debug Ingester.TotalReached=1 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=1 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.PostFilterLInes=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.PostFilterLInes=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2024-10-29T18:00:59.869420004Z caller=spanlogger.go:86 user=test level=debug Cache.Chunk.Requests=0 Cache.Chunk.EntriesRequested=0 Cache.Chunk.EntriesFound=0 Cache.Chunk.EntriesStored=0 Cache.Chunk.BytesSent="0 B" Cache.Chunk.BytesReceived="0 B" Cache.Chunk.DownloadTime=0s Cache.Index.Requests=0 Cache.Index.EntriesRequested=0 Cache.Index.EntriesFound=0 Cache.Index.EntriesStored=0 Cache.Index.BytesSent="0 B" Cache.Index.BytesReceived="0 B" Cache.Index.DownloadTime=0s Cache.StatsResult.Requests=0 Cache.StatsResult.EntriesRequested=0 Cache.StatsResult.EntriesFound=0 Cache.StatsResult.EntriesStored=0 Cache.StatsResult.BytesSent="0 B" Cache.StatsResult.BytesReceived="0 B" Cache.Result.DownloadTime=0s Cache.Result.Requests=0 Cache.Result.EntriesRequested=0 Cache.Result.EntriesFound=0 Cache.Result.EntriesStored=0 Cache.Result.BytesSent="0 B" Cache.Result.BytesReceived="0 B" Cache.Result.DownloadTime=0s
ts=2024-10-29T18:00:59.869440996Z caller=spanlogger.go:86 user=test level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.PostFilterLines=0 Summary.ExecTime=2.161736ms Summary.QueueTime=0s
level=info ts=2024-10-29T18:00:59.869551523Z caller=metrics.go:159 component=ruler evaluation_mode=local org_id=test traceID=4cd7cc62abb51df7 latency=fast query="avg(quantile_over_time(0.95,{stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"}  | __error__=\"\" | unwrap latency[1m]))" query_hash=3710904808 query_type=metric range_type=instant length=0s start_delta=9.23279ms end_delta=9.233045ms step=0s duration=2.161736ms status=200 limit=0 returned_lines=0 throughput=0B total_bytes=0B total_bytes_structured_metadata=0B lines_per_second=0 total_lines=0 post_filter_lines=0 total_entries=0 store_chunks_download_time=0s queue_time=0s splits=0 shards=0 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s
level=info ts=2024-10-29T18:01:06.598119128Z caller=compat.go:66 user=test rule_name=test:path:latency:quantile:1m rule_type=recording query="avg(quantile_over_time(0.95,{stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"} | __error__=\"\" | unwrap latency[1m]) by (method,status))" query_hash=2104303924 msg="evaluating rule"
level=info ts=2024-10-29T18:01:06.598205811Z caller=engine.go:232 component=ruler evaluation_mode=local org_id=test traceID=16cebfad403ab196 msg="executing query" type=instant query="avg(quantile_over_time(0.95,{stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"} | __error__=\"\" | unwrap latency[1m]) by (method,status))" query_hash=2104303924
ts=2024-10-29T18:01:06.599805457Z caller=spanlogger.go:86 user=test level=debug shortcut=false from=2024-10-29T18:00:06.597Z through=2024-10-29T18:01:06.598Z err=null
ts=2024-10-29T18:01:06.600620768Z caller=spanlogger.go:86 user=test level=debug ingester-chunks-count=0
level=debug ts=2024-10-29T18:01:06.600640201Z caller=async_store.go:93 msg="got chunk ids from ingester" count=0
ts=2024-10-29T18:01:06.600837822Z caller=spanlogger.go:86 user=test level=debug Ingester.TotalReached=1 Ingester.TotalChunksMatched=0 Ingester.TotalBatches=1 Ingester.TotalLinesSent=0 Ingester.TotalChunksRef=0 Ingester.TotalChunksDownloaded=0 Ingester.ChunksDownloadTime=0s Ingester.HeadChunkBytes="0 B" Ingester.HeadChunkLines=0 Ingester.DecompressedBytes="0 B" Ingester.DecompressedLines=0 Ingester.PostFilterLInes=0 Ingester.CompressedBytes="0 B" Ingester.TotalDuplicates=0 Querier.TotalChunksRef=0 Querier.TotalChunksDownloaded=0 Querier.ChunksDownloadTime=0s Querier.HeadChunkBytes="0 B" Querier.HeadChunkLines=0 Querier.DecompressedBytes="0 B" Querier.DecompressedLines=0 Querier.PostFilterLInes=0 Querier.CompressedBytes="0 B" Querier.TotalDuplicates=0
ts=2024-10-29T18:01:06.60086935Z caller=spanlogger.go:86 user=test level=debug Cache.Chunk.Requests=0 Cache.Chunk.EntriesRequested=0 Cache.Chunk.EntriesFound=0 Cache.Chunk.EntriesStored=0 Cache.Chunk.BytesSent="0 B" Cache.Chunk.BytesReceived="0 B" Cache.Chunk.DownloadTime=0s Cache.Index.Requests=0 Cache.Index.EntriesRequested=0 Cache.Index.EntriesFound=0 Cache.Index.EntriesStored=0 Cache.Index.BytesSent="0 B" Cache.Index.BytesReceived="0 B" Cache.Index.DownloadTime=0s Cache.StatsResult.Requests=0 Cache.StatsResult.EntriesRequested=0 Cache.StatsResult.EntriesFound=0 Cache.StatsResult.EntriesStored=0 Cache.StatsResult.BytesSent="0 B" Cache.StatsResult.BytesReceived="0 B" Cache.Result.DownloadTime=0s Cache.Result.Requests=0 Cache.Result.EntriesRequested=0 Cache.Result.EntriesFound=0 Cache.Result.EntriesStored=0 Cache.Result.BytesSent="0 B" Cache.Result.BytesReceived="0 B" Cache.Result.DownloadTime=0s
ts=2024-10-29T18:01:06.600893369Z caller=spanlogger.go:86 user=test level=debug Summary.BytesProcessedPerSecond="0 B" Summary.LinesProcessedPerSecond=0 Summary.TotalBytesProcessed="0 B" Summary.TotalLinesProcessed=0 Summary.PostFilterLines=0 Summary.ExecTime=2.603419ms Summary.QueueTime=0s
level=info ts=2024-10-29T18:01:06.60098987Z caller=metrics.go:159 component=ruler evaluation_mode=local org_id=test traceID=16cebfad403ab196 latency=fast query="avg(quantile_over_time(0.95,{stream=\"stdout\", app_kubernetes_io_instance=\"ingress-nginx-controller\"} | __error__=\"\" | unwrap latency[1m]) by (method,status))" query_hash=2104303924 query_type=metric range_type=instant length=0s start_delta=3.317956ms end_delta=3.318127ms step=0s duration=2.603419ms status=200 limit=0 returned_lines=0 throughput=0B total_bytes=0B total_bytes_structured_metadata=0B lines_per_second=0 total_lines=0 post_filter_lines=0 total_entries=0 store_chunks_download_time=0s queue_time=0s splits=0 shards=0 cache_chunk_req=0 cache_chunk_hit=0 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=0 cache_chunk_download_time=0s cache_index_req=0 cache_index_hit=0 cache_index_download_time=0s cache_stats_results_req=0 cache_stats_results_hit=0 cache_stats_results_download_time=0s cache_result_req=0 cache_result_hit=0 cache_result_download_time=0s

When auth_enabled is disabled, there is a success message in the logs:

ts=2024-10-30T07:08:59.969070692Z caller=dedupe.go:112 storage=registry manager=tenant-wal instance=test component=remote level=info remote_name=test-rw-default url=http://prometheus-server.monitoring:9090/api/v1/write msg="Done replaying WAL" duration=1m19.280216929s

After upgrading to 3.2.0 we get a more useful message in the logs:

ts=2024-10-30T08:32:36.624231673Z caller=dedupe.go:112 storage=registry manager=tenant-wal instance=test component=remote level=debug remote_name=test-rw-default url=http://prometheus-server.monitoring:9090/api/v1/write msg="Watcher is reading the WAL due to timeout, haven't received any write notifications recently" timeout=15s

Prometheus itself works fine, and if the URL is changed to a non-existent address, the timeout message is exactly the same.

axozoid commented 2 weeks ago

Having the same issue. Upvoting.

lieberlois commented 2 days ago

@mishanchus I managed to get it working. This is the ruler config in the Helm chart; apparently the ruler config section you used (I had the same issue) is broken in the Helm chart:

loki:
  rulerConfig:
    # https://github.com/grafana/loki/issues/9114#issuecomment-1506620229
    wal:
      dir: /var/loki/ruler-wal

    alertmanager_url: "http://mimir-nginx.mimir:80/alertmanager"

    remote_write:
      enabled: true
      add_org_id_header: true
      clients:
        mimir:
          url: http://mimir-nginx.mimir:80/api/v1/push

I found the config here, so I assumed I had to switch to loki.rulerConfig.
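
For anyone pointing the ruler at plain Prometheus like the original report, a minimal sketch of the same loki.rulerConfig shape adapted to that endpoint (the client name prometheus is arbitrary, and the add_org_id_header comment is an assumption to verify against your receiver):

loki:
  rulerConfig:
    wal:
      dir: /var/loki/ruler-wal
    remote_write:
      enabled: true
      # add_org_id_header defaults to true; plain Prometheus should simply ignore the
      # X-Scope-OrgID header, but set this to false if your receiver rejects it (assumption)
      add_org_id_header: true
      clients:
        prometheus:
          url: http://prometheus-server.monitoring:9090/api/v1/write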

@JStickler If you point me to the right location, I can contribute this to the docs 😄 👍

JStickler commented 1 day ago

@lieberlois I'm not sure exactly which docs topic(s) you want to update.

For the most part, the URL for a page in the documentation maps to the source files, so for example:

The source for the published page https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/ can be found in https://github.com/grafana/loki/tree/main/docs/sources/setup/install/helm/install-monolithic

The Helm reference topic https://grafana.com/docs/loki/latest/setup/install/helm/reference/ is automatically generated from the template found in https://github.com/grafana/loki/tree/main/production/helm/loki/reference.md.gotmpl

The Loki configuration documentation https://grafana.com/docs/loki/latest/configure/#ruler is automatically generated from code comments and the template found in https://github.com/grafana/loki/blob/main/docs/templates/configuration.template