loki4j / loki-logback-appender

Fast and lightweight implementation of Logback appender for Grafana Loki
https://loki4j.github.io/loki-logback-appender/
BSD 2-Clause "Simplified" License

Instead of retrying, Loki4j drops queued logs when Loki is unreachable. #262

Closed VenkateswaranJ closed 3 weeks ago

VenkateswaranJ commented 4 weeks ago

I have a microservice environment where the Loki container and the Java applications run in separate Docker containers, managed by Nomad. The application container usually starts before the Loki container and sends logs to Loki via the Loki4j appender. I expect Loki4j to queue logs and retry once Loki becomes reachable, and it does so when the failure is a ConnectException. However, when there is no Loki container, the Loki4j Logback appender's Java client throws an IOException instead, and in that case the retry is skipped (to avoid duplicating logs), so the batch is dropped.

Please check the logs from the Loki4j appender:

```
21:29:47,177 |-INFO in com.github.loki4j.logback.Loki4jAppender[LOKI] - Starting with batchMaxItems=500, batchMaxBytes=4194304, batchTimeout=30000, sendQueueMaxBytes=104857600...
21:29:47,646 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - Pipeline is starting...
21:29:47,662 |-INFO in com.github.loki4j.logback.Loki4jAppender[LOKI] - Successfully started
21:29:47,662 |-INFO in ch.qos.logback.classic.model.processor.RootLoggerModelHandler - Setting the level of ROOT logger to OFF
21:29:47,663 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[ROOT]
21:29:47,663 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [FILE] to Logger[ROOT]
21:29:47,664 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [CLI] to Logger[ROOT]
21:29:47,664 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [LOKI] to Logger[ROOT]
21:29:47,665 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [com.hazelcast] to WARN
21:29:47,666 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [io.vertx.core] to DEBUG
21:29:47,666 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [io.netty] to INFO
21:29:47,666 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.Main] to DEBUG
21:29:47,666 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.assetmanagement] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.clustermonitor] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.engineadaptor] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.eventlogging] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.grpcutils] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.httpbridge] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.intendant] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.intercom] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.licensemanagement] to DEBUG
21:29:47,667 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.metrics] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.mnipadaptor] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.mnipapplication] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.networkmanagement] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.nios] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.nmosadaptor] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.nmosnode] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.nmosserver] to DEBUG
21:29:47,668 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.permissionmanagement] to DEBUG
21:29:47,671 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.sameapplication] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.simulated] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.systemmanager] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.topology] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.usermanagement] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.webrtcgatewayadaptor] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.webrtcsessionmediator] to DEBUG
21:29:47,672 |-INFO in ch.qos.logback.classic.model.processor.LoggerModelHandler - Setting level of logger [net.riedel.conductor.runtime] to DEBUG
21:29:47,673 |-INFO in ch.qos.logback.core.model.processor.DefaultProcessor@6a84a97d - End of configuration.
21:29:47,673 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6d3a388c - Registering current configuration as safe fallback point
21:29:47,674 |-INFO in ch.qos.logback.classic.util.ContextInitializer@193f604a - ch.qos.logback.classic.util.DefaultJoranConfigurator.configure() call lasted 1030 milliseconds. ExecutionStatus=DO_NOT_INVOKE_NEXT_IF_ANY
21:30:17,288 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - >>> Batch #1d7a1d29e813 (DRAIN, 14 records, 7 streams, est. size 3,813 bytes) converted to 4,288 bytes
21:30:17,347 |-ERROR in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - Error while sending Batch #1d7a1d29e813 (4,288 bytes) to Loki (http://127.0.0.1:18092/loki/api/v1/push)
java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:964)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
    at com.github.loki4j.client.http.JavaHttpClient.send(JavaHttpClient.java:68)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendBatch(AsyncBufferPipeline.java:322)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendStep(AsyncBufferPipeline.java:293)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runSendLoop(AsyncBufferPipeline.java:224)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$3(AsyncBufferPipeline.java:131)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:388)
    at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:590)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
    ... 3 common frames omitted
Caused by: java.io.EOFException: EOF reached while reading
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$Http1TubeSubscriber.onComplete(Http1AsyncReceiver.java:601)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion(SocketTube.java:648)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:853)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:280)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:233)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:782)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:965)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:1467)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:1412)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:1412)
21:30:47,386 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - >>> Batch #1d811e3c450f (DRAIN, 431 records, 34 streams, est. size 315,976 bytes) converted to 327,467 bytes
21:30:47,410 |-ERROR in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - Error while sending Batch #1d811e3c450f (327,467 bytes) to Loki (http://127.0.0.1:18092/loki/api/v1/push)
java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:964)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
    at com.github.loki4j.client.http.JavaHttpClient.send(JavaHttpClient.java:68)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendBatch(AsyncBufferPipeline.java:322)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendStep(AsyncBufferPipeline.java:293)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runSendLoop(AsyncBufferPipeline.java:224)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$3(AsyncBufferPipeline.java:131)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:388)
    at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:590)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
    ... 3 common frames omitted
Caused by: java.io.EOFException: EOF reached while reading
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$Http1TubeSubscriber.onComplete(Http1AsyncReceiver.java:601)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion(SocketTube.java:648)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:853)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:280)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:233)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:782)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:965)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:1467)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:1412)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:1412)
21:31:38,078 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - >>> Batch #1d8cecc0e371 (DRAIN, 1 records, 1 streams, est. size 325 bytes) converted to 400 bytes
21:31:38,108 |-ERROR in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - Error while sending Batch #1d8cecc0e371 (400 bytes) to Loki (http://127.0.0.1:18092/loki/api/v1/push)
java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:964)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
    at com.github.loki4j.client.http.JavaHttpClient.send(JavaHttpClient.java:68)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendBatch(AsyncBufferPipeline.java:322)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendStep(AsyncBufferPipeline.java:293)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runSendLoop(AsyncBufferPipeline.java:224)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$3(AsyncBufferPipeline.java:131)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:388)
    at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:590)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:182)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
    ... 3 common frames omitted
Caused by: java.io.EOFException: EOF reached while reading
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$Http1TubeSubscriber.onComplete(Http1AsyncReceiver.java:601)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadSubscription.signalCompletion(SocketTube.java:648)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:853)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:207)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:280)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:233)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:782)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:965)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:1467)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:1412)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:1412)
21:32:38,169 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - >>> Batch #1d9aea7ba9e5 (DRAIN, 1 records, 1 streams, est. size 325 bytes) converted to 400 bytes
21:32:38,229 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - <<< Batch #1d9aea7ba9e5 (400 bytes): Loki responded with status 204
21:33:38,175 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - >>> Batch #1da8e327473d (DRAIN, 1 records, 1 streams, est. size 324 bytes) converted to 399 bytes
21:33:38,187 |-INFO in com.github.loki4j.client.pipeline.AsyncBufferPipeline@2bfc268b - <<< Batch #1da8e327473d (399 bytes): Loki responded with status 204
```

As you can see, batches #1d7a1d29e813 and #1d811e3c450f were dropped instead of being retried.

But when I change the exception check from ConnectException to IOException, it starts queuing logs as expected: https://github.com/loki4j/loki-logback-appender/blob/3e528a09792d84a8d745c0d156d1b633dea26c02/loki-client/src/main/java/com/github/loki4j/client/pipeline/AsyncBufferPipeline.java#L374
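The change in question boils down to a one-line predicate swap. A minimal sketch (the method names here are made up for illustration; they only paraphrase the check at the linked line, not the actual Loki4j code):

```java
import java.io.IOException;
import java.net.ConnectException;

public class RetryPolicy {
    // Current behaviour (paraphrased): only connection-refused failures
    // are considered retriable.
    static boolean retryOnConnectException(Throwable t) {
        return t instanceof ConnectException;
    }

    // Proposed behaviour: treat any I/O failure as retriable. Since
    // ConnectException extends IOException, this is a strict superset of the
    // current check, and it also covers the "HTTP/1.1 header parser received
    // no bytes" failure seen when a proxy sits in front of a down Loki.
    static boolean retryOnIOException(Throwable t) {
        return t instanceof IOException;
    }

    public static void main(String[] args) {
        Throwable proxyFailure = new IOException("HTTP/1.1 header parser received no bytes");
        System.out.println(retryOnConnectException(proxyFailure)); // false -> batch dropped
        System.out.println(retryOnIOException(proxyFailure));      // true  -> batch re-queued
    }
}
```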

@nehaev Should we consider changing the queuing mechanism for IOException instead of limiting it to ConnectException?

Or do you know of a way to retain the logs in the sendQueue while the Loki instance is unavailable or unreachable?

nehaev commented 3 weeks ago

Hi @VenkateswaranJ, thanks for reporting this!

Could you please specify the version of Loki4j and JDK that you use?

VenkateswaranJ commented 3 weeks ago

Hi @nehaev

Loki4j: 1.5.2
Java: 21 (I also tried Java 17; it has the same behaviour)

There is a similar bug fix for connection timeouts, but I don't believe it will resolve this issue: https://github.com/loki4j/loki-logback-appender/issues/243

To reproduce this issue, please start the Logback appender without running a Loki instance.
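For reference, a minimal logback.xml along these lines should be enough to reproduce it (a sketch based on the Loki4j docs; the URL, labels, and message pattern are placeholders for your own environment):

```xml
<configuration>
  <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
    <http>
      <!-- Point at the not-yet-running Loki instance (or the proxy in front of it) -->
      <url>http://127.0.0.1:18092/loki/api/v1/push</url>
    </http>
    <format>
      <label>
        <pattern>app=my-app,host=${HOSTNAME}</pattern>
      </label>
      <message>
        <pattern>%level [%thread] %logger{20} - %msg %ex</pattern>
      </message>
    </format>
  </appender>
  <root level="DEBUG">
    <appender-ref ref="LOKI"/>
  </root>
</configuration>
```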

nehaev commented 3 weeks ago

I'm trying to reproduce this on the main branch with Temurin 21.0.4, and I see only a ConnectException:

```
20:26:21,869 |-ERROR in com.github.loki4j.client.pipeline.AsyncBufferPipeline@247d8ae - Error while sending Batch #262b8945bf01 (47,084 bytes) to Loki (http://127.0.0.1:3100/loki/api/v1/push)
java.net.ConnectException
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:951)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
    at com.github.loki4j.client.http.JavaHttpClient.send(JavaHttpClient.java:68)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendBatch(AsyncBufferPipeline.java:323)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendStep(AsyncBufferPipeline.java:294)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runSendLoop(AsyncBufferPipeline.java:225)
    at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$3(AsyncBufferPipeline.java:132)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.net.ConnectException
    at java.net.http/jdk.internal.net.http.common.Utils.toConnectException(Utils.java:1028)
    at java.net.http/jdk.internal.net.http.PlainHttpConnection.connectAsync(PlainHttpConnection.java:227)
    at java.net.http/jdk.internal.net.http.PlainHttpConnection.checkRetryConnect(PlainHttpConnection.java:280)
    at java.net.http/jdk.internal.net.http.PlainHttpConnection.lambda$connectAsync$2(PlainHttpConnection.java:238)
    at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
    at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
    ... 3 common frames omitted
Caused by: java.nio.channels.ClosedChannelException
    at java.base/sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:202)
    at java.base/sun.nio.ch.SocketChannelImpl.beginConnect(SocketChannelImpl.java:786)
    at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:874)
    at java.net.http/jdk.internal.net.http.PlainHttpConnection.lambda$connectAsync$1(PlainHttpConnection.java:210)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
    at java.net.http/jdk.internal.net.http.PlainHttpConnection.connectAsync(PlainHttpConnection.java:212)
    ... 9 common frames omitted
```

Please make sure you don't have any proxies or anything else listening on http://127.0.0.1:18092 when Loki is off.
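The difference between the two failure modes can be demonstrated outside Loki4j with plain JDK HttpClient code (a sketch; the class name and the tiny "proxy" below are made up for illustration):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProxyVsNoListener {
    // POSTs to the given local port and returns the class of the exception
    // thrown by HttpClient.send(), or null if the request unexpectedly succeeds.
    static Class<?> failureClass(int port) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest
                .newBuilder(URI.create("http://127.0.0.1:" + port + "/loki/api/v1/push"))
                .POST(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
        try {
            client.send(req, HttpResponse.BodyHandlers.ofString());
            return null;
        } catch (Exception e) {
            return e.getClass();
        }
    }

    public static void main(String[] args) throws Exception {
        // Case 1: nothing listens on the port at all (Loki down, no proxy).
        // The connection is refused, so send() typically fails with
        // java.net.ConnectException, which Loki4j treats as retriable.
        int freePort;
        try (ServerSocket s = new ServerSocket(0)) { freePort = s.getLocalPort(); }
        System.out.println("no listener -> " + failureClass(freePort));

        // Case 2: a "proxy" accepts the TCP connection but closes it without
        // sending any HTTP response (Loki down behind the proxy). The connect
        // succeeds, so send() fails later with a plain java.io.IOException
        // ("HTTP/1.1 header parser received no bytes"), which Loki4j drops.
        try (ServerSocket proxy = new ServerSocket(0)) {
            new Thread(() -> {
                try { proxy.accept().close(); } catch (IOException ignored) { }
            }).start();
            System.out.println("dead proxy -> " + failureClass(proxy.getLocalPort()));
        }
    }
}
```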

VenkateswaranJ commented 3 weeks ago

@nehaev you are right, I have a "nomad consul connect" proxy in between.

I tried configuring the proxy to wait until Loki is up and running, but it still throws an IOException instead of a ConnectException. I might need to create a fork that adds a retry mechanism for IOExceptions.

Do you see any other potential problem with retrying on "IOExceptions" instead of "ConnectException"?

Please feel free to close this issue.

nehaev commented 3 weeks ago

> Do you see any other potential problem with retrying on "IOExceptions" instead of "ConnectException"?

Yes. I try to be as specific as possible when detecting a legitimate retry situation. Broader conditions there (e.g., retrying on any IOException or any 5xx status) can hide real configuration or networking issues and compromise performance.