VenkateswaranJ closed this issue 3 weeks ago.
Hi @VenkateswaranJ, thanks for reporting this!
Could you please specify the version of Loki4j and JDK that you use?
Hi @nehaev,
Loki4j 1.5.2, Java 21 (I also tried Java 17, with the same behaviour).
The fix for the connection timeout issue looks similar, but I don't think it will resolve this one: https://github.com/loki4j/loki-logback-appender/issues/243
To reproduce this issue, please start the Logback appender without running a Loki instance.
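A smaller way to observe the baseline behaviour, without Logback, is to call the JDK `java.net.http.HttpClient` directly against a local port where nothing is listening. This sketch assumes that Loki4j's default sender behaves like this plain client (the stack trace later in this thread shows `JavaHttpClient.send` delegating to `HttpClientImpl.send`). The class `NoLokiRepro`, the port-picking trick and the empty JSON body are illustrative assumptions, not what Loki4j actually sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.ServerSocket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NoLokiRepro {

    // Returns a local port with no listener: bind an ephemeral port, then close it
    // again (assumes nothing else grabs the port in the meantime).
    static int freePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends a Loki-style push request (placeholder body) and returns the exception
    // it fails with, or null if any HTTP response came back.
    static Exception push(int port) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/loki/api/v1/push"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"streams\":[]}"))
            .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status
            return e;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Exception e = push(freePort());
        // With no listener on the port, the JDK client is expected to report a ConnectException.
        System.out.println(e + " | ConnectException? " + (e instanceof ConnectException));
    }
}
```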
I'm trying to reproduce this on the main branch with Temurin-21.0.4, and I see only a ConnectException:
20:26:21,869 |-ERROR in com.github.loki4j.client.pipeline.AsyncBufferPipeline@247d8ae - Error while sending Batch #262b8945bf01 (47,084 bytes) to Loki (http://127.0.0.1:3100/loki/api/v1/push) java.net.ConnectException
java.net.ConnectException
	at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:951)
	at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:133)
	at com.github.loki4j.client.http.JavaHttpClient.send(JavaHttpClient.java:68)
	at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendBatch(AsyncBufferPipeline.java:323)
	at com.github.loki4j.client.pipeline.AsyncBufferPipeline.sendStep(AsyncBufferPipeline.java:294)
	at com.github.loki4j.client.pipeline.AsyncBufferPipeline.runSendLoop(AsyncBufferPipeline.java:225)
	at com.github.loki4j.client.pipeline.AsyncBufferPipeline.lambda$start$3(AsyncBufferPipeline.java:132)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.net.ConnectException
	at java.net.http/jdk.internal.net.http.common.Utils.toConnectException(Utils.java:1028)
	at java.net.http/jdk.internal.net.http.PlainHttpConnection.connectAsync(PlainHttpConnection.java:227)
	at java.net.http/jdk.internal.net.http.PlainHttpConnection.checkRetryConnect(PlainHttpConnection.java:280)
	at java.net.http/jdk.internal.net.http.PlainHttpConnection.lambda$connectAsync$2(PlainHttpConnection.java:238)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
	... 3 common frames omitted
Caused by: java.nio.channels.ClosedChannelException
	at java.base/sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:202)
	at java.base/sun.nio.ch.SocketChannelImpl.beginConnect(SocketChannelImpl.java:786)
	at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:874)
	at java.net.http/jdk.internal.net.http.PlainHttpConnection.lambda$connectAsync$1(PlainHttpConnection.java:210)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:571)
	at java.net.http/jdk.internal.net.http.PlainHttpConnection.connectAsync(PlainHttpConnection.java:212)
	... 9 common frames omitted
Please make sure you don't have a proxy, or anything else, listening on http://127.0.0.1:18092 while Loki is off.
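Why a listener matters: a `ConnectException` means the TCP connection itself was refused, whereas a process that accepts the connection and then closes it without an HTTP response makes the client fail later, with a different `IOException`. The sketch below imitates such a proxy with a plain `ServerSocket`; `ProxyWithoutBackend` and `startDeadEndListener` are hypothetical, and this is not a model of how the Consul Connect sidecar really behaves. The exact exception type and message may vary with JDK version and timing; the request timeout is only a safety net for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class ProxyWithoutBackend {

    // Hypothetical stand-in for a proxy whose upstream (Loki) is down: it accepts
    // every TCP connection and closes it without writing any response.
    static int startDeadEndListener() {
        try {
            ServerSocket server = new ServerSocket(0);
            Thread acceptor = new Thread(() -> {
                while (true) {
                    try (Socket ignored = server.accept()) {
                        // Closing the accepted socket right away is the whole point.
                    } catch (IOException e) {
                        return; // server socket closed or broken: stop accepting
                    }
                }
            });
            acceptor.setDaemon(true); // do not keep the JVM alive
            acceptor.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same placeholder push as a plain client would send; returns the failure, or null.
    static Exception push(int port) {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/loki/api/v1/push"))
            .timeout(Duration.ofSeconds(5))
            .POST(HttpRequest.BodyPublishers.ofString("{\"streams\":[]}"))
            .build();
        try {
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Exception e = push(startDeadEndListener());
        // Expected: an IOException that is not a ConnectException, since the TCP connect succeeded.
        System.out.println(e + " | ConnectException? " + (e instanceof ConnectException));
    }
}
```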
@nehaev, you are right: I have a Nomad Consul Connect proxy in between.
I tried configuring the proxy to wait until Loki is up and running, but the appender still gets an IOException rather than a ConnectException. I may need to create a fork that also retries on IOException.
Do you see any other potential problem with retrying on "IOExceptions" instead of "ConnectException"?
Please feel free to close this issue.
Do you see any other potential problem with retrying on "IOExceptions" instead of "ConnectException"?
Yes. I try to be as specific as possible when detecting a legitimate retry situation. Broader conditions (e.g., any IOException or any 5xx status) could hide real configuration or networking issues and hurt performance.
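To make the trade-off concrete, here are the two retry conditions discussed in this thread written as Java predicates. `RetryPolicies`, `CONNECT_ONLY` and `ANY_IO` are hypothetical names for illustration, not Loki4j's code; per the linked line in this thread, Loki4j's own check is on `ConnectException`.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;
import java.util.function.Predicate;

public class RetryPolicies {

    // Narrow condition: retry only when the TCP connection was refused, so the
    // request cannot have reached Loki and a retry cannot create duplicates.
    static final Predicate<Exception> CONNECT_ONLY = e -> e instanceof ConnectException;

    // Broad condition: retry on any I/O failure. It also covers a proxy that accepts
    // the connection and then drops it, but it may resend a batch that Loki already
    // stored, and it keeps retrying persistent misconfigurations instead of surfacing them.
    static final Predicate<Exception> ANY_IO = e -> e instanceof IOException;

    public static void main(String[] args) {
        // Example failures; the messages are made up for illustration.
        List<Exception> failures = List.of(
            new ConnectException("Connection refused"),
            new IOException("connection closed before a response was received"));
        for (Exception e : failures) {
            System.out.printf("%s -> CONNECT_ONLY=%b, ANY_IO=%b%n",
                e, CONNECT_ONLY.test(e), ANY_IO.test(e));
        }
    }
}
```

Because `ConnectException` is itself a subclass of `IOException`, the broad condition is a strict superset of the narrow one; the extra cases it admits are exactly the ambiguous ones where the request may or may not have been processed.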
I have a microservice environment in which Loki and the Java applications run in separate Docker containers managed by Nomad. The application container usually starts before the Loki container and sends its logs to Loki through the Loki4j appender. I expect the appender to queue logs and retry while Loki is unreachable, and it does so when it gets a ConnectException. However, when the Loki container is not running, the appender's Java HTTP client throws an IOException instead, and retries are skipped to avoid duplicating logs.
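The expected behaviour, keeping a batch and trying again while the failure looks transient, can be sketched as a retry loop with exponential backoff. `RetryingSender`, `BatchSender`, `sendWithRetry` and its parameters are hypothetical and do not reflect Loki4j's internals; the point is only that the retry condition decides whether a plain `IOException` leads to another attempt or to the batch being dropped.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.function.Predicate;

public class RetryingSender {

    // Hypothetical abstraction for "push one batch to Loki".
    interface BatchSender {
        void send(byte[] batch) throws IOException;
    }

    // Sends `batch`, retrying with exponential backoff while `retryable` accepts the
    // failure. Returns true if the batch was delivered, false if it was dropped.
    static boolean sendWithRetry(BatchSender sender, byte[] batch,
                                 Predicate<Exception> retryable,
                                 int maxAttempts, long initialBackoffMs) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send(batch);
                return true;
            } catch (IOException e) {
                if (!retryable.test(e) || attempt == maxAttempts) {
                    return false; // not retryable, or out of attempts: drop the batch
                }
            }
            try {
                Thread.sleep(backoffMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
            backoffMs *= 2;
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulates a proxy that fails the first two attempts with a plain IOException.
        int[] calls = {0};
        BatchSender flaky = batch -> {
            if (++calls[0] < 3) throw new IOException("upstream not ready");
        };
        boolean narrow = sendWithRetry(flaky, new byte[0], e -> e instanceof ConnectException, 5, 10);
        System.out.println("ConnectException only: delivered=" + narrow + " after " + calls[0] + " call(s)");
        calls[0] = 0;
        boolean broad = sendWithRetry(flaky, new byte[0], e -> e instanceof IOException, 5, 10);
        System.out.println("Any IOException: delivered=" + broad + " after " + calls[0] + " call(s)");
    }
}
```

With the ConnectException-only condition the loop gives up after the first call and the batch is dropped; with the IOException condition the same batch is delivered on the third call.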
Please check the logs from the Loki4j appender.
As you can see, batches #1d7a1d29e813 and #1d811e3c450f are dropped instead of retried.
But when I change that check from ConnectException to IOException, the appender starts queuing and retrying logs as expected: https://github.com/loki4j/loki-logback-appender/blob/3e528a09792d84a8d745c0d156d1b633dea26c02/loki-client/src/main/java/com/github/loki4j/client/pipeline/AsyncBufferPipeline.java#L374
@nehaev, should we consider applying the queuing/retry mechanism to IOException instead of limiting it to ConnectException?
Or do you know a way to retain the logs in the sendQueue when the Loki instance is unavailable or unreachable?
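One general pattern for retaining logs while the backend is unreachable is a bounded buffer that keeps unsent batches in order and evicts the oldest ones once a cap is reached, so memory stays bounded during a long outage. This is a generic sketch, not Loki4j's `sendQueue`; `RetainingBuffer` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

public class RetainingBuffer<T> {

    private final Deque<T> queue = new ArrayDeque<>();
    private final int capacity;
    private long dropped;

    public RetainingBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    // Enqueues a new batch. When full, the oldest batch is evicted so memory
    // stays bounded while Loki is down.
    public synchronized void offer(T batch) {
        if (queue.size() == capacity) {
            queue.pollFirst();
            dropped++;
        }
        queue.addLast(batch);
    }

    // Removes the oldest batch for a send attempt, if there is one.
    public synchronized Optional<T> poll() {
        return Optional.ofNullable(queue.pollFirst());
    }

    // Puts a batch whose send failed with a retryable error back at the head, so it is
    // attempted first and ordering is preserved. If the buffer filled up meanwhile,
    // this batch is the oldest one, so under the drop-oldest policy it is the one dropped.
    public synchronized void requeue(T batch) {
        if (queue.size() == capacity) {
            dropped++;
            return;
        }
        queue.addFirst(batch);
    }

    public synchronized int size() { return queue.size(); }

    public synchronized long dropped() { return dropped; }

    public static void main(String[] args) {
        RetainingBuffer<String> buffer = new RetainingBuffer<>(2);
        buffer.offer("batch-1");
        buffer.offer("batch-2");
        buffer.offer("batch-3"); // evicts batch-1
        String next = buffer.poll().orElseThrow();
        buffer.requeue(next); // pretend the send failed with a retryable error
        System.out.println("next=" + next + ", size=" + buffer.size() + ", dropped=" + buffer.dropped());
    }
}
```

Dropping the oldest data first favours recent logs after a long outage; a buffer that rejects new batches instead would favour the logs from application startup, which may matter more when Loki simply starts later than the application.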