jillesvangurp / kt-search

Multi platform kotlin client for Elasticsearch & Opensearch with easily extendable Kotlin DSLs for queries, mappings, bulk, and more.
MIT License

[BUG] Getting java.io.IOException: HTTP/1.1 header parser received no bytes intermittently #96

Closed: daiviksakaria closed this issue 6 months ago

daiviksakaria commented 6 months ago

Describe the bug

I have been getting the error below intermittently for the last few months.

I have received this error even when the total number of docs in the bulk call was only around 252.

java.lang.Exception: Bulk indexing request failed for 15000 docs due to: java.io.IOException: HTTP/1.1 header parser received no bytes
    at autumn.pipeline.featureloader.lib.es.ESIndexService$docCallBack$1.bulkRequestFailed(ESIndexService.kt:73)
    at com.jillesvangurp.ktsearch.BulkSession.flush(bulk-api.kt:308)
    at com.jillesvangurp.ktsearch.BulkSession$flush$1.invokeSuspend(bulk-api.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
    Suppressed: java.lang.Exception: Bulk indexing request failed for 15000 docs due to: java.io.IOException: HTTP/1.1 header parser received no bytes
        ... 11 more
    Suppressed: java.lang.Exception: Bulk indexing request failed for 15000 docs due to: java.io.IOException: HTTP/1.1 header parser received no bytes
        ... 11 more
        Suppressed: java.lang.Exception: Bulk indexing request failed for 15000 docs due to: java.lang.Exception: Bulk indexing request failed for 15000 docs due to: java.io.IOException: HTTP/1.1 header parser received no bytes
            ... 11 more
        Suppressed: java.lang.Exception: Bulk indexing request failed for 15000 docs due to: java.util.concurrent.CancellationException: Parent job is Cancelling
            ... 11 more

To Reproduce

No specific steps to reproduce; a few bulk calls fail intermittently without any change in code or configuration.

jillesvangurp commented 6 months ago

I haven't seen this myself, but it sounds like a nasty issue. Using the callback handler to retry the request could be a possible workaround, but this should not be happening.
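As a hypothetical sketch (not a kt-search built-in), the simplest workaround may be to wrap the bulk call in a retry loop at the call site; `withRetry` and its parameters are illustrative names, not part of the library:

```kotlin
import java.io.IOException
import kotlinx.coroutines.delay

// Hypothetical helper: retry a suspending block on IOException with a
// simple linear backoff. Adjust the caught exception type to whatever
// your call site actually sees (kt-search wraps the IOException).
suspend fun <T> withRetry(maxAttempts: Int = 3, block: suspend () -> T): T {
    var lastError: Exception? = null
    repeat(maxAttempts) { attempt ->
        try {
            return block()
        } catch (e: IOException) {
            lastError = e
            delay(500L * (attempt + 1)) // back off before the next attempt
        }
    }
    throw lastError ?: IllegalStateException("retry loop exited without error")
}
```

You would then wrap the bulk session (or the code that flushes it) in `withRetry { ... }`. Note that blindly retrying a bulk request can duplicate index operations unless you index with explicit document ids.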

I googled the error; you might be running into https://bugs.openjdk.org/browse/JDK-8299018, in which case try updating your Java version.

Let me know if this works out for you.

You might also want to look at the HTTP client configuration. I've had issues with ktor's CIO client in the past (a different problem) and swapped it out for the Java HTTP client. There are several alternatives that you can try.
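A sketch of picking the ktor engine explicitly. Whether `KtorRestClient` accepts a pre-built ktor `HttpClient` (and the exact parameter names) depends on the kt-search version you use, so check its constructor; the required dependency is `io.ktor:ktor-client-java` (or `-cio` / `-apache` for the other engines):

```kotlin
import com.jillesvangurp.ktsearch.KtorRestClient
import com.jillesvangurp.ktsearch.SearchClient
import io.ktor.client.HttpClient
import io.ktor.client.engine.java.Java

// Build a ktor client on the Java engine instead of CIO.
val httpClient = HttpClient(Java) {
    // engine- and timeout-specific configuration goes here
}

// Assumption: this kt-search version exposes a `client` parameter
// for passing in a pre-configured ktor HttpClient.
val searchClient = SearchClient(
    KtorRestClient(host = "localhost", port = 9200, client = httpClient)
)
```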

15000 documents sounds like a lot btw. You might want to tune that.
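A sketch of what tuning that could look like with kt-search's bulk DSL; smaller flushes mean smaller requests and cheaper retries. The `bulkSize` parameter name and the `index(...)` overload should be verified against the version you use:

```kotlin
// Sketch: flush every 500 operations instead of accumulating thousands.
// `documents` and "my-index" are placeholders for your own data.
client.bulk(bulkSize = 500) {
    documents.forEach { doc ->
        index(doc, index = "my-index")
    }
}
```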

daiviksakaria commented 6 months ago

I did come across https://bugs.openjdk.org/browse/JDK-8299018 when I was looking into the error. The error there is slightly different, as it mentions a proxy connection getting closed. In any case, I am using JDK 17, so this fix should already be present.

I faced this error when the bulk request had 252 documents, so I'm not sure it's related to bulk size. Also, I am using ktor's default Java client (not CIO).

jillesvangurp commented 6 months ago

That issue is in the Java HTTP client that you are using, and if you are on an older Java 17 patch release, you might still be affected by it. As mentioned, I'm not experiencing this issue with either Java 17 or Java 21 while also using the Java client.

The issue is about the server disconnecting, and the error you see is related to that happening. There can be all sorts of reasons for servers closing the connection, and handling that gracefully on the client side is probably where the problem lies.

Since you mentioned custom timeouts, you might look at your timeout settings and make sure that what you do client side lines up with the server. I've had issues with this before where a server was disconnecting idle connections because the client-side timeout was set beyond the server-side timeout. So, once in a while the client would use a pooled connection that was already closed server side and then hit an exception. If that's the case, the fix would be lowering your client-side timeouts (or increasing them on the Elasticsearch side somehow).
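A sketch of what aligning those timeouts could look like; the numbers are placeholders, and the point is that pooled connections should be recycled client side before the server's idle timeout closes them. For ktor's Java engine, the keep-alive of pooled connections is controlled by a JDK system property (value in seconds):

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.java.Java
import io.ktor.client.plugins.HttpTimeout

// Keep pooled connections for at most 30s client side, so they are
// dropped before a (hypothetical) 60s server-side idle timeout kicks in.
// This property is read by java.net.http.HttpClient, which the Java
// engine uses under the hood.
System.setProperty("jdk.httpclient.keepalive.timeout", "30")

val httpClient = HttpClient(Java) {
    install(HttpTimeout) {
        connectTimeoutMillis = 5_000
        requestTimeoutMillis = 60_000 // bulk requests can be slow
    }
}
```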

Alternatively, you could try the CIO client, but it has another issue, as I mentioned. There is also support for the Apache HttpClient that you could try. So you have some options.

In any case, I can't actually do anything about this without more information. Can you provide exact versions of what you are using here (jdk, OS, Elasticsearch, etc.)?

daiviksakaria commented 6 months ago

I have updated my code to use the Java API client instead of kt-search to avoid this issue.