newrelic / newrelic-telemetry-sdk-java

Java library for sending telemetry data to New Relic
Apache License 2.0
41 stars 37 forks source link

Java11HttpPoster does not silently retry network errors like OkHttpParser #275

Closed brettkail-wk closed 2 years ago

brettkail-wk commented 3 years ago

By default, the OkHttp client silently retries most network I/O errors, and OkHttpPoster does not modify this behavior, so errors are retried without throwing RetryWithBackoffException or similar. The java.net.http client does not, so BatchDataSender logs a warning and then wraps the network I/O error in RetryWithBackoffException, so the two posters cause different observable behavior (much noisier logging from Java11HttpParser). Empirically, the "Connection reset by peer" entwork errors happen quite frequently, so it would be desirable to have consistent behavior for at least that exception between the two posters.

Stack trace
com.newrelic.telemetry.exceptions.RetryWithBackoffException: IOException (message: HTTP/1.1 header parser received no bytes) while trying to send data to New Relic. SpanBatch retry recommended
    at com.newrelic.telemetry.transport.BatchDataSender.sendPayload(BatchDataSender.java:202)
    at com.newrelic.telemetry.transport.BatchDataSender.send(BatchDataSender.java:114)
    at com.newrelic.telemetry.spans.SpanBatchSender.sendBatch(SpanBatchSender.java:65)
    <redacted>
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:565)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
    at com.newrelic.telemetry.Java11HttpPoster.post(Java11HttpPoster.java:58)
    at com.newrelic.telemetry.transport.BatchDataSender.sendPayload(BatchDataSender.java:155)
    ... 11 common frames omitted
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
    at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:303)
    at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:665)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:297)
    at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:263)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SynchronizedRestartableTask.run(SequentialScheduler.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:147)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    ... 3 common frames omitted
Caused by: java.io.IOException: Connection reset by peer
    at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
    at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
    at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
    at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
    at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1153)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:821)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:763)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:941)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:245)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:957)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:912)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:912)

It would also be useful of BatchDataSender did not log a warning in addition to rethrowing an exception since that prevents our code from handling the exception and silently retrying, so perhaps that could be downgraded to debug/trace.

kford-newrelic commented 2 years ago

@brettkail-wk thanks for the suggested enhancement! Unfortunately, we have limited engineering capacity and this wasn't selected for our roadmap. If you or any member of our community is interested in submitting a PR for this, we'd be happy to review it!