reactor / reactor-netty

TCP/HTTP/UDP/QUIC client/server with Reactor over Netty
https://projectreactor.io
Apache License 2.0

java.io.IOException: Connection reset by peer #388

Closed: picaso closed this issue 6 years ago

picaso commented 6 years ago

Actual behavior

@Bean
fun webClient(): WebClient {
    return WebClient
        .builder()
        .baseUrl(someUrl)
        .filter(logResponseStatus())
        .build()
}
@RestController
@RequestMapping("v1")
class OrderController(private val orderService: OrderService) {

    @GetMapping("/orders/{storeId}")
    fun orders(@RequestHeader("Authorization") auth: String,
        @PathVariable("storeId") storeId: String) = mono(Unconfined) {
        orderService.orders(storeId, auth).unpackResponse()
    }
 // Repository class
 suspend fun deliveries(storeId: String, auth: String): List<ActiveOrdersRequestResponse>? {
        return webClient
            .get()
            .uri("v1/stores/$storeId/deliveries/$queryString")
            .header("Authorization", auth)
            .retrieve()
            .onStatus({ it == HttpStatus.FORBIDDEN || it == HttpStatus.UNAUTHORIZED }, { Mono.error(ValidationException(ErrorCode.AUTHENTICATION_ERROR, it.statusCode())) })
            .onStatus({ it == HttpStatus.NOT_FOUND }, { Mono.error(ValidationException(ErrorCode.STORE_NOT_FOUND, it.statusCode())) })
            .onStatus({ it.is5xxServerError }, { Mono.error(ValidationException(ErrorCode.AUTHENTICATION_EXCEPTION, it.statusCode())) })
            .bodyToFlux(ActiveOrdersRequestResponse::class.java)
            .collectList()
            .awaitSingle() // suspend instead of blocking the calling thread
    }

For some reason, executing this code randomly throws a

java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1108)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:345)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
    at java.lang.Thread.run(Thread.java:748)

And I get a 500. This seems to happen more often in my Docker container than on my main machine. I have tried numerous times to replicate it, but with no luck; there is no pattern whatsoever.

Load test results from bombarding my local server:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       1000 (OK=1000   KO=0     )
> min response time                                    400 (OK=400    KO=-     )
> max response time                                   1707 (OK=1707   KO=-     )
> mean response time                                   824 (OK=824    KO=-     )
> std deviation                                        221 (OK=221    KO=-     )
> response time 50th percentile                        802 (OK=802    KO=-     )
> response time 75th percentile                        975 (OK=975    KO=-     )
> response time 95th percentile                       1219 (OK=1219   KO=-     )
> response time 99th percentile                       1384 (OK=1384   KO=-     )
> mean requests/sec                                333.333 (OK=333.333 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                           496 ( 50%)
> 800 ms < t < 1200 ms                                 446 ( 45%)
> t > 1200 ms                                           58 (  6%)
> failed                                                 0 (  0%)
================================================================================

Results pointing to a remote server running on 5 machines with 4 GB RAM and 2.5 vCPUs:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      10000 (OK=9965   KO=35    )
> min response time                                    105 (OK=1041   KO=105   )
> max response time                                  31431 (OK=31431  KO=28589 )
> mean response time                                  5610 (OK=5618   KO=3425  )
> std deviation                                       3475 (OK=3438   KO=8984  )
> response time 50th percentile                       5452 (OK=5463   KO=198   )
> response time 75th percentile                       8189 (OK=8194   KO=241   )
> response time 95th percentile                       9772 (OK=9769   KO=28406 )
> response time 99th percentile                      10771 (OK=10740  KO=28527 )
> mean requests/sec                                    250 (OK=249.125 KO=0.875 )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                             0 (  0%)
> 800 ms < t < 1200 ms                                  83 (  1%)
> t > 1200 ms                                         9882 ( 99%)
> failed                                                35 (  0%)
---- Errors --------------------------------------------------------------------
> status.find.in(200,304,201,202,203,204,205,206,207,208,209), but actually found 500     35 (100.0%)
================================================================================

Steps to reproduce

Random

Reactor Netty version

io.projectreactor.ipc:reactor-netty:0.7.5.RELEASE

JVM version (e.g. java -version)

openjdk version "1.8.0_171"
OpenJDK Runtime Environment (IcedTea 3.8.0) (Alpine 8.171.11-r0)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

OS version (e.g. uname -a)

Linux machine-name #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 Linux

violetagg commented 5 years ago

@kevin70 What's the version for Reactor Netty?

GSuaki commented 5 years ago

The issue disappeared after I added keepAlive(false) to the HttpClient config:

HttpClient.create()
        .baseUrl("http://localhost:" + port)
        .keepAlive(false)     // <---------------
        .headers(headers -> headers.add(HEADER_CONTENT_TYPE, CONTENT_TYPE_APPLICATION_JSON))

Inspired by groups.google.com/forum/#!topic/vertx/3o_DEwIK9dY

I was facing the stack trace below, and the tip from @YuryYaroshevich to set keepAlive(false) on the HttpClient worked for me too. Thanks a lot!!

2019-04-16 16:03:25.235 ERROR [App,,,] 10 --- [or-http-epoll-1] reactor.core.publisher.Operators         : Operator called default onErrorDropped
    reactor.core.Exceptions$BubblingException: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response
    at reactor.core.Exceptions.bubble(Exceptions.java:154) ~[reactor-core-3.2.6.RELEASE.jar!/:3.2.6.RELEASE]
    at reactor.core.publisher.Operators.onErrorDropped(Operators.java:512) ~[reactor-core-3.2.6.RELEASE.jar!/:3.2.6.RELEASE]
    at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:343) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:398) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at reactor.netty.http.client.HttpClientOperations.onInboundClose(HttpClientOperations.java:258) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:121) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) [netty-handler-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390) [netty-codec-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355) [netty-codec-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282) [netty-codec-http-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826) [netty-transport-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [netty-common-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) [netty-common-4.1.33.Final.jar!/:4.1.33.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333) [netty-transport-native-epoll-4.1.33.Final-linux-x86_64.jar!/:4.1.33.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905) [netty-common-4.1.33.Final.jar!/:4.1.33.Final]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_144]
Caused by: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response

2019-04-16 16:03:25.236  WARN [App,,,] 10 --- [or-http-epoll-1] i.n.c.AbstractChannelHandlerContext      : An exception 'reactor.core.Exceptions$BubblingException: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
reactor.core.Exceptions$BubblingException: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response
    at reactor.core.Exceptions.bubble(Exceptions.java:154) ~[reactor-core-3.2.6.RELEASE.jar!/:3.2.6.RELEASE]
    at reactor.core.publisher.Operators.onErrorDropped(Operators.java:512) ~[reactor-core-3.2.6.RELEASE.jar!/:3.2.6.RELEASE]
    at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:343) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:398) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at reactor.netty.http.client.HttpClientOperations.onInboundClose(HttpClientOperations.java:258) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]
    at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:121) ~[reactor-netty-0.8.5.RELEASE.jar!/:0.8.5.RELEASE]

Spring boot version: 2.1.3.RELEASE

My configuration of WebClient now is:

@Bean
  fun webClientFactory(strategies: ExchangeStrategies): WebClient =
    WebClient.builder()
      .clientConnector(ReactorClientHttpConnector(HttpClient.from(tcpClient()).keepAlive(false)))
      .exchangeStrategies(strategies)
      .build()

  @Bean
  @DependsOn("objectMapper")
  fun exchangeStrategiesFactory(objectMapper: ObjectMapper): ExchangeStrategies =
    ExchangeStrategies.builder()
      .codecs(codecConfigurer(objectMapper))
      .build()

  private fun tcpClient() =
    TcpClient.create()
      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, DEFAULT_TIMEOUT)
      .doOnConnected { connection ->
        connection
          .addHandlerLast(ReadTimeoutHandler(DEFAULT_TIMEOUT.toLong(), TimeUnit.MILLISECONDS))
          .addHandlerLast(WriteTimeoutHandler(DEFAULT_TIMEOUT.toLong(), TimeUnit.MILLISECONDS))
      }
clavinovahan commented 4 years ago

While waiting for a fix for this issue, I worked around it this way:

  • For the microservice with Spring Cloud Gateway, I used NIO instead of epoll (via builder.preferNative(false)) together with reactor-netty 0.7.9.RELEASE
  • For the microservice with Spring WebFlux, I used Undertow instead of Netty by adding the "org.springframework.boot:spring-boot-starter-undertow" dependency just after "org.springframework.boot:spring-boot-starter-webflux" in my build.gradle

@bcoste, our application is also a Spring Cloud Gateway. The Spring Security filter running on the gateway tries to POST to the IDP, and the Netty client gets a connection reset by peer. How did you solve this issue? Could you please give more detail?

ankitjindalstanza commented 4 years ago

@jicui, you can try switching to NIO and check if that works. That did not work for me, however, so I disabled the pool and the errors disappeared.

Cheers, Balajee

How can I do this? I couldn't find an option:

    HttpClient httpClient = HttpClient.create();

    ReactorClientHttpConnector connector = new ReactorClientHttpConnector();

    Builder builder = WebClient.builder().clientConnector(connector);
    return builder.build();
violetagg commented 4 years ago

@jicui , You can try switching to nio and check if that works. That did not work me for me however. So I disabled the pool & the errors disappeared. Cheers, Balajee

How Can I do this? Couldn't find an option?

    HttpClient httpClient = HttpClient.create();

    ReactorClientHttpConnector connector = new ReactorClientHttpConnector();

    Builder builder = WebClient.builder().clientConnector(connector);
    return builder.build();

You can disable the pool like this:

HttpClient httpClient = HttpClient.newConnection();
ReactorClientHttpConnector connector = new ReactorClientHttpConnector(httpClient);
Builder builder = WebClient.builder().clientConnector(connector);
return builder.build();

However, I do not recommend disabling the pool or setting keepAlive to false unless you have a very good reason to do so.
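If disabling the pool is not an option, one widely used mitigation (a sketch of a common pattern, not a recommendation from this thread) is to retry requests that fail because a pooled connection went stale. This assumes Spring WebFlux's WebClient and Reactor's Retry API; the base URL, retry budget, and backoff values are illustrative only, and retries are only safe for idempotent requests:

```java
import java.io.IOException;
import java.time.Duration;

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;
import reactor.netty.http.client.PrematureCloseException;
import reactor.util.retry.Retry;

public class StaleConnectionRetry {

    // Hypothetical base URL, for illustration only.
    private final WebClient webClient = WebClient.create("http://example.org");

    public Mono<String> fetch(String path) {
        return webClient.get()
                .uri(path)
                .retrieve()
                .bodyToMono(String.class)
                // Retry only errors that look like a stale pooled connection:
                // the peer resets before or during the response.
                .retryWhen(Retry.backoff(3, Duration.ofMillis(100))
                        .filter(t -> t instanceof PrematureCloseException
                                  || t instanceof IOException));
    }
}
```

With this in place, a single "Connection reset by peer" on a recycled connection is absorbed by the retry instead of surfacing as a 500, while the pool stays enabled.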

ankitjindalstanza commented 4 years ago

But the issue is that I am getting the below error intermittently on my prod environment; interestingly, it works fine on beta:

[2020-10-06 11:38:52.666] [reactor-http-epoll-2] ERROR [] [] [] [] [Mono.FlatMap.41:314] - | onError(io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer)
[2020-10-06 11:38:52.667] [reactor-http-epoll-2] ERROR [] [] [] [] [Mono.FlatMap.41:319] - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:

This is with keep-alive enabled.

violetagg commented 4 years ago

@adammichalik Create a new issue related to your use case, the Reactor Netty version that is in use etc.

adammichalik commented 4 years ago

@adammichalik Create a new issue related to your use case, the Reactor Netty version that is in use etc.

I believe you meant @ankitjindalstanza ;)

violetagg commented 4 years ago

@adammichalik excuse me

ankitjindalstanza commented 4 years ago

@violetagg shall I open a different ticket?

violetagg commented 4 years ago

@ankitjindalstanza yes

hero6-coder commented 4 years ago

Hi, has anyone else run into this situation? I struggled with it for more than a day and finally found that the gRPC library was conflicting with Reactor Netty (both frameworks live under the io package). So if you find gRPC in your pom, take care when using it alongside Reactor Netty.

zipper01 commented 3 years ago

Hi, my question is how to catch this. We cannot control client behavior (e.g. a power outage), but I need to release session resources on the server side when this happens. The error does not go through my pipeline handlers, so I don't know how to catch it. My Netty version: 4.1.58.Final.
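In plain Netty, a peer reset typically surfaces as exceptionCaught followed by channelInactive on the handlers in the pipeline, so one way to release per-session state is a tail handler along these lines (a sketch only; the handler and helper names are illustrative, not from this thread):

```java
import java.io.IOException;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Illustrative sketch: release per-session server resources when the client
// disappears abruptly (e.g. power loss -> "Connection reset by peer").
public class SessionCleanupHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        // Fires for both orderly and abrupt closes, which makes it the most
        // reliable single place to release session state.
        releaseSession(ctx);
        ctx.fireChannelInactive();
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        if (cause instanceof IOException) {
            // "Connection reset by peer" lands here; closing the channel
            // triggers channelInactive, which performs the cleanup.
            ctx.close();
        } else {
            ctx.fireExceptionCaught(cause);
        }
    }

    private void releaseSession(ChannelHandlerContext ctx) {
        // Hypothetical: look up and dispose whatever state is keyed by this channel.
    }
}
```

Adding it as the last handler in the pipeline means it also sees exceptions that no upstream handler consumed, which may be why the error never reached your existing handlers.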

rajeevprasanna commented 2 years ago

I am getting this issue in production. Is there a proper fix that does not involve disabling the connection pool?

chris-fung commented 2 years ago

Still facing the same problem here. Has anyone fixed this?

DonCorleone92 commented 2 years ago

Hi, can someone tell me what happened to the original solution for which this bug was created? I am still getting connection reset by peer on the client side when using Spring Boot starter 2.0.3.RELEASE and Netty 4.1.25.Final. Did upgrading the dependencies resolve this specific issue?

chris-fung commented 2 years ago

I found this Java code elsewhere, and it fixed my problem:

private static final ConnectionProvider provider = ConnectionProvider.builder("fixed")
        .maxConnections(500)
        .maxIdleTime(Duration.ofSeconds(20))
        .maxLifeTime(Duration.ofSeconds(60))
        .pendingAcquireTimeout(Duration.ofSeconds(60))
        .evictInBackground(Duration.ofSeconds(120))
        .build();

public static WebClient webClient() {
    return WebClient.builder()
            .clientConnector(new ReactorClientHttpConnector(HttpClient.create(provider)))
            .build();
}
violetagg commented 2 years ago

For connection close issues these links might help:
https://projectreactor.io/docs/netty/release/reference/index.html#faq.connection-closed
https://projectreactor.io/docs/netty/release/reference/index.html#_timeout_configuration
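The linked timeout configuration section amounts to putting explicit bounds on each phase of a request. A minimal sketch assuming Reactor Netty 0.9.11 or later (the values are illustrative only, and should sit below any server-side keep-alive timeout):

```java
import java.time.Duration;

import io.netty.channel.ChannelOption;
import reactor.netty.http.client.HttpClient;

public class TimeoutConfig {
    public static HttpClient client() {
        return HttpClient.create()
                // Fail fast if the TCP connect itself hangs.
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5_000)
                // Upper bound on waiting for the response headers, per request.
                .responseTimeout(Duration.ofSeconds(10));
    }
}
```

Bounding these phases turns a silent stale-connection hang into a clear timeout error, which is much easier to diagnose than an intermittent reset.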

dharezlak commented 2 years ago

Hi @violetagg. We are facing the same issue with the WebClient randomly throwing the "Connection reset by peer" error, and we discovered a lot of reports about it. We are still trying to figure out what is causing this; however, from the posts it seems that switching to RestTemplate fixes the issue. Since you show up in many of those reported issues, do you know if anyone has investigated what RestTemplate does differently such that it does not suffer from this problem?

simonbasle commented 2 years ago

@dharezlak that is pretty simple: RestTemplate doesn't use Reactor Netty, unlike WebClient. Also, the reason the documentation linked above was created is precisely that there were so many reports of this exception, which can have multiple root causes that only you, the user, can analyze. Triaging these reports and always hunting for more information was proving painful, hence the documentation.

dharezlak commented 2 years ago

Thanks @simonbasle. So what does RestTemplate do differently from WebClient? Is there a way to make reactor-netty establish TCP connections the same way the client under RestTemplate does? We tried newConnection without success. We thought, putting performance concerns aside for now, that it would at least fix the random connection resets, but it has not. I also found this entry stating that "a side effect of using Netty is that you need to handle stuff you never thought about, like sockets closing and connection resets." Are those side effects documented somewhere so that we can better understand what we are dealing with? Analyzing low-level TCP dumps is getting us nowhere for now.