spring-cloud / spring-cloud-gateway

An API Gateway built on Spring Framework and Spring Boot providing routing and more.
http://cloud.spring.io
Apache License 2.0

Connection Closed Prematurely #473

Closed bcelenk closed 5 years ago

bcelenk commented 6 years ago

Hi,

I've come across the "Connection closed prematurely" problem; reproducing it is trivial. Below is the stack trace.

java.io.IOException: Connection closed prematurely
    at reactor.ipc.netty.http.client.HttpClientOperations.onInboundClose(HttpClientOperations.java:269) ~[reactor-netty-0.7.8.RELEASE.jar:0.7.8.RELEASE]
    at reactor.ipc.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:113) ~[reactor-netty-0.7.8.RELEASE.jar:0.7.8.RELEASE]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:377) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-codec-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282) ~[netty-codec-http-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:163) ~[netty-common-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java) ~[netty-common-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) ~[netty-common-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:464) ~[netty-transport-4.1.27.Final.jar:4.1.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ~[netty-common-4.1.27.Final.jar:4.1.27.Final]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_171]

I'm using a plain Spring Cloud Gateway and a plain WebFlux app, both generated by Spring Initializr with no modifications; I just added a UUID to represent each user's session.

Load is generated using Gatling at 2000 rps. Below is the Gatling Scala simulation file.


import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.http.request.builder.HttpRequestBuilder.toActionBuilder

import scala.concurrent.duration._
import java.util.UUID

class GatewayLoadTest extends Simulation {

  object UuidFeeder {
    val feeder = Iterator.continually(Map("uuid" -> java.util.UUID.randomUUID.toString()))
  }

  val theHttpProtocolBuilder = http
    .baseURL("http://localhost:8080")

  val theScenarioBuilder = scenario("sample")
    .feed(UuidFeeder.feeder)
    .forever() {
      exec(http("gateway")
        .get("/premature/${uuid}")
        .header("gatling-user-id", "${uuid}")
        .check(status.find.in(200)))
    }

  setUp(
    theScenarioBuilder.inject(constantUsersPerSec(100) during (120 seconds))
  )
    .throttle(
      reachRps(2000) in (1 seconds),
      holdFor(120 seconds)
    )
    .protocols(theHttpProtocolBuilder)
}

After roughly 11k requests, the exception occurred; this number varies between tests.

Gateway/reactor-netty's context when the exception occurred: gateway-premature-exception

The origin service is unaware of the exception.

Wireshark HTTP packets: the user's first two requests were delegated to the origin, but the 3rd one wasn't forwarded. premature-wireshark

Details of the last HTTP packet:

Frame 275732: 210 bytes on wire (1680 bits), 210 bytes captured (1680 bits) on interface 0
    Interface id: 0 (lo0)
        Interface name: lo0
    Encapsulation type: NULL/Loopback (15)
    Arrival Time: Aug  2, 2018 16:43:41.578082000 +03
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1533217421.578082000 seconds
    [Time delta from previous captured frame: 0.000009000 seconds]
    [Time delta from previous displayed frame: 0.345598000 seconds]
    [Time since reference or first frame: 233.942944000 seconds]
    Frame Number: 275732
    Frame Length: 210 bytes (1680 bits)
    Capture Length: 210 bytes (1680 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: null:ip:tcp:http]
    [Coloring Rule Name: HTTP]
    [Coloring Rule String: http || tcp.port == 80 || http2]
Null/Loopback
    Family: IP (2)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
        0000 00.. = Differentiated Services Codepoint: Default (0)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 206
    Identification: 0x0000 (0)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 64
    Protocol: TCP (6)
    Header checksum: 0x0000 [validation disabled]
    [Header checksum status: Unverified]
    Source: 127.0.0.1
    Destination: 127.0.0.1
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 57374, Dst Port: 8080, Seq: 309, Ack: 299, Len: 154
    Source Port: 57374
    Destination Port: 8080
    [Stream index: 6157]
    [TCP Segment Len: 154]
    Sequence number: 309    (relative sequence number)
    [Next sequence number: 463    (relative sequence number)]
    Acknowledgment number: 299    (relative ack number)
    1000 .... = Header Length: 32 bytes (8)
    Flags: 0x018 (PSH, ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 1... = Push: Set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
        [TCP Flags: ·······AP···]
    Window size value: 12750
    [Calculated window size: 408000]
    [Window size scaling factor: 32]
    Checksum: 0xfec2 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - Timestamps: TSval 458370685, TSecr 458370364
            Kind: Time Stamp Option (8)
            Length: 10
            Timestamp value: 458370685
            Timestamp echo reply: 458370364
    [SEQ/ACK analysis]
        [iRTT: 0.000137000 seconds]
        [Bytes in flight: 154]
        [Bytes sent since last PSH flag: 154]
    TCP payload (154 bytes)
Hypertext Transfer Protocol
    GET /premature/43d91f31-5fc7-4c65-81eb-84d9865c2ff1 HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): GET /premature/43d91f31-5fc7-4c65-81eb-84d9865c2ff1 HTTP/1.1\r\n]
            [GET /premature/43d91f31-5fc7-4c65-81eb-84d9865c2ff1 HTTP/1.1\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: GET
        Request URI: /premature/43d91f31-5fc7-4c65-81eb-84d9865c2ff1
        Request Version: HTTP/1.1
    gatling-user-id: 43d91f31-5fc7-4c65-81eb-84d9865c2ff1\r\n
    Host: localhost:8080\r\n
    Accept: */*\r\n
    \r\n
    [Full request URI: http://localhost:8080/premature/43d91f31-5fc7-4c65-81eb-84d9865c2ff1]
    [HTTP request 3/3]
    [Prev request in frame: 270045]

TCP stream of the last HTTP packet (produced using right-click > Follow TCP Stream): wireshark-as-tcp-stream

All apps were run on my personal MacBook; the same exception also occurs on EC2/Linux, with a separate instance per gateway/origin and genuine clients.

MacBook specs: macOS High Sierra 10.13.6 (17G65), 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3. Tried with 3 different JDKs: jdk1.8.0_131, jdk1.8.0_171, jdk1.8.0_181.

bcelenk commented 6 years ago

In this issue, Netty debug logs are shared for this particular problem, and they clearly show that the channel between gateway and origin is closed before the request (the 3rd one in my case) is delivered to the origin.

spencergibb commented 6 years ago

My guess is that it is a reactor-netty issue. @smaldini @violetagg

normanmaurer commented 6 years ago

Agree, I don't think it's a netty issue.

yesilalperen commented 6 years ago

We are experiencing the same issue. After setting the HTTP client pool type to fixed instead of the default elastic, it seems we do not get any premature-close exceptions anymore. Any comments? @bcelenk @spencergibb

bcelenk commented 6 years ago

@yesilalperen @spencergibb @smaldini @violetagg @simonbasle According to my tests in a lab environment, I can confirm that switching the gateway's client pool setting from ELASTIC to FIXED solved the "Connection Closed Prematurely" error. Also, as in https://github.com/spring-cloud/spring-cloud-gateway/issues/480, I don't see the "SSLEngine already closed" exception anymore. I think there is a severe bug in reactor-netty's elastic PoolResources: https://projectreactor.io/docs/netty/release/api/reactor/ipc/netty/resources/PoolResources.html#elastic-java.lang.String-

SanCoder-Q commented 6 years ago

We are also having this issue. We use Spring Boot and WebFlux 5.0.8. We always get the IOException: Connection closed prematurely after 10 minutes without any traffic. Since we don't really want to use a fixed connection pool, our workaround is simply to add a retry when the error occurs, and we are waiting for the fix.
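
For reference, a minimal sketch of that kind of retry workaround with WebClient; the base URL, path, and retry count below are made up for illustration, and in practice a retry filtered to the premature-close IOException would be preferable:

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class RetryingClient {

    // assumed base URL of the downstream service
    private final WebClient webClient = WebClient.create("http://localhost:8080");

    public Mono<String> callOrigin(String id) {
        return webClient.get()
                .uri("/premature/{id}", id) // hypothetical endpoint
                .retrieve()
                .bodyToMono(String.class)
                // naive workaround: resubscribe (retry) up to 3 times on any error
                .retry(3);
    }
}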

dave-fl commented 6 years ago

Do the connections in the invalid state ever disappear? If they don't, would the retry approach imply that the pool eventually becomes littered with bad state connections, forcing retries on N% of your requests?

Wouldn't a very large fixed pool be the same as elastic?

violetagg commented 6 years ago

Hi,

We think we have a fix for this issue. It is available for the moment in Reactor Netty 0.7.10.BUILD-SNAPSHOT/0.8.1.BUILD-SNAPSHOT. It will be great if you are able to test it and provide feedback.

Thanks, Violeta

cah-andrew-fitzgerald commented 6 years ago

We think we've been seeing some unfortunate interplay between this and cloudfoundry/gorouter#212.

It looks like failed requests to spring-cloud-gateway are triggering an event in the gorouter that is causing it to drop all connections, and causing our upstream AWS ALB to mark the gorouter instance as down, effectively bringing down our entire PCF instance.

spencergibb commented 6 years ago

@cah-andrewfitzgerald were you able to test with the snapshots @violetagg mentioned?

cah-andrew-fitzgerald commented 6 years ago

Not yet, need to dig in this week and set up a harness to replicate it (only seeing it intermittently).

cah-andrew-fitzgerald commented 6 years ago

@spencergibb another thing to note with this error: We were watching metrics closely for the gateway during our rollout, and it looked like everything was working fine as shown by the gateway.requests metrics.

I can file a more detailed issue later.

abhishekhp commented 6 years ago

We are experiencing the same issue. After setting http client pool type to fixed from elastic as the default, it seems we do not have any connection premature exceptions anymore. Any comments? @bcelenk @spencergibb

How do you change the HTTP client pool type for the Netty client that is started as part of Spring Boot?
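
For the gateway itself, the pool type can apparently be switched through the gateway's HttpClient properties in application.yml, along these lines (the max-connections value is just an example; for a plain WebFlux app without the gateway you would instead have to configure the Reactor Netty HttpClient yourself, e.g. with PoolResources.fixed(...)):

spring:
  cloud:
    gateway:
      httpclient:
        pool:
          type: fixed           # default pool type is elastic
          max-connections: 500  # example value, only used by the fixed pool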

violetagg commented 6 years ago

Hello,

We have an additional fix available in Reactor Netty 0.7.11.RELEASE. Can you try it and provide feedback?

Thanks, Violeta

zhangam commented 5 years ago

When my file-upload-service throws MaxUploadSizeExceededException, Spring Cloud Gateway throws java.io.IOException: Connection closed prematurely too!

violetagg commented 5 years ago

@zhangam Can you provide some simple example? Thanks a lot.

zhangam commented 5 years ago

@zhangam Can you provide some simple example? Thanks a lot.

To show this problem, I first turned Spring Cloud Gateway's Hystrix off (screenshot: qq20181129-222954). Then the zsy-ufile-service microservice is configured with:

spring:
  servlet:
    multipart:
      max-file-size: 20MB
      max-request-size: 20MB

Then I uploaded a file that exceeded the limit. This is my zsy-ufile-service exception:

org.springframework.web.multipart.MaxUploadSizeExceededException: Maximum upload size exceeded; nested exception is java.lang.IllegalStateException: org.apache.tomcat.util.http.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (53687355) exceeds the configured maximum (20971520) at......

And here's a picture of the Spring Cloud Gateway exception (screenshot: qq20181129-223459).

zhangam commented 5 years ago

@violetagg Could you take a look at this question, please?

violetagg commented 5 years ago

@zhangam Thanks for the described scenario. In this scenario the Connection Closed Prematurely is expected: the origin (Tomcat server) closes the connection on purpose (the file exceeds the limit), and that happens while you are still sending the request body. However, I extended the exception message so that it is clear that the remote party closed the connection while the request body was being sent. (https://github.com/reactor/reactor-netty/commit/b576b8761e61ebaaf0f885ff3088c9179ea2ac6e)

violetagg commented 5 years ago

The latest fix is available with 0.7.13.BUILD-SNAPSHOT/0.8.4.BUILD-SNAPSHOT. All: please try those versions, and if the issue is gone I think we can close the ticket.

gxl191911 commented 5 years ago

@yesilalperen @spencergibb @smaldini @violetagg @simonbasle According to my tests at lab environment I can confirm that setting the gateway's client pool setting from ELASTIC to FIXED solved the "Connection Closed Prematurely" error. Also #480, I don't see the "SSLEngine already closed" exception anymore. I think there is a severe bug in reactor-netty's https://projectreactor.io/docs/netty/release/api/reactor/ipc/netty/resources/PoolResources.html#elastic-java.lang.String-

I set the gateway's client pool setting from ELASTIC to FIXED, but it did not solve the "Connection Closed Prematurely" error. Why is that?

gxl191911 commented 5 years ago

The latest fix is available with 0.7.13.BUILD-SNAPSHOT/0.8.4.BUILD-SNAPSHOT All: Please try that versions and if the issue is gone I think we can close the ticket.

Can you tell me which version of Spring Cloud Gateway this corresponds to? Thanks.

violetagg commented 5 years ago

@gxl191911 You can update Reactor Netty version in your application like this:

pom.xml Spring Gateway 2.1.0.x (current 2.1.0.RC3)

    <properties>
        <reactor-bom.version>Californium-SR4</reactor-bom.version>
    </properties>

Spring Gateway 2.0.x (current 2.0.2.RELEASE)

    <properties>
        <reactor-bom.version>Bismuth-SR15</reactor-bom.version>
    </properties>
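
As a side note, to double-check which Reactor Netty version actually ends up on the classpath after overriding the BOM, something along these lines should work (the wildcard pattern is just one way of filtering the output):

    mvn dependency:tree -Dincludes=*:reactor-netty
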
spencergibb commented 5 years ago

With the updated versions of reactor-netty is this still an issue?

tony-clarke-amdocs commented 5 years ago

@spencergibb @violetagg Yes, we are still seeing this with Bismuth-SR15.

java.io.IOException: Connection closed prematurely
                at reactor.ipc.netty.http.client.HttpClientOperations.onInboundClose(HttpClientOperations.java:278)
                at reactor.ipc.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:115)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
                at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
                at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
                at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:377)
                at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
                at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
                at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
                at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
                at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:377)
                at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
                at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1050)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
                at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
                at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
                at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
                at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947)
                at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822)
                at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
                at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
                at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:315)
                at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
                at java.lang.Thread.run(Thread.java:748)
spencergibb commented 5 years ago

@tony-clarke-amdocs can you open an issue in reactor netty?

haryhouqin commented 5 years ago

[2019-03-26 18:14:59.216] [reactor-http-epoll-8] [ERROR] [reactor.core.publisher.Operators] [Operator called default onErrorDropped] reactor.core.Exceptions$BubblingException: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response
    at reactor.core.Exceptions.bubble(Exceptions.java:154)
    at reactor.core.publisher.Operators.onErrorDropped(Operators.java:512)
    at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:343)
    at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:398)
    at reactor.netty.http.client.HttpClientOperations.onInboundClose(HttpClientOperations.java:258)
    at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:121)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
    at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)
Caused by: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response

[2019-03-26 18:14:59.217] [reactor-http-epoll-8] [ERROR] [reactor.core.publisher.Operators] [Operator called default onErrorDropped] reactor.core.Exceptions$BubblingException: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response
    at reactor.core.Exceptions.bubble(Exceptions.java:154)
    at reactor.core.publisher.Operators.onErrorDropped(Operators.java:512)
    at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:343)
    at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:398)
    at reactor.netty.http.client.HttpClientOperations.onInboundClose(HttpClientOperations.java:258)
    at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:121)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:420)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:390)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355)
    at io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:282)
    at io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:223)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1403)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at java.lang.Thread.run(Thread.java:748)
Caused by: reactor.netty.http.client.PrematureCloseException: Connection prematurely closed DURING response

zhanglin12345 commented 5 years ago

We are also having this issue. We use spring boot and WebFlux 5.0.8. We always get the IOException: Connection Closed Prematurely after 10 minutes without any access. Since we don't really want to use a fixed connection pool, the workaround we are using is simply adding a retry after having the issue. And we are waiting for the fix.

We are still seeing the same issue after being idle for about 20 minutes: for POST requests the error is "Connection Closed Prematurely"; for GET requests there is no error, but an empty response is returned. Then, when we send a lot of requests, the responses are back to normal within 5 s. It seems like the connection was somehow broken while idle but Spring Cloud Gateway doesn't know; when new requests come in, it drops the broken connection because of the error and then establishes new connections, which makes things work again. The Reactor Netty version is 0.8.9.RELEASE, the gateway version is 2.1.1.RELEASE. @violetagg we saw you were working on the issue, is there any input on this one? Thank you very much.

violetagg commented 5 years ago

@zhanglin12345 Did you try to investigate why the other party closes the connection? Any tcp dump?

zhanglin12345 commented 5 years ago

@zhanglin12345 Did you try to investigate why the other party closes the connection? Any tcp dump?

@violetagg thanks for asking! We have a Spring Cloud Gateway service and a WebFlux downstream service called "order service". When we run the jar file of the gateway service on one host and the jar file of the order service on another host, everything works fine, no matter how long we wait or how many requests we send.

Then we deployed them in Docker containers (base image "openjdk:11-jdk-slim"): first we sent a lot of requests, then stayed idle for about 20 minutes, and then we got the error. We changed the TCP keep-alive interval to 6 s and compared the tcp dumps between the "JAR only" setup and the Docker container. We found that the TCP keep-alive packets are missing in the container, because the container doesn't inherit the network kernel parameters from the host. We also didn't see a FIN within those 20 minutes at all; we just got a lot of RSTs afterwards. Then we changed the keep-alive interval in Docker, which is not recommended, but with that the responses are so slow (5 s) that we are not able to reproduce the bug. Still investigating the problem; do you have any clue on this? Thanks very much!
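
For reference, a rough sketch of how those per-container keep-alive kernel parameters can be overridden at docker run time; the values and the image name are placeholders, and, as noted above, tuning these only masks the underlying problem:

    docker run \
      --sysctl net.ipv4.tcp_keepalive_time=600 \
      --sysctl net.ipv4.tcp_keepalive_intvl=60 \
      --sysctl net.ipv4.tcp_keepalive_probes=3 \
      my-gateway-image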

jinkaz commented 4 years ago

I found this error when I used Spring Cloud Greenwich.SR3. The scenario is a global filter I built to filter out some illegal characters, like this:

@Override
public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
    ServerHttpRequest request = exchange.getRequest();
    URI uri = request.getURI();
    String path = uri.getPath();
    log.info("intercepted uri [{}]", path);
    ServerHttpResponse response = exchange.getResponse();

    ServerHttpRequestDecorator serverHttpRequestDecorator = new ServerHttpRequestDecorator(request) {
        @Override
        public Flux<DataBuffer> getBody() {
            Flux<DataBuffer> body = super.getBody();
            return body.map(dataBuffer -> {
                // read and release the original buffer
                byte[] content = new byte[dataBuffer.readableByteCount()];
                dataBuffer.read(content);
                DataBufferUtils.release(dataBuffer);
                String bodyJson = new String(content, Charset.forName("UTF-8"));
                if (bodyJson.equals("")) {
                    errorResult(response);
                }
                // re-serialize the JSON body and wrap it in a fresh DataBuffer
                JSONObject jsonObject = JSON.parseObject(bodyJson);
                String result = jsonObject.toJSONString();
                byte[] bytes = result.getBytes();

                NettyDataBufferFactory nettyDataBufferFactory = new NettyDataBufferFactory(ByteBufAllocator.DEFAULT);
                DataBuffer buffer = nettyDataBufferFactory.allocateBuffer(bytes.length);
                buffer.write(bytes);
                return buffer;
            });
        }

        @Override
        public HttpHeaders getHeaders() {
            HttpHeaders httpHeaders = new HttpHeaders();
            httpHeaders.putAll(super.getHeaders());
            return httpHeaders;
        }

    };
    return chain.filter(exchange.mutate().request(serverHttpRequestDecorator).build());
}

Of course, the code above is just a demo I wrote for testing, and I built only one service to test against. During testing, the first request was processed successfully, the second time the error appeared, requesting again succeeded, and the fourth time the error happened again. How can I solve this error?