Closed. picaso closed this issue 5 years ago.
@picaso Please try with the latest Reactor Netty version. We have a lot of fixes since 0.7.5.
Also, Connection reset by peer means that the other party closed the connection.
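To make that concrete, here is a small self-contained sketch (plain java.net, nothing Reactor-specific) of how a peer-side abort surfaces as a reset on the caller: closing a socket with SO_LINGER set to 0 sends a TCP RST instead of a graceful FIN, and the other side's next read fails with a connection reset. The class and variable names are illustrative only.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ResetDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            // SO_LINGER with timeout 0 makes close() send an RST, not a FIN
            accepted.setSoLinger(true, 0);
            client.getOutputStream().write(42); // data the peer never reads
            accepted.close();                   // the "other party" aborts the connection
            Thread.sleep(200);                  // give the RST time to arrive
            boolean reset = false;
            try {
                client.getInputStream().read();
            } catch (IOException e) {           // typically "Connection reset"
                reset = true;
            }
            System.out.println("reset observed: " + reset);
        }
    }
}
```

This is exactly the situation the stack traces in this thread describe: the reset is reported by the side that tries to read after the peer has already torn the connection down.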
I updated reactor-netty to the latest version in Spring Boot 2.0.3.RELEASE. The error message is just different now, but it is still the same Connection reset by peer error.
I was going to investigate further to see what was happening, but the team decided to use RestTemplate; though it was slower under load, no connections were reset. I am assuming something is wrong with the Reactor WebClient and its handling of slow connections. I will investigate further and check with Spring Boot.
@picaso Do you have any updates on this one? Thanks.
I'm seeing similar issues, and I'm on the latest versions of Spring Boot and reactor-netty (0.7.8). Any updates on this? What is the root cause of these issues?
I experienced the reset connections about 50% of the time on Spring Boot 2.0.1 / Reactor Netty 0.7.6. I upgraded to the latest Spring Boot 2.0.4 / Reactor Netty 0.7.8 and haven't seen a single connection reset yet. I did not perform any load tests, though, just manual tests.
@adammichalik / Team, please keep the issue open. I'm using Spring Boot 2.0.3 with reactor-netty 0.7.8, but I still face this issue in production. I'll check again by upgrading, perform some load tests, and revert soon.
Can I please get a little more detail on why this issue was happening in the older versions? Since I do not know the root cause, I experience the issue intermittently and my tests may not capture the scenario. If the team can help me with the root cause, I can reproduce the scenario, upgrade, and confirm everything is working.
PS: I'm running the application inside a Docker container.
Hi Team,
I did a performance test and still intermittently get the connection reset on reactor-netty 0.7.8. Would moving to Spring Boot 2.0.4 help? Can I know what the root cause is? I can try to contribute a fix if I'm able to reproduce it.
Hi Team,
I still face this issue. Any pointers to the real problem would be helpful. Can someone help me with this?
Hi,
Are you able to provide a reproducible scenario that we can use?
Regards, Violeta
Hi Violeta,
I'm running a Spring Boot 2.0.3 web application with a Servlet container (the Spring Web starter with Tomcat). On receiving a request, the application makes a POST call to another service with a simple JSON body. The call is made using the Reactor WebClient as below:
public <T> Mono<ResponseEntity<String>> apiCall(URI uri, HttpHeaders headers, T body) {
    RequestBodySpec request =
        webClient.method(HttpMethod.POST).uri(uri).headers(h -> updateHeaders(headers, h));
    if (body != null) {
        request.body(BodyInserters.fromObject(body));
    }
    return request
        .exchange()
        .flatMap(r -> r.toEntity(String.class))
        .publishOn(Schedulers.parallel());
}

private void updateHeaders(HttpHeaders headers, HttpHeaders h) {
    if (MapUtils.isNotEmpty(headers)) {
        h.addAll(headers);
    }
}
Intermittently, this request throws a connection reset by peer error as below:
Suppressed: io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer
        at io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown Source)
io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer
        at io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown Source)
I'm unable to recreate the error when I run the application natively on my VM (but I'm not sure this issue will never happen on a VM). I'm basically unable to zero in on the root cause.
Regards, Balajee
Is it related to this bug? (https://github.com/netty/netty/issues/3539) It seems to be closed. I'm on version 4.1.25 of Netty.
Can you try to reproduce with Reactor Netty 0.7.9.BUILD-SNAPSHOT (it uses Netty 4.1.28.Final), or just change the Netty version to 4.1.28.Final?
Will do it and revert shortly!
I've updated to Spring Boot 2.0.4.RELEASE and Netty 4.1.28.Final. Keeping an eye out for this issue; will keep you posted. Can I however know what the root cause is? This has production impact, so if anyone can point me to the commit that fixed this, it would help. Without the RCA, I wouldn't be able to take this change to production.
Thanks, Balajee
Further to updating the versions, the frequency of the resets has reduced. Will revert after more checks.
@violetagg ,
Since you suggested switching to the NIO transport: I'm using the below to create a WebClient. I'm forcing the epoll mechanism to level-triggered (which is the default epoll mode in Java NIO). Is this what you meant by changing to NIO?
ExchangeStrategies strategies =
    ExchangeStrategies.builder()
        .codecs(c -> {
            c.customCodecs().decoder(new Jackson2JsonDecoder(objectMapper));
            c.customCodecs().encoder(new Jackson2JsonEncoder(objectMapper));
        })
        .build();

ReactorClientHttpConnector connector =
    new ReactorClientHttpConnector(options ->
        options
            .option(ChannelOption.SO_TIMEOUT, 3000)
            .option(EpollChannelOption.EPOLL_MODE, EpollMode.LEVEL_TRIGGERED)
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000));

WebClient.Builder builder = WebClient.builder().exchangeStrategies(strategies).clientConnector(connector);
WebClient client = builder.build();
In the options part, call preferNative(false):
/**
 * Set the preferred native option. Determine if epoll/kqueue should be used if available.
 *
 * @param preferNative Should the connector prefer native (epoll/kqueue) if available
 * @return {@code this}
 */
public final BUILDER preferNative(boolean preferNative) {
    this.preferNative = preferNative;
    return get();
}
Then, when you execute the scenario, instead of reactor-http-client-epoll as the thread name you should see reactor-http-nio.
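A minimal sketch of that change against the 0.7.x connector options shown earlier (assuming the same ReactorClientHttpConnector setup; preferNative(false) is the only addition, and the timeout value is just carried over from the snippet above):

```java
ReactorClientHttpConnector connector =
    new ReactorClientHttpConnector(options ->
        options
            .preferNative(false) // fall back to the JDK NIO transport instead of epoll/kqueue
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000));
```

With this in place, the epoll-specific options (EPOLL_MODE and so on) no longer apply, since the native transport is not used.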
@violetagg
I can now confirm that even after using the latest versions of Spring Boot and Netty (4.1.28.Final), the issue still exists and I still face connection reset issues.
Suppressed: io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown Source)
I'm now changing the calls to the NIO transport as suggested and trying the same. However, we still want to run in epoll mode. I'll revert with the soak test results and confirm the behavior.
Were there any changes made to the io.netty.channel.epoll package? I see a similar issue raised 3 years ago, and the issue we are facing is eerily similar: https://github.com/netty/netty/issues/3539
@violetagg
I've enabled a lot of debug logs to understand the root cause. Connection reset by peer happens quite frequently with Netty, but the error is rarely bubbled up. However, every time I see the error bubble up, I see the following events:
USER_EVENT: SslCloseCompletionEvent(java.nio.channels.ClosedChannelException)
USER_EVENT: io.netty.channel.socket.ChannelInputShutdownReadComplete@7940e46a
Does that help? I'm trying to get more details from the prod environment.
@violetagg
I can see this is the sequence of events for failures:
Acquiring existing channel from pool: DefaultPromise@2db14843(success: [id: 0x3cbc9cc3, L:/1xx.1xx.0.10:37230 - R:api-xyz.com/3x.2xx.1xx.41:443]) SimpleChannelPool{activeConnections=0}
[id: 0x8abf8526, L:/1xx.1xx.0.10:39918 ! R:api-xyz.com/3x.2xx.1xx.41:443] USER_EVENT: io.netty.channel.socket.ChannelInputShutdownReadComplete@18a456f8
[id: 0x8abf8526, L:/1xx.1xx.0.10:39918 ! R:api-xyz.com/3x.2xx.1xx.41:443] USER_EVENT: SslCloseCompletionEvent(java.nio.channels.ClosedChannelException
Is it possible that unhealthy connections are kept in the pool? I see some similar issues: https://github.com/reactor/reactor-netty/issues/177 and https://github.com/netty/netty/issues/7262
I've currently disabled the pool in the options part. Would there be any repercussions?
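For reference, disabling the pool in the 0.7.x options builder looks roughly like this (a sketch only; disablePool() is, if I'm not mistaken, the relevant ClientOptions.Builder method in that line, and the timeout is carried over from the earlier snippet):

```java
ReactorClientHttpConnector connector =
    new ReactorClientHttpConnector(options ->
        options
            .disablePool() // every request opens and closes its own connection
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000));
```

The trade-off is the one described below: without pooling, each request pays the full TCP (and TLS) handshake cost, so latency goes up under load.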
@violetagg , @smaldini , @simonbasle ,
I can confirm that by disabling the pool, the errors completely disappeared. I'm running some soak tests to confirm. But latency increases drastically when the pool is disabled. Does that give some pointers? Connecting the dots, I feel it's possible that unhealthy connections are kept in the pool.
Can someone help me with this? We just moved away from Vert.x and onto the Reactor stack. This issue would impede us from going ahead with the Reactor stack in production. Thanks.
Is disabling the pool the solution to this? Is anyone looking at it?
Regards, Balajee
@balajeetm Yes I'm looking at it.
I'm facing the same issue (io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown Source)) on Spring Boot 2.0.3.RELEASE. Are there any updates on this, or do I have to switch to NIO?
@jicui, you can try switching to NIO and check if that works. That did not work for me, however, so I disabled the pool and the errors disappeared.
Cheers, Balajee
While waiting for a fix for this issue, I solved it this way:
Can you try the latest Reactor Netty SNAPSHOT (0.7.10) and the latest Spring Framework SNAPSHOT?
Fixed with #483. If you still see the issue, reopen this ticket.
@violetagg Does this mean we don't need to disable the pool or use preferNative anymore?
@balajeetm We think we were able to fix that issue. If you are able to test that version and provide feedback, it will be great. Thanks.
@violetagg Will do so and revert soon. Thanks a ton
I'm having this issue with io.projectreactor.netty:reactor-netty:0.8.2.RELEASE.
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:283)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:250)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:226)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:382)
at org.eclipse.jetty.io.ChannelEndPoint.fill(ChannelEndPoint.java:234)
at org.eclipse.jetty.io.NetworkTrafficSelectChannelEndPoint.fill(NetworkTrafficSelectChannelEndPoint.java:47)
at org.eclipse.jetty.server.HttpConnection.fillRequestBuffer(HttpConnection.java:331)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:243)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.base/java.lang.Thread.run(Thread.java:844)
I set up WireMock and make a bunch of requests with reactor.netty.http.client.HttpClient to that WireMock instance, and if the number of tests is higher than, let's say, 10, then two or three tests fail and the logs contain the error above and:
java.nio.channels.ClosedChannelException: null
at org.eclipse.jetty.io.WriteFlusher.onClose(WriteFlusher.java:502)
at org.eclipse.jetty.io.AbstractEndPoint.onClose(AbstractEndPoint.java:353)
at org.eclipse.jetty.io.ChannelEndPoint.onClose(ChannelEndPoint.java:216)
at org.eclipse.jetty.io.NetworkTrafficSelectChannelEndPoint.onClose(NetworkTrafficSelectChannelEndPoint.java:98)
at org.eclipse.jetty.io.AbstractEndPoint.doOnClose(AbstractEndPoint.java:225)
at org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:192)
at org.eclipse.jetty.io.AbstractEndPoint.close(AbstractEndPoint.java:175)
at org.eclipse.jetty.io.AbstractConnection.close(AbstractConnection.java:248)
at org.eclipse.jetty.server.HttpChannelOverHttp.earlyEOF(HttpChannelOverHttp.java:234)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1551)
at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:360)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:250)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.base/java.lang.Thread.run(Thread.java:844)
@YuryYaroshevich Please report this as a separate issue and provide more information (an example would be very helpful). I cannot see any Reactor Netty/Netty stack here.
The issue disappeared after I added keepAlive(false) to the HttpClient config:
HttpClient.create()
    .baseUrl("http://localhost:" + port)
    .keepAlive(false) // <---------------
    .headers(headers -> headers.add(HEADER_CONTENT_TYPE, CONTENT_TYPE_APPLICATION_JSON))
Inspired by https://groups.google.com/forum/#!topic/vertx/3o_DEwIK9dY
@YuryYaroshevich Even so, it would be better to find the cause instead of disabling keepAlive.
@violetagg what is the difference between
<dependency>
    <groupId>io.projectreactor.netty</groupId>
    <artifactId>reactor-netty</artifactId>
</dependency>
and
<dependency>
    <groupId>io.projectreactor.ipc</groupId>
    <artifactId>reactor-netty</artifactId>
</dependency>
Is the ipc one deprecated?
@balajeetm Reactor Netty 0.7.x has the group id io.projectreactor.ipc; for Reactor Netty 0.8.x we changed the group id to io.projectreactor.netty.
@violetagg Thanks. I saw there are some changes to the ReactorClientHttpConnector constructor signatures in 0.8.x. Will move everything forward.
The issue still seems to be present in current versions. Using Spring Boot 2.1.0 -> Reactor 3.2.2 -> reactor-netty 0.8.2 -> Netty 4.1.29, we still intermittently get:
io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(..)
Unfortunately I couldn't try the workarounds (disabling the pool and switching to NIO) suggested in one of the previous comments by @violetagg, since the API changed in the 0.8.2 version and preferNative(boolean preferNative) seems not to exist anymore. Thus any hints on how to switch to NIO and disable pooling using the 0.8.2 API would be appreciated.
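In case it helps, my understanding of the 0.8.x equivalents is roughly the following (an unverified sketch against the 0.8.x API: ConnectionProvider.newConnection() should give an unpooled client, and the boolean argument to runOn should play the role of the old preferNative flag; the loop resources name is arbitrary):

```java
HttpClient client =
    HttpClient.create(ConnectionProvider.newConnection())          // no connection pool
        .tcpConfiguration(tcp ->
            tcp.runOn(LoopResources.create("client-loops"), false)); // preferNative = false -> NIO

ReactorClientHttpConnector connector = new ReactorClientHttpConnector(client);
```

The resulting connector can then be plugged into WebClient.builder().clientConnector(connector) as in the 0.7.x snippets earlier in the thread.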
@rreimann Please create a new issue with more details about your use case, as Connection reset by peer can be caused by different reasons.
@violetagg I see the same issue with 2.1.1.RELEASE when running as a backend server behind haproxy:
2018-12-10 00:40:45,708 [ERROR] {reactor-http-epoll-1} reactor.netty.tcp.TcpServer - [id: 0xc97be89d, L:/xx.xxx.x.xxx:8080 - R:/96.117.7.140:37420] onUncaughtException(SimpleConnection{channel=[id: 0xc97be89d, L:/xx.xxx.x.xxx:8080 - R:/xx.xxx.x.xxx:37420]}) io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown Source)
Please let me know if you need more information.
@binishchandy Please create a new issue with a reproducible scenario
@jicui, you can try switching to NIO and check if that works. That did not work for me, however, so I disabled the pool and the errors disappeared.
Cheers, Balajee
Hello, how do I disable the pool? Can you show some code please? Thanks.
Issue disappeared after I added keepAlive(false) to the HttpClient config:
HttpClient.create()
    .baseUrl("http://localhost:" + port)
    .keepAlive(false) // <---------------
    .headers(headers -> headers.add(HEADER_CONTENT_TYPE, CONTENT_TYPE_APPLICATION_JSON))
Inspired by https://groups.google.com/forum/#!topic/vertx/3o_DEwIK9dY
Hello, can you show the detailed code of the resolution? Does that mean it disabled the pool? Thanks.
Hello @zhangxingping,
I'm not sure regarding the pool, but it resolved my error. Also, I'm not sure what you mean by asking for the detailed resolution code. When the error happened, the snippet provided above didn't have keepAlive(false), so this was the only line which resolved the issue.
Thanks for your reply. I mean your concrete code; I am programming in Kotlin.
I see your code, but I don't know where to add it.
@zhangxingping If you are using the Reactor HttpClient for making requests to third-party services, then just create it the way I do. If you are not using the Reactor HttpClient, then I don't think this fix will help you.
I'm facing the same issue.
java.lang.RuntimeException: io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer
at io.zhudy.duic.web.config.WebConfig$1.accept(WebConfig.kt:57)
at io.zhudy.duic.web.config.WebConfig$1.accept(WebConfig.kt:47)
at reactor.core.publisher.Operators.onErrorDropped(Operators.java:514)
at reactor.netty.channel.FluxReceive.onInboundError(FluxReceive.java:343)
at reactor.netty.channel.ChannelOperations.onInboundError(ChannelOperations.java:398)
at reactor.netty.channel.ChannelOperationsHandler.exceptionCaught(ChannelOperationsHandler.java:185)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:264)
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256)
at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:264)
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:426)
at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:87)
at io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:147)
at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
at io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:233)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:264)
at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256)
at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1401)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:264)
at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:953)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:736)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:825)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:433)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:330)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(..)(Unknown Source)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.MonoCreate] :
reactor.core.publisher.Mono.create(Mono.java:183)
reactor.netty.http.client.HttpClientConnect$MonoHttpConnect.subscribe(HttpClientConnect.java:289)
Actual behavior
For some reason, executing this code randomly throws a Connection reset by peer error, and I get a 500. This seems to happen more on my Docker container than on my main machine. I have tried numerous times to replicate it, but no dice. There is no pattern whatsoever.
Load test result, bombarding my local server:
Result pointing to a remote server on 5 machines with 4 GB RAM and 2.5 vCPU:
Steps to reproduce
Random
Reactor Netty version
io.projectreactor.ipc:reactor-netty:0.7.5.RELEASE
JVM version (e.g. java -version):
openjdk version "1.8.0_171" OpenJDK Runtime Environment (IcedTea 3.8.0) (Alpine 8.171.11-r0) OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
OS version (e.g. uname -a):
Linux machine-name #1 SMP Sun Mar 11 19:39:47 UTC 2018 x86_64 Linux