spring-projects / spring-boot

Too many open files after upgrade to Spring Boot 2.2.8 #21923

Closed martinvisser closed 4 years ago

martinvisser commented 4 years ago

We recently upgraded from Spring Boot 2.2.7 to 2.2.8, running on PCF (Azure, Linux). We then ran into an issue where the app eventually crashed with "Too many open files". The "files" were actually open TCP sockets, over 1 million of them. Since the upgrade pulls in quite a few other dependency upgrades, it's very hard to figure out where it goes wrong. We had 32 instances crash and could reproduce it in another environment pretty easily. After a couple of hours the number of open sockets didn't change.

The application uses WebFlux, and therefore Netty. To see whether Netty was the cause, I downgraded Spring Boot to 2.2.7 and updated only the Netty dependencies to 4.1.50. With that configuration it worked fine; the number of sockets stayed around 30,000.

I can't reproduce this on my Mac, but under some load it reproduces easily on PCF on Linux, so I think it's related to the OS.
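For anyone trying to confirm a similar descriptor leak from inside the JVM, here is a minimal sketch that reads the open and maximum file descriptor counts. It assumes a HotSpot-based JVM on a Unix-like OS, where the JDK-specific `com.sun.management.UnixOperatingSystemMXBean` is available; the class name `OpenFileDescriptors` is made up for illustration and is not part of the application discussed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OpenFileDescriptors {

    /**
     * Returns {open, max} file descriptor counts for the current process,
     * or null when the running JVM/OS does not expose them.
     */
    static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // UnixOperatingSystemMXBean is a JDK-specific extension (assumption:
        // HotSpot-based JVM on Linux/macOS); it is absent on Windows.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] fds = snapshot();
        if (fds == null) {
            System.out.println("File descriptor counts are not available on this JVM/OS");
        } else {
            System.out.printf("open file descriptors: %d of %d allowed%n", fds[0], fds[1]);
        }
    }
}
```

Logging this value periodically under load should show whether the count keeps climbing (a leak) or levels off. Sockets count as file descriptors on Linux, so leaked TCP connections show up in this number.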

Some stack traces:

io.netty.channel.DefaultChannelPipeline  : An exceptionCaught() event was fired, and it reached at the tail of the pipeline.
It usually means the last handler in the pipeline did not handle the exception. io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files

sun.rmi.transport.tcp                    : RMI TCP Accept-5000: accept loop for ServerSocket[addr=,localport=5000] throws java.net.SocketException: Too many open files (Accept failed)
    at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.accept(Unknown Source)
    at java.base/java.net.ServerSocket.implAccept(Unknown Source)
    at java.base/java.net.ServerSocket.accept(Unknown Source)
    at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
    at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

a.w.r.e.AbstractErrorWebExceptionHandler : [4974671b-12062]  500 Server Error for HTTP POST "/some/path" io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
Wrapped by: io.netty.channel.ChannelException: io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
    at io.netty.channel.unix.Socket.newSocketStream0(Socket.java:421)
    at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:319)
    at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:323)
    at io.netty.channel.epoll.EpollSocketChannel.<init>(EpollSocketChannel.java:45)
    at reactor.netty.resources.DefaultLoopEpoll.getChannel(DefaultLoopEpoll.java:45)
    at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:187)
    at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:169)
    at reactor.netty.tcp.TcpResources.onChannel(TcpResources.java:215)
    at reactor.netty.http.client.HttpClientConnect$HttpTcpClient.connect(HttpClientConnect.java:141)
    at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43)
Wrapped by: com.netflix.hystrix.exception.HystrixRuntimeException: payment-request-merchant-site.payment-request-merchant-site-v2 failed and fallback failed.
    at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:832)

    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.authorization.AuthorizationWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.authorization.ExceptionTranslationWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.authentication.logout.LogoutWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.savedrequest.ServerRequestCacheWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.context.SecurityContextServerWebExchangeWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.context.ReactorContextWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.config.web.server.ServerHttpSecurity$ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.cloud.sleuth.instrument.web.TraceWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ HTTP POST "/some/path" [ExceptionHandlingWebHandler]
Stack trace:
    at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:832)
    at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:807)
    at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
    at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
    at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
    at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1472)
    at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1397)
    at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
    at rx.internal.reactivestreams.SubscriberAdapter.onError(SubscriberAdapter.java:59)
    at reactor.core.publisher.StrictSubscriber.onError(StrictSubscriber.java:106)


bclozel commented 4 years ago

Since this seems to be linked with Netty's native support, did you try changing the "netty-tcnative-boringssl-static" dependency?

You could try the following:

Please let us know if this changes things, it would help us to find the source of the problem. Thanks!

bclozel commented 4 years ago

This seems to be caused by https://github.com/reactor/reactor-netty/issues/1152

Could you try overriding the reactor-netty dependency to the latest 0.9.9.BUILD-SNAPSHOT and see if this fixes the issue?


snicoll commented 4 years ago

You can override reactor-bom.version to Dysprosium-BUILD-SNAPSHOT. We've also switched Spring Boot 2.2.9.BUILD-SNAPSHOT to use this version by default so in an hour or so you could just switch your build to 2.2.9.BUILD-SNAPSHOT.

martinvisser commented 4 years ago

@bclozel After overriding the version to 0.9.9.BUILD-SNAPSHOT, it works as expected again! No more excessive sockets are opened.

bclozel commented 4 years ago

Thanks @martinvisser, this helps a lot!

MahatmaFatalError commented 4 years ago

reactor-netty v0.9.10.RELEASE is now available. Which Spring Boot release is planned to include this version?

bclozel commented 4 years ago

Reactor Dysprosium-SR10 ships with reactor-netty 0.9.10, see #22376. As for this particular issue, Dysprosium-SR9 (reactor-netty 0.9.9) should already fix the problem in Spring Boot 2.3.2 (see #21938).

If you're still experiencing an issue, please create a ticket on the reactor-netty tracker with your findings.

violetagg commented 3 years ago

@jeehunseo Can you check that you do not have a dependency mismatch? Carefully check any dependencies that may bundle Netty, especially the native parts of Netty.
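One rough way to spot such a mismatch is to group the Netty jars on the classpath by version. The sketch below is only an illustration: the class name `NettyJarVersions` is hypothetical, and it assumes Maven-style file names of the form `<artifact>-<version>[-<classifier>].jar`. It reads `java.class.path`, so it suits an exploded classpath; for a Spring Boot fat jar the entries live under `BOOT-INF/lib`, and a build-tool dependency report is likely the better source.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NettyJarVersions {

    // Example match: netty-transport-native-epoll-4.1.50.Final-linux-x86_64.jar
    // group(1) = artifact name, group(2) = version, optional group(3) = classifier.
    private static final Pattern NETTY_JAR = Pattern.compile(
            "(netty-[a-z0-9-]+?)-(\\d+\\.\\d+\\.\\d+[^-]*)(-[A-Za-z0-9_-]+)?\\.jar");

    /** Groups Netty artifact names found in the given classpath by version. */
    static Map<String, Set<String>> artifactsByVersion(String classpath) {
        Map<String, Set<String>> byVersion = new TreeMap<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            String fileName = new File(entry).getName();
            Matcher m = NETTY_JAR.matcher(fileName);
            // matches() requires the name to start with "netty-", so jars such as
            // reactor-netty-*.jar are intentionally ignored here.
            if (m.matches()) {
                byVersion.computeIfAbsent(m.group(2), v -> new TreeSet<>()).add(m.group(1));
            }
        }
        return byVersion;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> found =
                artifactsByVersion(System.getProperty("java.class.path", ""));
        found.forEach((version, artifacts) -> System.out.println(version + " -> " + artifacts));
        // netty-tcnative-* follows its own version line, so a second version
        // for those artifacts alone is expected and not a mismatch.
    }
}
```

If the core `netty-*` artifacts and the native transport (for example `netty-transport-native-epoll`) show up under different 4.1.x versions, that is the kind of mismatch worth resolving first.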

snicoll commented 3 years ago

@violetagg Thanks. The reporter has now created a separate issue, so I suggest we follow up there.

zyy71897 commented 1 year ago

I still have this problem with Spring Boot 2.6.6.