Open · axel22 opened this issue 5 years ago
cc @Fithos
I wasn't able to reproduce this error on Linux and macOS. Could you please post more details on the configuration and installed OS of the machines where finagle-chirper fails?
Unfortunately, I wasn't able to reproduce it on my machine either. I am in contact with Tom Rodriguez, who reported it and sees it happening on some cluster machines (apparently the issue occurs both on HotSpot and on GraalVM). cc @tkrodriguez
I've really only seen this on a fairly large machine: an Oracle X5-2 with 72 cores running Oracle Linux Server release 6.8. I've also seen problems with running out of file descriptors, though that was with an older version of the benchmark. The limit is 65536, so it would have to be leaking fds pretty badly for this to occur. Anything I can try to help with debugging?
The benchmark has a leaking file descriptor issue. Example exception dump:
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at io.netty.channel.ReflectiveChannelFactory.newChannel(ReflectiveChannelFactory.java:38)
... 172 more
Caused by: io.netty.channel.ChannelException: io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
at io.netty.channel.unix.Socket.newSocketStream0(Socket.java:439)
at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:184)
at io.netty.channel.epoll.EpollSocketChannel.
Seen more instances of this on regular machines (e.g. a measly 8 cores and 64 GB RAM). At the time of the crash, the benchmark has quite a lot of TCP sockets open. Sometimes this makes the benchmark hang instead of crash, which complicates measurements.
... I also see a constantly increasing number of threads (right now ~300 threads at iteration 400). If it helps, most of the thread instances are named UnboundedFuturePool-Something.
@Fithos Can you confirm the file descriptor issue, and thread growth?
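For what it's worth, here is a minimal sketch of how the leak could be confirmed from inside the JVM by sampling the open file descriptor count and the live thread count once per iteration. This is not part of the benchmark; LeakProbe is a hypothetical helper, and it assumes a HotSpot/OpenJDK JVM on a Unix-like system where the OS bean implements UnixOperatingSystemMXBean:

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Hypothetical diagnostic helper: print the open-fd and live-thread counts
// once per benchmark iteration so their growth can be tracked over time.
object LeakProbe {
  def sample(iteration: Int): Unit = {
    val os = ManagementFactory.getOperatingSystemMXBean
    val openFds = os match {
      case unix: UnixOperatingSystemMXBean => unix.getOpenFileDescriptorCount
      case _                                => -1L // not available on this JVM/OS
    }
    val liveThreads = ManagementFactory.getThreadMXBean.getThreadCount
    println(s"iteration=$iteration openFds=$openFds liveThreads=$liveThreads")
  }
}
```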
It's somewhat surprising that the thread count seems to be converging. But it also drops at the very end, and tearDownAfterAll is a likely reason, I think. Perhaps we should change the benchmark so that it kills the server instance completely between iterations.
Maybe it's because master isn't being properly closed here? https://github.com/renaissance-benchmarks/renaissance/blob/master/benchmarks/twitter-finagle/src/main/scala/org/renaissance/twitter/finagle/FinagleChirper.scala#L430-L433 I'm also not seeing where the other master is closed.
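If we do decide to tear the server down between iterations, a rough sketch of that pattern with the Finagle API could look like the following (withFreshServer and the address ":8080" are illustrative, not taken from the benchmark):

```scala
import com.twitter.finagle.{Http, ListeningServer, Service}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.Await

// Illustrative only: start a fresh server before the measured work and shut it
// down afterwards, so no sockets or worker threads survive between iterations.
def withFreshServer[T](service: Service[Request, Response])(run: => T): T = {
  val server: ListeningServer = Http.serve(":8080", service)
  try run
  finally Await.result(server.close()) // releases the listening socket and workers
}
```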
Clients in Finagle are designed to be long-lived and potentially shared across your application. They operate better the longer you keep them around (connections are materialized, past statistics about errors/latencies are recorded, etc.).
Hi Vladimir! Thanks for the pointer!
So, essentially, this could be fixed by calling close on the Service object, correct?
That's right, @axel22. Just call .close() if you don't need the client anymore. Creating a single client instance and sharing it in your benchmark would be an even better workload, as that's essentially what traditional Finagle applications do.
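To make the suggestion concrete, here is a minimal sketch of that pattern (the object name and the address "localhost:8080" are placeholders, not the benchmark's actual wiring):

```scala
import com.twitter.finagle.{Http, Service}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.Await

object SharedClient {
  // One long-lived client, created once and reused across all iterations.
  val client: Service[Request, Response] =
    Http.client.newService("localhost:8080")

  // Called from the benchmark teardown: releases the client's sockets and
  // worker threads instead of letting repeated client creation leak fds.
  def shutdown(): Unit = Await.result(client.close())
}
```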
Quoting from #168, "However, I hit #106 a lot locally on my MacBook. So this issue was apparently not fully resolved (cc @Fithos )."
So, the issue is still there, and should be reopened?
Some users reported seeing this on some machines: