leachbj opened this issue 9 years ago
This seems to be worse than I thought. We'd assumed that the "keepAlive" timeout of a thread would only occur if the workQueue was empty. But what I think is happening is that a thread always gets shut down once the deadline elapses, regardless of whether there's pending work, and then, if there is pending work, a new thread gets kicked off to replace the old one. That's why we see the rapid recycling of threads every 500ms. A pool size of 0 still doesn't help us, because there's always a scheduled future task for heartbeating.
See ThreadPoolExecutor#processWorkerExit()
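Here's a minimal standalone sketch of what I think is going on. The 500ms keep-alive and the far-future heartbeat task are assumptions based on the numbers above rather than the actual TimerServiceImpl code, and whether the idle worker really gets culled while work is still queued depends on the JDK's ThreadPoolExecutor version:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TimerChurnSketch {
    public static void main(String[] args) throws InterruptedException {
        // Log every thread the pool creates so any recycling is visible.
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r);
            System.out.println(System.currentTimeMillis() + " created " + t.getName());
            return t;
        });

        // Assumed configuration: a 500ms keep-alive with core threads
        // allowed to time out.
        pool.setKeepAliveTime(500, TimeUnit.MILLISECONDS);
        pool.allowCoreThreadTimeOut(true);

        // Stand-in for the heartbeat: a task that is always queued but far
        // from being due, so the work queue is never empty while idle.
        pool.schedule(() -> System.out.println("heartbeat"), 1500, TimeUnit.SECONDS);

        // If the idle worker is culled after the 500ms keep-alive (this is
        // JDK dependent - newer ThreadPoolExecutor versions keep the last
        // worker alive while the queue is non-empty), processWorkerExit()
        // sees queued work with no workers left and starts a replacement,
        // so a fresh thread appears roughly every 500ms. Dropping the
        // keep-alive/core-timeout entirely leaves the single core thread
        // parked on the queue instead.
        Thread.sleep(5000);
        pool.shutdownNow();
    }
}
```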
What about just removing the timeout altogether? I'll give that a try today.
@leachbj - @prestona kindly took a look at this earlier and has pushed a potential solution. Let us know if this works well for you.
I'm still seeing 9-10 threads created every 10 seconds when the application is idle. A thread is created, dies off almost immediately, and another is created straight away. The threads' lifetimes don't overlap.
When I increase the timeout I see similar behaviour, just with a much reduced rate of thread creation/death.
If I remove the timeout altogether, a thread is sometimes created when there is activity, but none are created while the application is idle.
@leachbj - could you say a little more about what your application is doing? I tried to recreate the issue you are describing by running the Receive sample (both with no messages being sent to it, and with various low rates of messages being sent to it) but without success. Instead, I got a single thread that just sat in the pool forever (which is the behaviour I was expecting, as the client always has an outstanding schedule request to handle heartbeating connections).
@leachbj since 79dfe2c, the only time I've seen a turnover of threads from the TimerService pool is when clients have gone into the retrying state - but I'm not sure why you'd see that in an idle app. Presumably you have logging statements in the onRetrying(...) method that would highlight if that was happening?
Are you seeing this behaviour in the modified apps that we recently discussed? i.e., the ones with singleton EndpointService, NettyNetworkService etc. shared across multiple clients? Or is this something you are just seeing with your existing apps and the latest client snapshots?
I've just re-tested with 1.0.2015060300 and the library built from commit 79dfe2cfeedf70f719731d13c6bf3d9eb6cbec80.
I'm testing with two applications. The server accepts on a topic wildcard and then responds on a unique topic. The client subscribes to the response topics, sends a message to a unique topic, consumes the response, then shuts down the async clients.
With the release version running the server, on application startup I'm seeing a thread created every second (pool-2); after processing some messages the once-per-second behaviour continues. The modified library version does the same.
The client application doesn't have any long-lived async clients, so I just see the threads created during processing and then they all die.
I've also tested a version of our application using a shared CallbackService, NetworkService and TimerService. The CallbackService is just the SameThreadCallbackService since we only dispatch an Akka message on callbacks, the NetworkService is the standard Netty implementation, and the TimerService is a custom implementation using the Akka scheduler (rough sketch below). With this one my server code is not creating any threads (once the Netty ones are created), but the client still does, since Netty keeps shutting down. I could probably replace Netty with Akka IO, but that would need SSL support, which is a bit of a pain. I can get some specific logs from the app to try to work out exactly why the timer threads are being created - just let me know what you're after - but if you made a release of the current version we could just use that with our own timer implementation.
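Here's roughly what that scheduler-backed timer looks like; the TimerCallback interface is a simplified placeholder for illustration, not the client's actual TimerService contract:

```java
import java.util.concurrent.TimeUnit;

import akka.actor.ActorSystem;
import akka.actor.Cancellable;
import scala.concurrent.duration.Duration;

// Simplified placeholder for the timer callback; the client's real
// TimerService contract is richer than this.
interface TimerCallback {
    void onTimeout();
}

public class AkkaSchedulerTimerService {
    private final ActorSystem system;

    public AkkaSchedulerTimerService(ActorSystem system) {
        this.system = system;
    }

    // One-shot timer on the Akka scheduler: no dedicated timer thread pool,
    // so there is nothing for the client to create or recycle. The returned
    // Cancellable lets the caller cancel the timeout.
    public Cancellable schedule(long delayMillis, TimerCallback callback) {
        return system.scheduler().scheduleOnce(
                Duration.create(delayMillis, TimeUnit.MILLISECONDS),
                callback::onTimeout,
                system.dispatcher());
    }
}
```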
Once connected to the broker the application creates a large number of very short-lived threads. The issue is that TimerServiceImpl creates its pool as follows:
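In essence it's a ScheduledThreadPoolExecutor with a short keep-alive and core threads allowed to time out; a rough sketch (the 500ms value is assumed for illustration, not the actual code):

```java
ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
// Keep-alive assumed for illustration; the point is that it's far
// shorter than the connection's idle time.
pool.setKeepAliveTime(500, TimeUnit.MILLISECONDS);
pool.allowCoreThreadTimeOut(true);
```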
But the idle time on the connection is 1500 seconds (the default for the MQ Light broker, I guess). This means that the core thread times out before the next job comes in, and a new thread is created for every transport tick.