[BUG] Thread-pool will add threads until system resources are exhausted

mdbergmann commented 2 weeks ago

Seems like the thread-pool implementation is not capping the number of maximum threads. So submitting work in a loop, large loop, to be processed concurrently will basically kill the application because no more threads can be created.

adlai commented 2 weeks ago

Correct, that behavior was intended when ChanL was originally written; the logic was that the same will happen if you allocate memory without bounds, and exhaust the heap.

The only limit in the current thread pool is a "soft limit", and that sets the number of worker threads kept alive waiting for new tasks after existing ones have finished.

How do you think this should be addressed? and I'm curious to hear whether you encountered this in any production system, or simply by playing around...

mdbergmann commented 2 weeks ago

Well, I'm the author of Sento and was looking at thread-pool implementations that can replace the 'made-up' dispatchers/thread-pools in Sento that are basically just actors each operating a separate thread, which effectively in Sento is used as something like a thread-pool. The performance tests I usually run do something in a loop of ~1M repeats. Insofar it's just playing around. However I work with production actor based systems (in other runtimes) that are massively asynchronous. Hundreds of thousands of concurrent IO inputs/outputs is nothing unusual.

It can simply be addressed by capping the maximum number of threads. The question is what should happen when the thread-pool has no available threads and the queue is also full. The easiest strategy then is that the caller runs the operation, that kind of is a natural back-pressure mechanism. But there are other strategies possible.

mdbergmann commented 2 weeks ago

One could also use something like: https://github.com/Frechmatz/cl-threadpool

adlai commented 2 weeks ago

Thank you for the links and ideas.

You probably noticed that ChanL is mostly constructed so different thread pool implementations can be used; one good improvement could be adding to the pcall function (and through it, to the pexec macro) an ability to swap out the thread pool; meanwhile, there would be some CLOS API of how thread pools operate, along with a few default ones, and users with special constraints or extreme demands could implement custom pools.

The current CLOS footprint is two classes thread-pool and task, and one generic, assign-task. I'm pondering whether any change to this is needed for implementing the vision of the previous paragraph, and open to suggestions before I write any code.

By the way, the use case that I've been wanting to support is as follows: I run several liquidity providers, each with about a dozen tasks; I need the ability to nuke and restart individual liquidity providers, so currently I get this by separating them to different lisp images. If each had its own thread pool, I could run all within one lisp image and conserve resources. From this perspective, it is fortunate that I'm small-time, and only run a few of these, although I'd like to keep a low overhead for scaling so I can take on additional clients (most people who want to profit from liquidity providing don't have or want the technical ability to deal with some tangled old lisp code).

adlai commented 1 week ago

I'm considering whether the test suite should deliberately fail on lighter systems, by triggering this; and if so, how should the human be warned before the expected crash, in case someone who does not read documentation simply fires up slime, quicklisp, and ,test-system.

adlai commented 3 days ago

Enhancement begun!

Commit 180c290 prefixes soft- to chanl:thread-pool; eventually, the default thread pool will use configurable hard limits, and the soft pool will only be retained in case someone needs it for flexibility or compatibility.

zkat / chanl

[BUG] Thread-pool will add threads until system resources are exhausted #26