ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.68k stars 2.53k forks source link

RunQueue for std.Thread.Pool may not use FILO strategy #21856

Open JiaJiaJiang opened 2 hours ago

JiaJiaJiang commented 2 hours ago

Zig Version

0.14.0-dev.2051+b1361f237

Steps to Reproduce and Observed Behavior

In std thread pool, all "spawn" methods will add new tasks to the beginning of the queue:

https://github.com/ziglang/zig/blob/c39ba682e359c14b45495abe337c29645e8e9669/lib/std/Thread/Pool.zig#L142

https://github.com/ziglang/zig/blob/c39ba682e359c14b45495abe337c29645e8e9669/lib/std/Thread/Pool.zig#L206

https://github.com/ziglang/zig/blob/c39ba682e359c14b45495abe337c29645e8e9669/lib/std/Thread/Pool.zig#L250

However, the worker will get the task from the beginning of the queue too:

https://github.com/ziglang/zig/blob/c39ba682e359c14b45495abe337c29645e8e9669/lib/std/Thread/Pool.zig#L286

https://github.com/ziglang/zig/blob/c39ba682e359c14b45495abe337c29645e8e9669/lib/std/Thread/Pool.zig#L308

Therefore, the thread pool queue uses the first-in-last-out strategy (FILO), which means that if tasks are added frequently, the first set task may wait for a long time before being executed, or may even never be executed. I think this strategy is not compatible with the task queue.

Expected Behavior

It should use the first-in-first-out (FIFO) strategy to ensure that the first set task is executed first, which is not only in line with program logic, but also more intuitive.

kprotty commented 2 hours ago

std.Thread.Pool has no ordering guarantees on task processing and LIFO (FILO) is more efficient for CPU-heavy workloads as it prefers working with data already in cache. Why should it use FIFO / ensure task fairness?

JiaJiaJiang commented 2 hours ago

@kprotty Thank you for your response. For example, if you use a thread pool to process some real-time data, when the processing is slow for some reason, new tasks are constantly added to the processing queue, while the first added tasks can never be completed, because the pool is always processing newly added task, causing the data processing flow to be stuck. If the FIFO strategy is used, even if the processing is slow, at least the data can continue to flow. If the developer is not aware that the queue is FILO, then the above situation may happen frequently. Perhaps FIFO or FILO can be an option for the pool? It can let developers decide which one to use based on the usage.

JiaJiaJiang commented 2 hours ago

By the way, I don't mean to guarantee the order of tasks, but the FILO strategy may cause the first added task to be executed after a long time, which may have an adverse effect on the threads waiting for the task in serious cases.

kprotty commented 2 hours ago

No ordering guarantees allows the Pool to use worker-local list with different pop orders in the future (which is what's state of the art, but currently not implemented). Latency tuned/sensitive thread pools (like those in Rust Tokio and Golang) exist, but real-time guarantees seem orthogonal to a thread pool by design. And no zig stdlib data-structure is aware of real-time scheduling either (would need to also change the impl of Mutex, Condition, etc.)

A worker thread in the pool should only block on a task using waitAndWork, which runs other tasks (if any) instead of blocking when possible. Otherwise there is a risk of deadlock irrespective of internal task ordering.

JiaJiaJiang commented 1 hour ago

@kprotty Or let me change an example. If I write a web service and use a thread pool to handle each request, this means that the person who makes the request first will have to wait until the last request to be responded to. If the load is very high, the user may have to wait for a long time before the server responds. Is this use case suitable?

kprotty commented 53 minutes ago

Ideally you'd use single-threaded non-blocking IO for a web server, but if threads must be used then a latency-sensitive pool is needed to avoid request starvation. Should zig stdlib provide such a pool? Not sure.

Thread Pools are primarily for CPU bound work - it's used here in a hacky context. To really avoid accidental starvation, things like eventual-fairness need to be baked into thread synchronization paths which may block like Mutexes, Channels, etc. (as seen in Rust/Go).

I think that's better left to a separate library/runtime.

JiaJiaJiang commented 29 minutes ago

Ideally you'd use single-threaded non-blocking IO for a web server, but if threads must be used then a latency-sensitive pool is needed to avoid request starvation. Should zig stdlib provide such a pool? Not sure.

@kprotty Yes, this solution may be better. I just give a simple example to assume that if a developer uses this thread pool to complete such work, he may be troubled by some unexpected behavior, because commonly people will think that the task added earlier will be executed earlier. If this thread pool needs to maintain this feature, I think it may need to be emphasized in the document. After all, when there is a Pool in std, more people will prefer to use it.