microsoft / STL

MSVC's implementation of the C++ Standard Library.
Other
9.9k stars 1.45k forks source link

`<execution>`: Should threadpool callback priority be influenced by parent thread priority? #4781

Closed StephanTLavavej closed 1 day ago

StephanTLavavej commented 2 days ago

Our C++17 parallel algorithms use the Windows threadpool:

https://github.com/microsoft/STL/blob/165fc9459c85faa65cd3dec313eb10f55bab154c/stl/inc/execution#L213 https://github.com/microsoft/STL/blob/165fc9459c85faa65cd3dec313eb10f55bab154c/stl/src/parallel_algorithms.cpp#L23-L26

We're passing null to CreateThreadpoolWork for the callback environment:

If this parameter is NULL, the callback executes in the default callback environment. For more information, see InitializeThreadpoolEnvironment.

Therefore, the threadpool callbacks have default priority, regardless of the parent thread priority.

Think-cell reported this to us, raising the concern of priority inversion.

In theory, we could alter our code to set up a callback environment, and use SetThreadpoolCallbackPriority to request low, normal, or high priority. However, it is unclear to me whether "The priority for the callback relative to other callbacks in the same thread pool." would control the priority relative to the parent thread, which is what would be needed to address priority inversion concerns.

Ultimately, the conclusion may be that thread priority is outside the realm of the Standard, so if an application has used the Windows API to alter its thread priorities, then the C++17 parallel algorithms are a poor fit for it.

AlexGuteniev commented 2 days ago

My opinion is wontfix.

Below is justification of this opinion.


SetThreadPoolCallbackPriority has too few choices comparing to usual thread priorities.

I also don't think that these priorities even map anyhow to usual thread priorities.

Thread pools are expected to handle gracefully short tasks, so that internal calls to SetThreadPriority performed by thread pool threads would be too much perf waste.

So I expect that thread pool doesn't call SetThreadPriority, and the priorities set via SetThreadPoolCallbackPriority is user mode thing that has no relationship in OS thread priority. And the originating thread priority is irrelevant.

As a hypothetically possible fix I only see that each thread would maintain its own thead pool with thread pool thread priorities matching the original thead priority. WinAPI thead pool doesn't allow setting its threads priority, at least not in a documented safe way, so would have to maintain own thread pool implementation. Also there's no way to track changes of master thread priority. In short, that's not seems feasible in practice.

So to me it looks like that an application that relies on non-default thread priorities has to avoid the use of thread pool in non-default prioritized threads. This includes avoidance of parallel algorithms, non-deferred std::async, #pragma omp, or /Qpar , whatever.

wanghan02 commented 1 day ago

In our tests, setting the thread priority in the callback to the thread priority of the parent thread, and restoring it on exit of the callback works, does not have a big overhead, and makes a significant difference in the performance of UI thread running simultaneously. Doing it once per range item is obviously inefficient, but once per callback is probably fine and potentially helps a lot.

AlexGuteniev commented 1 day ago

Doing it once per range item is obviously inefficient, but once per callback is probably fine and potentially helps a lot

This semi-defeats the purpose of parallel algorithms by adding heavy (though constant time) operation to each callback, and making scheduling callbacks less efficient. Though if you won't do it for normal priority parent thread, the normal case won't be penalized, so it may be fine.

(I wouldn't want even GetThreadPriority() of parent thread penalty, but that's not my call)

schoedl commented 1 day ago

Many interactive applications, which is arguably the majority of applications on Windows, will use thread priorities to make the UI thread more responsive. Actually, the majority of CPU-heavy work will likely be done in the background. Effectively excluding all of them from using the C++ standard library seems a bad idea. The difference is massive: for us the UI thread got 3x faster by lowering the background thread priorities in the for_each body.

AlexGuteniev commented 1 day ago

Many interactive applications, which is arguably the majority of applications on Windows, will use thread priorities to make the UI thread more responsive

This doesn't sound as anything common or widespread. I never encountered such a recommendation, or seen the code of an UI application that actually does this.

schoedl commented 1 day ago

Well, maybe they should because it makes the UI thread 3x faster. There is nothing special about what we are doing, besides both UI and background being CPU-bound. I have no reason to believe other applications would not get a similar result.

StephanTLavavej commented 1 day ago

We talked about this at the weekly maintainer meeting and we don't believe that any changes should be made here.

Having threadpool callbacks manually adjust the priority of threadpool worker threads (which the STL has not directly created, and could be reused by other components) would "contaminate" the thread pool, which seems undesirable. Even if the callback attempts to restore the threadpool worker thread's priority, because the STL isn't directly controlling how callbacks are executed in the threadpool, the chance for unwanted interactions seems high.

It sounds to us like C++17 parallel algorithms are a poor fit for your scenario (because of the Standard's lack of recognition of thread priority), and/or other designs should be investigated - e.g. a UI thread probably shouldn't start a parallel algorithm and block on its completion, instead that scenario sounds like it's suited for promises/futures, so that the UI thread can periodically check whether the long-running operation is complete, and display a placeholder/spinner/etc. while the operation is running.

AlexGuteniev commented 19 hours ago

There is nothing special about what we are doing, besides both UI and background being CPU-bound

Well, that sounds special. To make the UI responsive, you avoid making UI thread CPU-bound. From my experiences, this has always been feasible, even when showing quasi-relatime 60 fps chart. I acknowledge that this may be not feasible for your case, but this is definitely not a problem encountered by most of application, not something usual.