preshing / cpp11-on-multicore

Various synchronization primitives for multithreaded applications in C++11.
zlib License

Avoid spinning if single-core #6

Closed RandomShaper closed 2 years ago

RandomShaper commented 9 years ago

Conservatively rely on std::thread::hardware_concurrency to avoid spinning when hardware multithreading isn't available. Although the repository name suggests it targets multicore environments, making it a bit more universal can't hurt.

(Not completely sure I have implemented this correctly.)
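For reference, a minimal sketch of the idea (illustrative names only, not the actual patch): skip the busy-wait phase entirely when the hardware reports a single logical processor, since spinning can't make progress if the signaling thread isn't able to run concurrently.

```cpp
#include <thread>

// Illustrative sketch, not the actual patch: decide whether spinning
// can pay off at all on this machine.
inline bool shouldSpin()
{
    // hardware_concurrency() may return 0 when the value is not computable;
    // treat that conservatively as "spin", which matches the original behavior.
    unsigned n = std::thread::hardware_concurrency();
    return n != 1;
}
```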

RandomShaper commented 8 years ago

Oh! That's true. I fell into an implicit-conversion trap. Well, at least the runtime behavior would have been the same as before. :) Fixing it right away.

By the way, is my logic correct?
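(The exact trap isn't spelled out in the thread; purely as a hypothetical illustration of that kind of slip, the unsigned return value converts implicitly to bool, which is true on any machine reporting at least one processor:)

```cpp
// Hypothetical example of an implicit-conversion slip -- not necessarily
// the one in the patch.
bool spin = std::thread::hardware_concurrency();              // true whenever the count >= 1
bool spinIntended = std::thread::hardware_concurrency() > 1;  // the intended test
```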

dpantele commented 8 years ago

I believe so, although calling a function every time seems like overkill to me. But maybe link-time optimization will eliminate the call.

RandomShaper commented 8 years ago

You're right. I've checked Clang's standard library implementation and it's not lightweight.
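One way to avoid paying for that call on every wait, sketched under the assumption that the reported concurrency is effectively constant for the lifetime of the process, is to query it once and cache the result:

```cpp
#include <thread>

// Sketch: query hardware_concurrency() a single time and reuse the result,
// so the spin decision doesn't pay for a library call on every wait.
inline bool shouldSpin()
{
    // 0 means "unknown"; keep the original spinning behavior in that case.
    static const bool spin = (std::thread::hardware_concurrency() != 1);
    return spin;
}
```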

dpantele commented 8 years ago

And overall I'm not sure this patch is useful, sorry, but luckily that's not my decision. In my opinion, such tweaks should happen at compile time.
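A compile-time variant of the same idea might look like the sketch below; the macro name is invented for illustration and is not something the project defines.

```cpp
// Hypothetical compile-time switch: the build system decides whether the
// target is single-core, so the spin count is fixed at compile time.
#if defined(TARGET_IS_SINGLE_CORE)   // invented macro, for illustration only
static const int kMaxSpins = 0;      // never spin on known single-core targets
#else
static const int kMaxSpins = 10000;  // spin briefly before blocking
#endif
```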

RandomShaper commented 8 years ago

Regarding your compile-time point of view, I'm not sure. For instance, consider an app built for ARM-based phones. How can you know beforehand how many cores it will run on?

hlide commented 8 years ago

I assume this function returns the number of logical processors the hardware has, not the number in use.

dpantele commented 8 years ago

Do Android apps really benefit from using multiple cores?

RandomShaper commented 8 years ago

@hlide, that's right. But on the one hand, I don't believe standard C++ provides any means to query the current thread-scheduling strategy. On the other hand, if you mean that the CPUs actually in use for the process's threads could be so few that the waiting thread ends up spinning on the same CPU as the one it's waiting for, that would at worst perform the same as the original code. But we can probably trust the OS scheduler to run them in real parallelism. Do you know of any platform on which a process could be constrained to a subset of the logical CPUs by decision of the OS (excluding affinity tweaks, for instance)?

@dpantele, while it's true that many users pick mid- to high-end multicore devices just to run WhatsApp or do some casual browsing, think of modern games, especially 3D ones. Engine designers try to squeeze out as much computing power as possible to get a fluid framerate while handling rendering, physics, AI…

RandomShaper commented 8 years ago

By the way, it would be nice if @preshing came round here to give his point of view, but it looks like it's been a long time since he last worked on this repo.

hlide commented 8 years ago

Spin waiting is not an exact science. Normally, if there is more than one logical processor (= cores × hardware threads per core), spinning is acceptable because it is very brief: spinning means another running logical processor is working with the same semaphore, so it's better to wait for it to finish than to go to sleep, since that other logical processor will be done very shortly. If no other logical processor is contending for the same semaphore, there is no spinning at all and the lone logical processor accesses the semaphore immediately. Keeping the spin on a single-core machine without hardware threads is not really an issue, since that case is nearly extinct.
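For reference, the spin-then-block pattern under discussion looks roughly like the following simplified sketch; it is not the repository's actual code, and waitOnKernelSemaphore() stands in for whatever blocking primitive the platform provides.

```cpp
#include <atomic>

// Simplified sketch of a spin-then-block wait on a counting semaphore.
class SpinWaitSketch
{
    std::atomic<int> m_count;

    void waitOnKernelSemaphore()
    {
        // Placeholder: a real implementation would block on an OS semaphore
        // (e.g. sem_wait, WaitForSingleObject, or a condition variable).
    }

public:
    SpinWaitSketch() : m_count(0) {}

    void wait(int maxSpins)
    {
        // Spin briefly: if another logical processor is about to signal,
        // this avoids the cost of a kernel-level wait.
        for (int spin = 0; spin < maxSpins; ++spin)
        {
            int old = m_count.load(std::memory_order_relaxed);
            if (old > 0 &&
                m_count.compare_exchange_weak(old, old - 1,
                                              std::memory_order_acquire))
                return;
            // A real implementation would also issue a CPU pause/yield here.
        }
        // Spinning didn't succeed: decrement (possibly going negative to
        // record a waiter) and block in the kernel if no token was available.
        if (m_count.fetch_sub(1, std::memory_order_acquire) <= 0)
            waitOnKernelSemaphore();
    }
};
```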

RandomShaper commented 8 years ago

If I get your point, you're saying that my logic may be right but that my attempted optimization is of little use given the current state of technology. Am I right? (I'm just an apprentice in parallel programming, striving to learn and contribute, so I find this discussion very constructive.)

hlide commented 8 years ago

I think you can drop this optimization, because single-core hardware is increasingly rare nowadays.

RandomShaper commented 7 years ago

@preshing, I've squashed all the commits together so this PR is concise.