mreineck / ducc

Fork of https://gitlab.mpcdf.mpg.de/mtr/ducc to simplify external contributions
GNU General Public License v2.0
13 stars 12 forks source link

Support external thread pools #9

Closed mreineck closed 1 year ago

mreineck commented 1 year ago

This is a (perhaps way too naive) attempt to make ducc0's multithreading work with external thread pool implementations. The idea is that potential users wrap their thread pools into an (extremely minimalistic) interface and push the desired thread pool onto a ducc0-internal stack before calling parallel ducc0 code. When everything is done, the thread pool should be popped again. Since the stack itself is thread-local, it should even be possible to have several thread pools active at the same time, if so desired.

Since multithreaded programming is not something I have a lot of experience with, this proposal probably has a lot of flaws, and it could very well be completely impossible for reasons I'm not seeing at the moment.

@cantonios @peterbell10 I would really value your thoughts on this. Please don't pull your punches ;-)

mreineck commented 1 year ago

Sorry, there are more changes in the PR than I intended to. You only need to look at threading.h and threading.cc.

mreineck commented 1 year ago

Cleaner diff at https://gitlab.mpcdf.mpg.de/mtr/ducc/-/merge_requests/178. I don't understand why Github and Gitlab behave so differently in this case...

mreineck commented 1 year ago

Though this path is currently only enabled if DUCC0_NO_THREADING is undefined. TF needs to define that to avoid including or using <mutex>/<thread>.

I plan to adjust the DUCC0_NO_THREADING part accordingly. Still I guess we need to find a better macro to avoid the headers that TF doesn't like ... anything called NO_THREADING is just too confusing for this purpose :-) But I'm sure we can fix that.

The max number of threads (and hence total threads returned by adjust_nthreads) is also governed by DUCC0 settings, rather than being handled by the threadpool. Ideally the threadpool would decide how to split up the work, and the maximum number of tasks used.

Yes, absolutely. I have added the nthreads method for that purpose, but I have not switched to using it yet. Will get to that soon, I hope.

mreineck commented 1 year ago

@cantonios since we cannot use std::thread and std::mutex (and I assume std::lock_guard and std::condition_variable as well) in TF, does TF provide classes with compatible interface that can be used instead?

If it's just a matter of

#ifdef TENSORFLOW
#include "tf/multithreading_stuff.h"
#else
#include <thread>
#endif

...

#ifdef TENSORFLOW
using mythread = tf::thread;
#else
using mythread = std::thread;
#endif

and then using mythread afterwards, I'm happy to do that.

cantonios commented 1 year ago

since we cannot use std::thread and std::mutex (and I assume std::lock_guard and std::condition_variable as well) in TF, does TF provide classes with compatible interface that can be used instead?

Sort-of. We use absl::Mutex, with its own set of locks. For threads, internally we use special Google thread equivalents. For open-source TF we use pthread_create explicitly when available, otherwise we use std::thread.

I think the main issue though is that we don't want a new global threadpool to be created for this, so setting aliases like that won't really help us. We want to use TF's existing thread pool(s).

mreineck commented 1 year ago

That's understood, and I'm sure I can come up easily with a way to #ifdef out the ducc_thread_pool completely and set active_pool to nullptr on startup. Then you can use exclusively your own thread pools.

Still, there are mutexes, locks etc. in the Distribution class, and I would like to keep this class as independent as possible from the underlying thread pool. Also I'm using mutexes and locks in other ducc0 components like the NUFFT. It's fine to change all those to, say, ducc0::implementation_dependent_mutex etc., but I can't really give them up entirely.

mreineck commented 1 year ago

I think that the current version should make it straightforward to plug in a different set of low-level threading classes (via the DUCC0_LOWLEVEL_THREADING macros) and custom thread pools. Please let me know if you encounter any obstacles!

mreineck commented 1 year ago

OK, I hope I got it right now.

cantonios commented 1 year ago

Thanks for putting in the effort here!! I'll need to to try it out properly to see if I can get something like this to work for both Eigen and TF. That will take me some time though - as this isn't that high a priority on our end. I should probably get everything working single-threaded first anyways.

mreineck commented 1 year ago

Great! No need to rush with testing, but if you agree that we are going in the right direction with this, I'd like to merge the branch in the near future. We can work on the details in a separate PR.

mreineck commented 1 year ago

I propose to merge this early next week unless there are objections.

mreineck commented 1 year ago

Merging now. Thanks again for all the helpful discussion!