mreineck / ducc

Fork of https://gitlab.mpcdf.mpg.de/mtr/ducc to simplify external contributions
GNU General Public License v2.0

Document environment variables that affect parallelism #35

Closed by landmanbester 2 months ago

landmanbester commented 2 months ago

I noticed a while ago that ducc started honouring the OMP_NUM_THREADS environment variable. Are there any others that users should be aware of? It may be worth documenting this in the project README.

mreineck commented 2 months ago

It's semi-official ... and it wasn't well received by everyone (see https://gitlab.com/aroffringa/wsclean/-/issues/175).

In the longer term I hope to get rid of environment variables entirely to avoid all this potential confusion. This implies that an explicit thread pool size needs to be set via ducc0.misc.resize_thread_pool() whenever anything nonstandard is needed, but it might be a price worth paying to avoid ambiguities.

landmanbester commented 2 months ago

DUCC0_NUM_THREADS is what I was looking for. Thanks!

mreineck commented 2 months ago

Ah sorry! I had thought you were aware of that one.

Still, for maximum future safety and minimum messing around with environment variables, I'd recommend doing

ducc0.misc.resize_thread_pool(<whatever number of threads you'd like>)

on program startup. You can also feed in the value of OMP_NUM_THREADS there - or of whatever other environment variable controls your multithreading environment. This is a kind of flexibility I cannot provide purely from within ducc0.
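As a minimal sketch of that startup step: the helper below reads a thread count from the environment and passes it to ducc0.misc.resize_thread_pool(). The helper name threads_from_env, the order in which the variables are checked, and the fallback to the CPU count are illustrative assumptions, not part of ducc0.

```python
import os


def threads_from_env(names=("DUCC0_NUM_THREADS", "OMP_NUM_THREADS"), default=None):
    """Return the first positive integer found in the given environment variables.

    Hypothetical helper: the lookup order and the fallback are choices made
    for this example only. Values that are not plain positive integers
    (e.g. "abc" or "4,2") are skipped.
    """
    for name in names:
        raw = os.environ.get(name, "").strip()
        if raw.isdigit() and int(raw) > 0:
            return int(raw)
    if default is not None:
        return default
    # Fall back to the number of CPUs the OS reports (assumption).
    return os.cpu_count() or 1


if __name__ == "__main__":
    nthreads = threads_from_env()
    try:
        import ducc0
    except ImportError:
        print(f"ducc0 not installed; would have requested {nthreads} threads")
    else:
        # Set the pool size explicitly once, at program startup.
        ducc0.misc.resize_thread_pool(nthreads)
        print("ducc0 thread pool size:", ducc0.misc.thread_pool_size())
```

Doing this once at startup keeps the decision about which variable to trust in the application rather than in the library.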

landmanbester commented 2 months ago

Just to clarify, is the expected behavior for ducc0 to always honour misc.thread_pool_size() within a process? I'm asking because I'm having difficulty parallelizing multiple gridding instances spun up from separate Python threads. It seems to work fine when I spin them up from separate processes, though. (This could be a bug on my side, hence the need to clarify.)

mreineck commented 2 months ago

> Just to clarify, is the expected behavior for ducc0 to always honour misc.thread_pool_size() within a process?

Within a process, yes. We need to be very careful with terminology here though ... to me a process is something that has a memory space distinct from every other process on the machine, as opposed to threads, which always share a single memory space with the other threads in that process.

> I'm asking because I'm having difficulty parallelizing multiple gridding instances spun up from separate python threads.

I admit that I don't understand what this means. If you have several threads which all call ducc0.wgridder.<something>, then all these calls will share the (single) thread pool of that Python process. The Python process only has one instance of the ducc0 library loaded, and therefore there can only be a single thread pool, regardless of whether you call import ducc0 in multiple threads. So if you are in such a situation, you need to be careful to specify nthreads in all your ducc0 calls such that the sum of threads in all parallel calls adds up to the size of the thread pool, so you don't overbook the hardware.

That said, this (calling simultaneously into ducc0 from multiple Python threads) is a scenario I haven't even tried yet. If it doesn't work as expected, we should look into it together.

mreineck commented 2 months ago

To clarify: if you are on a node with n cores, and you are running a Python process there with m threads, you should set the ducc thread pool size to n and call your gridding tasks with nthreads=n//m (if all threads are gridding in parallel).

If this doesn't work as expected (i.e. fill the node to 100%), I'd like to hear the exact symptoms.
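The sizing rule above (pool size n, nthreads=n//m per call) can be sketched as follows. The functions threads_per_call, grid_band and grid_all_bands are hypothetical names for this example; grid_band is only a placeholder standing in for a real ducc0.wgridder call, and the use of ThreadPoolExecutor is an assumption about how the Python threads are created.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def threads_per_call(n_cores, m_workers):
    """Split n_cores evenly over m_workers concurrent calls.

    Returns at least 1; if m_workers > n_cores the hardware is overbooked,
    so it is better to lower m_workers in that case.
    """
    if n_cores < 1 or m_workers < 1:
        raise ValueError("n_cores and m_workers must be positive")
    return max(1, n_cores // m_workers)


def grid_band(band, nthreads):
    """Hypothetical placeholder for one gridding task.

    A real implementation would call into ducc0.wgridder here and pass
    `nthreads` through to that call.
    """
    return band, nthreads


def grid_all_bands(bands, n_cores, m_workers):
    """Run grid_band for every band on m_workers Python threads."""
    nthreads = threads_per_call(n_cores, m_workers)
    with ThreadPoolExecutor(max_workers=m_workers) as pool:
        return list(pool.map(lambda b: grid_band(b, nthreads), bands))


if __name__ == "__main__":
    n = os.cpu_count() or 1  # cores on this node
    m = 4                    # Python threads gridding in parallel
    # With ducc0 installed, one would first call ducc0.misc.resize_thread_pool(n).
    print(grid_all_bands(range(8), n, m))
```

With n=16 and m=4, each call would be given nthreads=4, so four simultaneous calls add up to the pool size of 16.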

landmanbester commented 2 months ago

This answers my question, thanks. I am in the scenario where I am calling simultaneously into ducc0 from multiple Python threads (e.g. gridding multiple imaging bands simultaneously) and I am seeing only thread_pool_size() threads spinning up. This is easily dealt with; I just needed to clarify.