nuclear-multimessenger-astronomy / nmma

A pythonic library for probing nuclear physics and cosmology with multimessenger analysis
https://nuclear-multimessenger-astronomy.github.io/nmma/
GNU General Public License v3.0

ThreadPoolExecutor for distributing MPI #230

Open sahiljhawar opened 1 year ago

sahiljhawar commented 1 year ago

This PR more closely follows the idea given in #215, repeatedly calling light_curve_analysis within the script. This parallelisation is free from all the issues faced in #229. Multiple methods are implemented to leverage the parallelisation based on user preference. List of commands with explanation:

Things to note:

injection.log now works as expected; logs from concurrent runs do not leak.
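
A minimal sketch of the pattern described above (not the PR's actual code; the CLI name, flags, and log-file naming are illustrative assumptions): each configuration is dispatched through a ThreadPoolExecutor, and each run writes to its own log file so concurrent runs do not leak into a shared injection.log.

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_analysis(config_path, label, n_ranks):
    # Per-run log file, so concurrent runs do not share one injection.log
    log_path = f"{label}_injection.log"
    cmd = ["mpiexec", "-np", str(n_ranks),
           "lightcurve-analysis", "--config", config_path]
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)

configs = [(f"fit{i}.yaml", f"fit{i}") for i in range(4)]
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_analysis, path, label, 1)
               for path, label in configs]
    for fut in as_completed(futures):
        fut.result()  # surface any subprocess failure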

tylerbarna commented 1 year ago

if one runs multi_model_analysis --config config.yaml --parallel --process 20 for a config file that has like 30 different fits, will it start with 20 fits and then queue up the remaining 10 to start once the others have finished?

sahiljhawar commented 1 year ago

@tylerbarna That's a good question and I didn't have this case in mind. When --parallel is set, in this case 20//30 will be 0, hence the configs will run with mpiexec -np 0 ... From reading online (and ChatGPT), whether that works seems to depend on the MPI implementation; in my case -np 0 does not fail and starts with 1 process.
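
A guard along these lines (a sketch; the function name is illustrative) would sidestep the -np 0 case entirely:

def ranks_per_config(n_processes, n_configs):
    # 20 // 30 == 0, which would produce "mpiexec -np 0"; clamp to at
    # least one rank per configuration instead.
    return max(1, n_processes // n_configs)

assert ranks_per_config(20, 30) == 1
assert ranks_per_config(20, 5) == 4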

tylerbarna commented 1 year ago

> @tylerbarna That's a good question and I didn't have this case in mind. When --parallel is set, in this case 20//30 will be 0, hence the configs will run with mpiexec -np 0 ... From reading online (and ChatGPT), whether that works seems to depend on the MPI implementation; in my case -np 0 does not fail and starts with 1 process.

looking through some stackoverflow pages, concurrent.futures might already handle this (see here), but I've seen some conflicting info. I've also seen some things along the following lines that explicitly use a task queue (the snippet below uses the standard-library queue.Queue, since concurrent.futures has no Queue class):

import concurrent.futures
import queue

# Create a thread pool executor with 4 threads
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Create a queue and add tasks to it
task_queue = queue.Queue()
task_queue.put(task1)
task_queue.put(task2)
task_queue.put(task3)

# Drain the queue, submitting each task to the executor
while not task_queue.empty():
    executor.submit(task_function, task_queue.get())

# Wait for all submitted tasks to finish
executor.shutdown(wait=True)

but I haven't experimented much with it.

There's also multiprocessing.Queue
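
The multiprocessing.Queue route would look something like this sketch (illustrative, not from the PR): worker processes pull tasks off a shared queue until they hit a sentinel.

import multiprocessing as mp

def worker(q):
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work
            break
        print(f"processing {item}")

if __name__ == "__main__":
    q = mp.Queue()
    for i in range(6):
        q.put(i)
    n_workers = 2
    for _ in range(n_workers):
        q.put(None)  # one sentinel per worker
    procs = [mp.Process(target=worker, args=(q,)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()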

sahiljhawar commented 1 year ago

@tylerbarna I have indeed used concurrent.futures :') But what is the conflicting info you have seen?

tylerbarna commented 1 year ago


> @tylerbarna I have indeed used concurrent.futures :') But what is the conflicting info you have seen?

Conflicting info regarding whether or not submitting more jobs than the max number of workers handles the queueing automatically
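
For reference, ThreadPoolExecutor does handle this automatically: it keeps an internal work queue, so submissions beyond max_workers simply wait until a worker frees up. A minimal sketch demonstrating the behaviour:

import threading
import time
from concurrent.futures import ThreadPoolExecutor

def task(i):
    print(f"task {i} running on {threading.current_thread().name}")
    time.sleep(1)

# Ten submissions, four workers: tasks 4-9 sit in the pool's internal
# queue until earlier tasks finish; no manual queue is needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    for i in range(10):
        pool.submit(task, i)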

sahiljhawar commented 1 year ago

@tylerbarna Okay. We can try to implement it along these lines. Maybe default to 1 process per configuration if the number of configs exceeds the number of processes specified. Or make it a strict policy that the number of processes be an integer multiple of the number of configurations, if the analysis needs to run in parallel. I'll also try to implement queue-based execution, as you asked here: https://github.com/nuclear-multimessenger-astronomy/nmma/pull/230#issuecomment-1714565881

@mcoughlin @tsunhopang Any comments regarding these?

tylerbarna commented 1 year ago

> @tylerbarna Okay. We can try to implement it along these lines. Maybe default to 1 process per configuration if the number of configs exceeds the number of processes specified. Or make it a strict policy that the number of processes be an integer multiple of the number of configurations, if the analysis needs to run in parallel.

yeah, the former option seems like it would be best. There can be situations where we want to fit more jobs than the number of cores we can allocate to a single job on a cluster, but it's good to still be able to parallelize across the cores we do have.

mcoughlin commented 1 year ago

@sahiljhawar I have no particularly strong preference, whatever works for folks.

sahiljhawar commented 7 months ago

@tsunhopang

tsunhopang commented 7 months ago

Could you add a small test for this executable? E.g., run on AT2017gfo with two models.

sahiljhawar commented 7 months ago

@tsunhopang yeah, I am trying to do that but there seem to be some issues with the config file path.
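
A hypothetical sketch of such a test (the --config/--parallel/--process flags follow the usage discussed above; the model names, config schema, and assertion are illustrative assumptions, not the PR's actual test):

import subprocess

def test_multi_model_analysis_two_models(tmp_path):
    # Illustrative config: two models fit to AT2017gfo; the exact
    # schema is an assumption.
    config = tmp_path / "config.yaml"
    config.write_text(
        "fit_a:\n  model: Me2017\n"
        "fit_b:\n  model: Bu2019lm\n"
    )
    result = subprocess.run(
        ["multi_model_analysis", "--config", str(config),
         "--parallel", "--process", "2"],
        capture_output=True, text=True,
    )
    assert result.returncode == 0, result.stderr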