So with respect to point 2, what API do we want? Something like the following makes the most sense I guess.
with MPIExecutor(algorithm) as executor:
    results = executor.run(nfe, logfrequency)

# or

with ProcessPoolExecutor(algorithm) as executor:
    results = executor.run(nfe, logfrequency)
Looking at concurrent.futures, we could make wrappers around the ProcessPoolExecutor and around the MPIPoolExecutor from mpi4py.
Also, which Python version do you want to support? concurrent.futures has changed between 3.6 and 3.7. The main change is that the ProcessPoolExecutor accepts an initialiser function from 3.7 onwards. If you want to support 3.6, we could use the Pool from multiprocessing instead and keep the initialiser functionality. A counterargument to supporting the initialiser function is that the MPIPoolExecutor currently does not have it.
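To make the trade-off concrete, here is a rough sketch of the initialiser support in the two standard-library options (the init_worker function is just a placeholder; the MPIPoolExecutor has no equivalent argument):

import multiprocessing
import random
from concurrent.futures import ProcessPoolExecutor


def init_worker(seed):
    # Placeholder initialiser, e.g. seeding RNGs or loading shared data.
    random.seed(seed)


if __name__ == "__main__":
    # Python 3.7+: ProcessPoolExecutor accepts initializer/initargs
    with ProcessPoolExecutor(max_workers=4, initializer=init_worker,
                             initargs=(42,)) as ex:
        print(list(ex.map(abs, [-1, -2, -3])))

    # Python 3.6 fallback: multiprocessing.Pool has had an initializer for a long time
    with multiprocessing.Pool(processes=4, initializer=init_worker,
                              initargs=(42,)) as pool:
        print(pool.map(abs, [-1, -2, -3]))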
I have committed a minimal working example with a ProcessPoolExecutor. See my fork. I will work on it some more over the coming days.
Wow, thanks @quaquel. Much of this is over my head - I've never used futures or shared memory parallelization in general. Good opportunity to learn stuff!
Your updates look nice and are clearly separated from the algorithm code, which is my only concern. As long as the serial and MPI implementations don't become more complicated, I'm all for it.
For the 3.6/3.7 debate, do you mean that running with MPI will not be possible with 3.7? If so, I'd say we should hold off for now. But I may be misunderstanding the issue.
Thank you again, keep on hacking and let me know how it goes.
I have added a working MPI version as well. I have, however, a few questions; let me know what you think on these issues.
Cool! Let's see ...
I have been running from the command line as mpirun -n 32 python main.py. For HPC jobs it needs to be runnable from the command line, but I have no problem if it's just python main.py and MPI is somehow invoked in the script.
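For reference, mpi4py.futures supports both launch styles; a minimal sketch with a placeholder evaluate function, assuming the MPI implementation supports dynamic process spawning for the plain python main.py case:

# main.py - a minimal mpi4py.futures sketch, not the actual example script.
# Either launch explicitly:  mpiexec -n 32 python -m mpi4py.futures main.py
# or run plain python main.py and let MPIPoolExecutor spawn its own workers.
from mpi4py.futures import MPIPoolExecutor


def evaluate(x):
    # Placeholder for an expensive model evaluation.
    return x * x


if __name__ == "__main__":
    with MPIPoolExecutor() as executor:
        print(list(executor.map(evaluate, range(8))))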
Yes the side effects are a new thing I added that I'm not sure about. I wanted to keep track of the number of occurrences for each action (so in the output, every action node will have a % of the time that it is triggered). This helps with pruning unused actions. I did make sure that it works in parallel and returns the modified object to the main process.
Sure, main.py (with the Folsom model example) probably belongs in the example folder. That would be cleaner.
I don't know what the different doc standards are. Numpy sounds fine :)
Sequential version - sounds good!
Yes, this might make more sense. One issue is that I never set up unit tests, so we'll have to make sure nothing breaks when the object structure changes.
Thank you again!! Post-AGU I should be able to put more time in to help with the modifications.
no worries, I have some time and this is relatively easy stuff
Ok, I have done the following:
One minor change in my own code is that I slightly changed the API:
with MPIExecutor() as executor:
    snapshots = algorithm.run(1000, 10, executor)
So a couple of things are going on here. First, I want to have executors as context managers. This means that a pool of processes is cleaned up nicely after code execution. Second, I want to pass the executor to the run method of the algorithm rather than have a run method on the executor. The executor class is now essentially a small class with a map method. Moreover, I can now use the attributes on the algorithm directly in a cleaner way.
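For context, a minimal sketch of what such a wrapper could look like (the class name and details are illustrative, not the committed implementation):

from concurrent.futures import ProcessPoolExecutor


class MultiprocessingExecutor:
    # Thin wrapper: a map method plus context-manager cleanup of the pool.

    def __init__(self, processes=None):
        self._pool = ProcessPoolExecutor(max_workers=processes)

    def map(self, function, population):
        # Evaluate the population in parallel and return a list of results.
        return list(self._pool.map(function, population))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._pool.shutdown()
        return False

The run method of the algorithm would then only call executor.map for the expensive function evaluations.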
One remaining question: what should the return of run on the algorithm be?
Currently you only have a return in case of log_frequency not being None; why not return the final population instead? And should these snapshots not be separate from logging? My guess is that you want convergence information.
Wow, this is pretty slick. I'm impressed how concise the updates are. Thanks!! This is looking much more professional by the day.
Question about the return of algorithm.run ... yes, I didn't set this up very well; I've only run it with logging turned on. The snapshots include best_f, best_P, nfe, and runtime, saved at the frequency log_frequency, which helps to analyze convergence. The best option is probably what you described:
if log_frequency: print to CLI and save snapshots, return snapshots
else: do not print or save snapshots; return the final population.
(When logging is turned off, it is very dumb to not return anything, as in the current setup...)
I went a slightly different route. In my view convergence information and logging while running are two different things and should thus be handled differently.
Logging: For logging, I have replaced the print statements with the standard Python logging module. See the sequential example or the multiprocessing example for how to use this. Python logging is very lightweight, so the performance hit is quite small.
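For reference, a minimal sketch of how the standard logging module is typically switched on from a driver script (the format string and level are assumptions, not necessarily what the examples use):

import logging

# Attach a console handler and set the verbosity; without a call like this
# there is no handler configured, so the library's log messages are not shown.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(processName)s %(levelname)s %(message)s")

logger = logging.getLogger(__name__)
logger.info("starting optimization")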
I still have to run some tests under MPI and fine-tune the multiprocessing example a bit. Ideally, it should be possible to see from which specific process (main or any subprocess) a log message originates. This makes debugging parallelisation code much easier. For the workbench I have this up and running, so I have done it before.
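One way to tag each record with its originating MPI rank (a sketch using mpi4py directly; not what is implemented here yet):

import logging

from mpi4py import MPI

# Bake the rank of this process into the log format so messages from the
# master and the workers can be told apart when debugging parallel runs.
rank = MPI.COMM_WORLD.Get_rank()
logging.basicConfig(
    level=logging.DEBUG,
    format=f"[rank {rank}] %(levelname)s %(name)s: %(message)s",
)

logging.getLogger(__name__).debug("process %d is alive", rank)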
Convergence: For convergence, I have for now added a simple boolean. This makes sure that convergence information is saved at each generation. It might make more sense to also have a convergence frequency attribute, where you don't save convergence information every generation but only after a specific number of nfe.
Ok, multiprocessing logging now works cleanly. This is basically ready to be merged.
Getting logging to work with MPI is left for future work. It is a can of worms that I don't want to touch right now.
Thanks Jan! This executor setup is really cool. A few questions / thoughts:
I'm running the MPI example now, and the logging seems to work fine. What did you think was wrong with it? It only prints log statements from the master process, but that's ok. I do need this to work (as well as the snapshots saving) because all of our cluster runs are MPI.
There should be an option to turn off the logging, and also to give a frequency for saving the convergence information, otherwise the snapshots dictionary will get too big.
The algorithm.run function should return something (final population / archive?) when snapshots are disabled.
After that we can do the merge, and I'd be happy to work on cleaning up documentation/readme stuff. Thanks again!
The logging in the main process is indeed working fine under MPI. However, I often use debug log messages from my subprocesses for understanding problems during parallelisation. I had a quick search, and it seems that there are a few mpi4py solutions we could look at. Still, since logging in the main process is all you need, this is not urgent.
Turning off logging means not running logging.basicConfig; in that case the messages are still logged, but there is no handler to print them to the console or send them to a file. Adding the convergence stuff is easy: one additional keyword argument. I will add this later today.
I agree, there are good arguments for either the final population or the archive. With the final population, it becomes possible to reseed the algorithm and continue the optimisation. For most workflows, however, you want the archive rather than the final population. Irrespective of which of the two we return, the other should be accessible as an attribute. A third option is to always return both. You would get something like the following:
archive, population = algorithm.run(1000, convergence=False)
# with convergence
archive, population, convergence = algorithm.run(1000, convergence=True)
Ok great, just two things for now then: (1) the convergence frequency, and (2) the return statements. The way you outlined it here seems fine, other than that the single-objective case would return one best solution instead of an archive.
Thanks! I'll keep an eye out for a PR. And happy holidays.
Fixed; see the documentation of run and the examples for how it works.
All resolved by #5, thanks!
Some thoughts on improving algorithm performance.
Idea from @quaquel: bring some of the tricks from e-NSGAII to policy trees to counteract stalled search: auto-adaptive population sizing and restarts.
concurrent.futures API for executors. This will make it easier to switch between mpi4py on a cluster and multiprocessing on a single machine (again from @quaquel). No immediate timeline for doing these things, just keeping track.
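A sketch of the kind of switch this would enable (the ON_CLUSTER flag and evaluate function are placeholders, not part of any existing code):

import os

# Pick the executor backend without touching the optimization code itself.
if os.environ.get("ON_CLUSTER"):
    from mpi4py.futures import MPIPoolExecutor as Executor
else:
    from concurrent.futures import ProcessPoolExecutor as Executor


def evaluate(x):
    # Placeholder for an expensive model evaluation.
    return x * x


if __name__ == "__main__":
    with Executor() as executor:
        print(list(executor.map(evaluate, range(8))))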