So with respect to point 2, what API do we want? Something like the following makes the most sense I guess.
with MPIExecutor(algorithm) as executor:
    results = executor.run(nfe, logfrequency)

# or

with ProcessPoolExecutor(algorithm) as executor:
    results = executor.run(nfe, logfrequency)
Looking at concurrent.futures, we could make wrappers around the ProcessPoolExecutor and around the MPIPoolExecutor from mpi4py.
Also, which Python version do you want to support? concurrent.futures has changed between 3.6 and 3.7. The main change is that the ProcessPoolExecutor accepts an initialiser function from 3.7 onwards. If you want to support 3.6, we could use the Pool from multiprocessing instead and keep the initialiser functionality. A counterargument to supporting the initialiser function is that the MPIPoolExecutor currently does not have it.
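To make the trade-off concrete, here is a rough sketch of the initialiser support in the two standard-library options (the init_worker function is just a placeholder; the MPIPoolExecutor has no equivalent argument):

import multiprocessing
import random
from concurrent.futures import ProcessPoolExecutor


def init_worker(seed):
    # Placeholder initialiser, e.g. seeding RNGs or loading shared data.
    random.seed(seed)


if __name__ == "__main__":
    # Python 3.7+: ProcessPoolExecutor accepts initializer/initargs
    with ProcessPoolExecutor(max_workers=4, initializer=init_worker,
                             initargs=(42,)) as ex:
        print(list(ex.map(abs, [-1, -2, -3])))

    # Python 3.6 fallback: multiprocessing.Pool has had an initializer for a long time
    with multiprocessing.Pool(processes=4, initializer=init_worker,
                              initargs=(42,)) as pool:
        print(pool.map(abs, [-1, -2, -3]))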
I have committed a minimal working example with a ProcessPoolExecutor. See my fork. I will work on it some more over the coming days.
Wow, thanks @quaquel. Much of this is over my head - I've never used futures or shared memory parallelization in general. Good opportunity to learn stuff!
Your updates look nice and are clearly separated from the algorithm code, which is my only concern. As long as the serial and MPI implementations don't become more complicated, I'm all for it.
For the 3.6/3.7 debate, do you mean that running with MPI will not be possible with 3.7? If so, I'd say we should hold off for now. But I may be misunderstanding the issue.
Thank you again, keep on hacking and let me know how it goes.
I have added a working MPI version as well. I have, however, a few questions; let me know what you think on these issues.
Cool! Let's see ...
I have been running from the command line as mpirun -n 32 python main.py. For HPC jobs it needs to be runnable from the command line, but I have no problem if it's just python main.py and MPI is somehow invoked in the script.
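For reference, mpi4py.futures supports both launch styles; a minimal sketch with a placeholder evaluate function, assuming the MPI implementation supports dynamic process spawning for the plain python main.py case:

# main.py - a minimal mpi4py.futures sketch, not the actual example script.
# Either launch explicitly:  mpiexec -n 32 python -m mpi4py.futures main.py
# or run plain python main.py and let MPIPoolExecutor spawn its own workers.
from mpi4py.futures import MPIPoolExecutor


def evaluate(x):
    # Placeholder for an expensive model evaluation.
    return x * x


if __name__ == "__main__":
    with MPIPoolExecutor() as executor:
        print(list(executor.map(evaluate, range(8))))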
Yes the side effects are a new thing I added that I'm not sure about. I wanted to keep track of the number of occurrences for each action (so in the output, every action node will have a % of the time that it is triggered). This helps with pruning unused actions. I did make sure that it works in parallel and returns the modified object to the main process.
Sure, main.py (with the Folsom model example) probably belongs in the example folder. That would be cleaner.
I don't know what the different doc standards are. Numpy sounds fine :)
Sequential version - sounds good!
Yes, this might make more sense. One issue is that I never set up unit tests, so we'll have to make sure nothing breaks when the object structure changes.
Thank you again!! Post-AGU I should be able to put more time in to help with the modifications.
no worries, I have some time and this is relatively easy stuff
Ok, I have done the following:
One minor change in my own code is that I slightly changed the API:
with MPIExecutor() as executor:
    snapshots = algorithm.run(1000, 10, executor)
So a couple of things are going on here. First, I want to have executors as context managers. This means that a pool of processes is cleaned up nicely after code execution. Second, I want to pass the executor to the run method of the algorithm rather than have a run method on the executor. The executor class is now essentially a small class with a map method. Moreover, I can now use the attributes on the algorithm directly in a cleaner way.
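For context, a minimal sketch of what such a wrapper could look like (the class name and details are illustrative, not the committed implementation):

from concurrent.futures import ProcessPoolExecutor


class MultiprocessingExecutor:
    # Thin wrapper: a map method plus context-manager cleanup of the pool.

    def __init__(self, processes=None):
        self._pool = ProcessPoolExecutor(max_workers=processes)

    def map(self, function, population):
        # Evaluate the population in parallel and return a list of results.
        return list(self._pool.map(function, population))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._pool.shutdown()
        return False

The run method of the algorithm would then only call executor.map for the expensive function evaluations.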
One remaining question: what should the return of run on the algorithm be?
Currently you only have a return in case of log_frequency not being None; why not return the final population instead? And should these snapshots not be separate from logging? My guess is that you want convergence information.
Wow, this is pretty slick. I'm impressed how concise the updates are. Thanks!! This is looking much more professional by the day.
Question about the return of algorithm.run ... yes, I didn't set this up very well; I've only run it with logging turned on. The snapshots include best_f, best_P, nfe, and runtime, saved at the frequency log_frequency, which helps to analyze convergence. The best option is probably what you described:
if log_frequency: print to CLI and save snapshots, return snapshots
else: do not print or save snapshots; return the final population.
(When logging is turned off, it is very dumb to not return anything, as in the current setup...)
I went a slightly different route. In my view convergence information and logging while running are two different things and should thus be handled differently.
Logging: For logging, I have replaced the print statements with the standard Python logging module. See the sequential example or the multiprocessing example for how to use this. Python logging is very lightweight, so the performance hit is quite small.
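For reference, a minimal sketch of how the standard logging module is typically switched on from a driver script (the format string and level are assumptions, not necessarily what the examples use):

import logging

# Attach a console handler and set the verbosity; without a call like this
# there is no handler configured, so the library's log messages are not shown.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(processName)s %(levelname)s %(message)s")

logger = logging.getLogger(__name__)
logger.info("starting optimization")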
I still have to run some tests under MPI and fine-tune the multiprocessing example a bit. Ideally, it should be possible to see from which specific process (main or any subprocess) a log message originates. This makes debugging parallelisation code much easier. For the workbench I have this up and running, so I have done it before.
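One way to tag each record with its originating MPI rank (a sketch using mpi4py directly; not what is implemented here yet):

import logging

from mpi4py import MPI

# Bake the rank of this process into the log format so messages from the
# master and the workers can be told apart when debugging parallel runs.
rank = MPI.COMM_WORLD.Get_rank()
logging.basicConfig(
    level=logging.DEBUG,
    format=f"[rank {rank}] %(levelname)s %(name)s: %(message)s",
)

logging.getLogger(__name__).debug("process %d is alive", rank)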
Convergence: For convergence, I have for now added a simple boolean. This makes sure that convergence information is saved at each generation. It might make more sense to also have a convergence frequency attribute, where you don't save convergence information every generation but only after a specific number of nfe.
Ok, multiprocessing logging now works cleanly. This is basically ready to be merged.
Getting logging to work with MPI is left for future work. It is a can of worms that I don't want to touch right now.
Thanks Jan! This executor setup is really cool. A few questions / thoughts:
I'm running the MPI example now, and the logging seems to work fine. What did you think was wrong with it? It only prints log statements from the master process, but that's ok. I do need this to work (as well as the snapshots saving) because all of our cluster runs are MPI.
There should be an option to turn off the logging, and also to give a frequency for saving the convergence information, otherwise the snapshots dictionary will get too big.
The algorithm.run function should return something (final population / archive?) when snapshots are disabled.
After that we can do the merge, and I'd be happy to work on cleaning up documentation/readme stuff. Thanks again!
The logging in the main process is indeed working fine under MPI. However, I often use debug log messages from my subprocesses for understanding problems during parallelisation. I had a quick search, and it seems that there are a few mpi4py solutions we could look at. Still, since logging in the main process is all you need, this is not urgent.
Turning off logging means not running logging.basicConfig; in that case the messages are still logged, but there is no handler to print them to the console or send them to a file. Adding the convergence stuff is easy: one additional keyword argument. I will add this later today.
I agree, there are good arguments for either the final population or the archive. With the final population, it becomes possible to reseed the algorithm and continue the optimisation. For most workflows, however, you want the archive rather than the final population. Irrespective of which of the two we return, the other should be accessible as an attribute. A third option is to always return both. You would get something like the following:
archive, population = algorithm.run(1000, convergence=False)
# with convergence
archive, population, convergence = algorithm.run(1000, convergence=True)
Ok great, just two things for now then: (1) the convergence frequency, and (2) the return statements. The way you outlined it here seems fine, other than that the single-objective case would return one best solution instead of an archive.
Thanks! I'll keep an eye out for a PR. And happy holidays.
Fixed; see the documentation of run and the examples for how it works.
All resolved by #5, thanks!
Some thoughts on improving algorithm performance.
Idea from @quaquel: bring some of the tricks from e-NSGAII to policy trees to counteract stalled search: auto-adaptive population sizing and restarts.
concurrent.futures API for executors. This will make it easier to switch between mpi4py on a cluster and multiprocessing on a single machine (again from @quaquel). No immediate timeline for doing these things, just keeping track.
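A sketch of the kind of switch this would enable (the ON_CLUSTER flag and evaluate function are placeholders, not part of any existing code):

import os

# Pick the executor backend without touching the optimization code itself.
if os.environ.get("ON_CLUSTER"):
    from mpi4py.futures import MPIPoolExecutor as Executor
else:
    from concurrent.futures import ProcessPoolExecutor as Executor


def evaluate(x):
    # Placeholder for an expensive model evaluation.
    return x * x


if __name__ == "__main__":
    with Executor() as executor:
        print(list(executor.map(evaluate, range(8))))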