paulknysh / blackbox

A Python module for parallel optimization of expensive black-box functions
MIT License
437 stars 60 forks source link

Add a new parameter 'executor' for custom parallelisation engine #5

Closed celliern closed 7 years ago

celliern commented 7 years ago

New feature : hability to provide a custom executor (an object that behave like the mp.Pool, with a context manager behaviour and a map method.)

example using dask.distributed tool :

In [1]: import blackbox as bb
   ...: import numpy as np
   ...: from distributed import Client
   ...: 
   ...: c = Client("localhost:8786")
   ...: 
   ...: def fun(par):
   ...:     a, b = par
   ...:     return np.sqrt((a - 5) ** 2 + (b - 3) ** 2)
   ...: 
   ...: 
   ...: def main():
   ...:     bb.search(f=fun,  # given function
   ...:               box=[[-10., 10.], [-10., 10.]],  # range of values for each parameter
   ...:               n=4,  # number of function calls on initial stage (global search)
   ...:               m=4,  # number of function calls on subsequent stage (local search)
   ...:               batch=8,  # number of calls that will be evaluated in parallel
   ...:               resfile='output.csv',
   ...:               executor=c.get_executor)  # text file where results will be saved
   ...: 
   ...: 
   ...: if __name__ == '__main__':
   ...:     main()

That snippet launch the computation on the different dask workers whose can be on different computers on the network.

celliern commented 7 years ago

Hello!

The join / close is contained in the context manager usage of the Pool (or other executor). Using

with mp.Pool() as p:
    p.map(function, iterable)

is roughly equivalent to

try:
    p = mp.Pool()
    p.map(function, iterable)
finally:
    p.join()
    p.close()

It is even better than your solution with pmap, because it ensure that the pool is closed even if an exception occur between the creation and the closing of the pool. It is the same as using with open(filename) as f instead of opening and explicitely closing the file.

See that blog post for better understanding of context managers.

For the notion of executor, it's because a lot of lib try to have the same API as the concurrent.futures builtin library. In that case, an executor is defined as an object with a start and a close method, which can be used as a context manager and give access to the map method (among others). It will have been more explicite if I had set

executor=concurrent.futures.ProcessPoolExecutor

as default maybe?

Oh, I think about it : the Pool as context manager is a py3 stuff. But I have an implementation working with py2 (it's a very simple context manager...) if you want to keep the py2 compatibility.

Regards,

Nicolas Cellier

celliern commented 7 years ago

Okey, I have ensured the compatibility py2 / py3, and I have add a tox config with some basic test in order to be sure.

I have also made the resfile non-mendatory, as it can be useful to just return the parameters.

(Do not worry about the closing/reopening, little manipulation issue)

paulknysh commented 7 years ago

As I was saying earlier, I like your idea about specifying external executor. Also, I like your proposition about saving to a text file being an optional thing. Those are great points and thanks for pointing them out.

However:

1) Added lines 6-20 in blackbox.py look kind of ugly. I guess they provide py2 compatibility for the line "with executor() as e:" (btw, are you sure this issue exists at all?). Can we find more elegant solution, that involves less code and less additional libraries? I would say that what I proposed (to keep pmap() and to use it as default executor) looks better/simpler. But I think you have more knowledge on this so maybe we can find something even better. I feel like there should be a "one line of code" solution.

2) This is very simple code and I don't think there is a need to add all those extra files (which I'm sure are critical for a large project). The test file looks just like an example that I have in README, so there is no value in duplicating it. Requirements (numpy, scipy) are very standard computational libraries, I think no need to even mention them. gitignore and tox files don't seem important as well.

celliern commented 7 years ago

I will think about make that more simple. But yes, mp.Pool can not be used as a context manager. But I may try something more elegant. (if we drop py2 support, we can avoid that... But I feel we need to support python 2 as this is a small utility script).

We can remove the extra files, I used them in order to ensure the compatibility across the version. But I think that this tool is really useful and should be available in pypi. (you can see path.py for an other simple one file lib which is tested and pipable). But this is clearly up to you :) if pipable, it need more maintenance effort but the user can using it with a simple pip install blackbox, which is nice.

Thanks for the different comments. Tell me if you want that I delete extra file. I will try shorter solution for the executor. (BTW, all extra imports are built in! It do not add dependencies to blackbox :))

⁣Envoyé par TypeApp ​

Le 28 août 2017 14:26, à 14:26, Paul Knysh notifications@github.com a écrit:

As I was saying earlier, I like your idea about specifying external executor. Also, I like your proposition about saving to a text file being an optional thing. Those are great points and thanks for pointing them out.

However:

1) Added lines 6-20 in blackbox.py look kind of ugly. I guess they provide py2 compatibility for the line "with executor() as e:" (btw, are you sure this issue exists at all?). Can we find more elegant solution, that involves less code and less additional libraries? I would say that what I proposed (to keep pmap() and to use it as default executor) looks better/simpler. But I think you have more knowledge on this so maybe we can find something even better. I feel like there should be a "one line of code" solution.

2) This is very simple code and I don't think there is a need to add all those extra files (which I'm sure are critical for a large project). The test file looks just like an example that I have in README, so there is no value in duplicating it. Requirements (numpy, scipy) are very standard computational libraries, I think no need to even mention them. gitignore and tox files don't seem important as well.

-- You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub: https://github.com/paulknysh/blackbox/pull/5#issuecomment-325339465

celliern commented 7 years ago

Hey!

I have removed the extra files (after checking compatibility) and I have delegated the "ugly" lines inside a function, making these lines more explicit in a way?

Tell me if it suits you.

Nicolas

paulknysh commented 7 years ago

Yes, putting it all in a separate function sounds like a better solution for now, thanks! In the future I hope we can find a better one.

Did you test the code on both py2 and py3?

celliern commented 7 years ago

yeah! I kept the auxiliary file in local, so I manage to run the same toy test optimization with only scipy and numpy for py2 and py3

2017-08-28 19:29 GMT+02:00 Paul Knysh notifications@github.com:

Yes, putting it all in a separate function sounds like a better solution for now, thanks! In the future I hope we can find a better one.

Did you test the code on both py2 and py3?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/paulknysh/blackbox/pull/5#issuecomment-325420185, or mute the thread https://github.com/notifications/unsubscribe-auth/ACjzFJNPUl7OChqqnMesnxTC5ZCwVZ2mks5scvjxgaJpZM4PDqEZ .

paulknysh commented 7 years ago

Cool! I'm going to double check everything and merge it soon.

Also, please add a description for a get_default_executor() function (overview + what it returns)

paulknysh commented 7 years ago

Lines 110-113 have wrong indentation, they should be aligned with the following if and return statements. Please fix.

Also, do we need lines 43, 44? Can we just set default value of executor "executor=get_default_executor" (line 9)?

celliern commented 7 years ago

Done and done :)

paulknysh commented 7 years ago

OK, I'm going to polish it further, but I'm merging it for now. Thanks again for your contribution! Feel free to close an issue that you opened some time ago (or maybe only I can do that?).

paulknysh commented 7 years ago

People say there is an issue with running current code on Windows. There is an error like that:

File "C:\ProgramData\Anaconda2\lib\multiprocessing\forking.py", line 258, in init RuntimeError: Attempt to start a new process before the current process has finished its bootstrapping phase. This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module: if name == 'main': freeze_support() ... The "freeze_support()" line can be omitted if the program is not going to be frozen to produce a Windows executable. self._repopulate_pool()

Can you please take a look? The previous code (without executor) worked OK, so probably there is an issue in get_default_executor() function etc. I'll need to go back to previous version in case we won't find a problem.

Btw, have you tested your code on Win at all? I know it works good in Linux.

celliern commented 7 years ago

I will see it tomorrow. Proposition : rolling back to ensure full compatibility and making a new branch, merging only when tested on windows.

⁣Envoyé par TypeApp ​

Le 3 sept. 2017 20:26, à 20:26, Paul Knysh notifications@github.com a écrit:

People say there is an issue with running current code on Windows. There is an error like that:


File "C:\ProgramData\Anaconda2\lib\multiprocessing\forking.py", line 258, in init RuntimeError: Attempt to start a new process before the current process has finished its bootstrapping phase.

       This probably means that you are on Windows and you have
       forgotten to use the proper idiom in the main module:

           if __name__ == '__main__':
               freeze_support()
               ...

       The "freeze_support()" line can be omitted if the program
       is not going to be frozen to produce a Windows executable.

self._repopulate_pool()


Can you please take a look? The previous code (without executor) worked OK, so probably there is an issue in get_default_executor() function etc. I'll need to go back to previous version in case we won't find a problem.

Btw, have you tested your code on Win at all? I know it works good in Linux.

-- You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub: https://github.com/paulknysh/blackbox/pull/5#issuecomment-326822236

paulknysh commented 7 years ago

Sounds good. I'm not a good git user yet, not sure how to roll back and create a new branch.

celliern commented 7 years ago

Oh, ok, easy fix. I'm calling mp.Pool() in the function, and in windows the Pool have to be called in the

if __name__="__main"

I will explicitely test the py version. Safer.

paulknysh commented 7 years ago

I didn't get what is the problem exactly. I think mp.Pool() was always called inside of blackbox.py, and never after "name == 'main':" (in the main file).

Anyways, feel free to update the code once you're sure about it. I think you have more expertise on parallelism.

celliern commented 7 years ago

Yes, but the blackbox search is called in the ifmain. But get_default executor is not. I open a new Pull Request for the fix. I have tested for py2 and py3 on windows.

paulknysh commented 7 years ago

One more thing. Is this necessary to have this warning?:

warnings.warn("running on python2 ...

What is the purpose of it? I think it's just confusing.

celliern commented 7 years ago

There is two way. The good practices tell us to log way more info and mute that, the user choose if it wants to display them. Or we can remove them. It's up to you.

2017-09-05 5:45 GMT+02:00 Paul Knysh notifications@github.com:

One more thing. Is this necessary to have this warning?:

warnings.warn("running on python2 ...

What is the purpose of it? I think it's just confusing.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/paulknysh/blackbox/pull/5#issuecomment-327064032, or mute the thread https://github.com/notifications/unsubscribe-auth/ACjzFPCf_HxiTWscssaHffhr3_euTvLNks5sfMPmgaJpZM4PDqEZ .

paulknysh commented 7 years ago

OK. I decided to remove it.