paulknysh / blackbox

A Python module for parallel optimization of expensive black-box functions
MIT License
439 stars 60 forks source link

Packaged blackbox using a setup.py #24

Closed tchar closed 3 years ago

tchar commented 3 years ago

Hello,

I found your module pretty useful so I tried to package it under a setup.py so it can be installed using python setup.py install

I basically named the package as blackboxfunctions. If you don't like the name you can just change it or I make another pull request with another name that you like. I could not name it blackbox because a package like that already exists so there would be conflicts in case you want to upload this to PyPI

I also moved blackbox.py under blackboxfunctions directory and created an init.py that basically imports everything so it can be used as

import blackboxfunctions as bb

Also I created a .gitignore

You can check the information of the module in setup.py. I specified Paul Knysh as the author and used the email I found on his github page for the author_email field (not sure if you like this).

Lastly I changed the way blackbox prints to the output by using python's logging module. So for the error messages I used logger.error, for warnings logger.warning and for info messages logger.info. By doing this the user can actually control the output levels of the blackbox module by using logging.

I assumed that you would like to keep functionality for python 2.7 even though it is being deprecated, so I made sure that it works for both versions. I tested the setup.py and the module itself against python 2.7 and python 3.8 and they work as expected.

Also I have a branch in my blackbox repository named logging (https://github.com/tchar/blackbox/tree/logging) with the only change to be the usage of python's logging module without rearranging the structure, no setup.py and no renaming. So if you don't want the changes made for setup.py let me know so I make a pull request just for the logging change.

Please let me know what you think of the changes

Cheers, tchar

paulknysh commented 3 years ago

Thanks, this is great and something I wanted to do for a while!

Few comments:

  1. I'll probably change the name of package to something shorter
  2. I'm thinking about dropping python 2 support actually, since it's already outdated

Questions:

  1. Have you tested this for bugs?
  2. Is this ready to be published to PyPI?
tchar commented 3 years ago

Hello,

Sorry for taking so long to respond, for some reason I did not get informed of a new message in my pull request.

So my comment is long sorry for that :)

TLDR

I added tests. Name will change from blackbox to blackboxfunctions for conflict reasons. If you want to change the name, you need some changes before publishing to PyPI or tell me your prefered name and I do it for you. It is ready for PyPI if you keep the name.

Changes

So as you can see I made some changes.

How to run the tests

install the test requirements manually or by using the following command in the root directory (note the dot before develop)

pip install -e .[develop]

This will install numpy, scipy and pytest required for testing. Then simply run

pytest

And it automatically runs all the tests.

As you can see all the tests pass

Suggestions & Changes

If you choose to accept the pull request the module will be renamed from blackbox to blackboxfunctions. You can either change it yourself with the appropriate changes in the folder name, setup.py (packagename), README.md (example), blackbox.py (logger name) and tests/utils.py (module import) or tell me your prefered name and I change it for you. It's up to you.

I would suggest some changes in that order but this is up to you.

  1. Change the way the function f that is called from search_min with a small modification by just unpacking the parameters with * instead of passing them as a list. It is more pythonic and logical and will enable someone to write their function like this:
    def f(x, y, z):
    return x**2+x**3+z

    instead of this

    def f(params):
    return params[0]**2+params[1]**3+params[2]
    1. Return the whole history of search_min along with writing to a file. Someone who uses this module may want access to the history during runtime and reading from the csv file created is not a good approach, but it is the only one available right now. For that you could:
      • Return a list of parameters as the history and a user can get the best parameters by taking the last element of that list
      • Wrap the module around a class so you can enable more features and make it more versatile like this for example:
        from blackboxfunctions import BlackboxFunctions
        bb = BlackboxFunctions(arguments)
        bb.search_min()
        bb.get_best_params()
        bb.get_history()

For now I could make the change I propose in 1 because it is pretty simple.

PyPI

The password, username etc for PyPI it is for you to set them up because you will be maintaining I suppose.

Since this is going to be the first publish on PyPI, I suggest that you use https://test.pypi.org/ instead of https://pypi.org/ for the first time in order to check if everything works properly and then use https://pypi.org/.

From my experience I have seen problems like PyPI not processing your README.md correctly (long_description field) although I specified it is a "text/markdown" .

Other than that I think it is ready for PyPI.

paulknysh commented 3 years ago

Thanks for these changes!

Few requests/comments:

  1. I'm not sure I like the current format for logging. One thing which is definitely missing right now is that INFO level logs are not shown by default (WARNING is the default level looks like). I think INFO level is extremely informative for someone who is spending multiple minutes/hours and wants to watch those evaluations live.

I propose the following format for logging (in blackbox.py):

logging.basicConfig(format = '%(name)s %(message)s %(asctime)s', datefmt="%Y-%m-%d %H:%M:%S")
_LOGGER = logging.getLogger('[blackbox]')
_LOGGER.setLevel(logging.INFO)

This format also includes time stamp for each message, which I think is very useful. So, we should also remove the following lines from log messages, as they become redundant:

' @ ' + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

If you agree with these changes -- please include them, and also update the tests.

  1. I agree with changing the format for blackbox function to be:
def f(x, y, z):
    return x**2+y**3+z

The only concern is that domain is still specified as a list:

              domain = [ 
                                [-10., 10.],
                                [-10., 10.]
                                ],

which now creates a confusion - as it doesn't provide 1-to-1 mapping between x, y, z and their ranges, so we still need to remember the order.

An ideal solution would be to specify domain as a dictionary {x : [-10, 10], y : ...} but keys should be characters, not just unassigned variables.. I'm not sure how to resolve this nicely. See if you can think of something.

  1. As for returning the whole history of evaluations during runtime, maybe not the top priority for now, but definitely something that's good to have, and should be pretty easy to add.
tchar commented 3 years ago

Hello,

Before I make any changes I want to focus on your first comment but I will answer all of them

  1. a. If you do

    logging.basicConfig(format = '%(name)s %(message)s %(asctime)s', datefmt="%Y-%m-%d %H:%M:%S")

    Inside the blackbox.py module then you almost force the user to use your formatting. What I mean by that is if the user creates a main.py and wants to use blackbox then if he tries to specify his own format like this:

    import logging
    import blackboxfunctions as bb
    logging.basicConfig(format = '....')

    It will not work because the format in blackbox has overriden his format. He will have to use it like this in order to specify the format he wants

    import logging
    logging.basicConfig(format = '....')
    import blackboxfunctions as bb

    I think the best scenario is to remove the custom logging messages from inside the blackbox as you propose and put in the README.md the format you propose and let the user decide if he wants to use it or not. This is up to you though. I just found it tricky to specify my logging format in this case and I thought I should mention it.

I agree with the _LOGGER.setLevel(logging.INFO) to be the default level.

  1. b. I propose the following format for the logging logging.basicConfig(format = '%(name)s %(asctime)s %(message)s', datefmt="%Y-%m-%d %H:%M:%S") Meaning you put first the time and then the message. Usually the message goes to the end when using logging.

The reason for that is that when you actually print the logging message with the time after the message it will look like this:

blackboxfunctions evaluating batch 1/10... 2020-11-08 04:07:00

Whereas if you put it in the end it looks like this.

blackboxfunctions 2020-11-08 04:06:36 evaluating batch 1/10...

It's up to you

  1. To resolve the issue of naming the variables is easy. The hard part is to make the Pool executor call the function with expanded arguments and not like a list. I was able to make it work for Python 3 using something like function decorators but not for Python 2.

  2. This should not be that hard. If you are interested I can implement that with my next commit if I have time.

Let me know what you think of 1 in order to do the last commit so you can pull.

Cheers, tchar

paulknysh commented 3 years ago

Thanks for your answers. The next 1-2 weeks are kind of busy for me, so I won't have time to look into the code. I should be able to review and hopefully merge after.

paulknysh commented 3 years ago

Hi, sorry, I was trying to merge your changes earlier today, but somehow there were issues with uploading package to PyPi, seemed like a problem with setup.py. I was eventually able to find a simpler template for setup.py at https://packaging.python.org/tutorials/packaging-projects/, and it worked well. I already used that, you can find a package at https://pypi.org/project/black-box/

Also, FYI, I decided to drop Python2 support, since I think it's not useful as we're approaching the end of 2020.

I didn't include your updated logging yet, but feel free to update existing (or send a new) Pull Request with those changes, since there is a conflict right now. Also, I saw you added a lot of tests for logging, which is good, but probably not very critical. If you have a chance to add some more tests for core logic, that would be definitely useful. Any other changes you have in mind (i know we discussed possibility of passing named arguments into objective function instead of a list) -- let me know.

Thanks!