hyperopt / hyperopt

Distributed Asynchronous Hyperparameter Optimization in Python
http://hyperopt.github.io/hyperopt

Is it possible to force the algorithm to use initial values for evaluation? #450

Closed dma092 closed 1 month ago

dma092 commented 5 years ago

Is it possible to set initial values to evaluate using hyperopt's TPE? The idea is to feed the algorithm with the baseline's parameters and see if it can improve them.

jarednielsen commented 5 years ago

I tend to try hp.normal() when I already have a set of hyperparameters that work well - what sort of improvement are you thinking of?
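
For example, something along these lines - a rough sketch where 0.01 and 0.9 stand in for whatever your known-good baseline values are:

import numpy as np
from hyperopt import fmin, tpe, hp

# centre the priors on the baseline values (0.01 and 0.9 are placeholders)
space = {
    'learning_rate': hp.lognormal('learning_rate', np.log(0.01), 0.5),
    'momentum': hp.normal('momentum', 0.9, 0.05),
}

def objective(params):
    # stand-in for a real training / validation loop
    return (params['learning_rate'] - 0.02) ** 2 + (params['momentum'] - 0.95) ** 2

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
print(best)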

PhilipMay commented 5 years ago

@dma092 Yes, it is. I think you mean this:

  points_to_evaluate : list, default None
        Only works if trials=None. If points_to_evaluate equals None then the
        trials are evaluated normally. If list of dicts is passed then
        given points are evaluated before optimisation starts, so the overall
        number of optimisation steps is len(points_to_evaluate) + max_evals.
        Elements of this list must be in a form of a dictionary with variable
        names as keys and variable values as dict values. Example
        points_to_evaluate value is [{'x': 0.0, 'y': 0.0}, {'x': 1.0, 'y': 2.0}]

From here: https://github.com/hyperopt/hyperopt/blob/master/hyperopt/fmin.py#L276
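
If I'm reading that docstring correctly, usage would be roughly like this (toy objective and space, just to show the shape of the argument):

from hyperopt import fmin, tpe, hp

def objective(args):
    return args['x'] ** 2 + args['y'] ** 2

space = {'x': hp.uniform('x', -10, 10), 'y': hp.uniform('y', -10, 10)}

# the given point(s) are evaluated first, then TPE continues from there
best = fmin(objective,
            space=space,
            algo=tpe.suggest,
            max_evals=100,
            points_to_evaluate=[{'x': 1.0, 'y': 2.0}])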

abhishek-ghose commented 5 years ago

I tend to try hp.normal() when I already have a set of hyperparameters that work well - what sort of improvement are you thinking of?

@jarednielsen The problem with using hp.normal() is that the search space then becomes unconstrained. I want to be able to add in some prior information for any sampling scheme (I am not sure if this is what the original commenter intended, but this seems like a useful use-case to me).

Something like hp.uniform('x', 0, 10, bootstrap=[(2, 45), (8, 56)]) where a tuple of the form (p,q) in the bootstrap parameter denotes f(p)=q. Here f is the function to be minimized.

I see two uses for this:

NB: Looking at the link @PhilipMay shared, it looks like part of what I suggest above should already be possible, albeit using a different parameter - points_to_evaluate. You can ask the optimizer to start by evaluating f on the specific set of points provided in points_to_evaluate. I am not sure why trials=None is needed for this though.

jarednielsen commented 5 years ago

Looking through the source code, you can pass in a Trials object with past runs of the code. Extending @abhishek-ghose's example, it would be a bit more work than the simple tuple (you'd have to set up the JSON config object), but that would allow setting an arbitrary prior for tpe.suggest. I think we've discovered that this feature is already built-in :)
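
Something like this, I think (toy objective; if I remember right, max_evals counts the total number of trials, including the ones already recorded in the Trials object):

from hyperopt import fmin, tpe, hp, Trials

def objective(x):
    return x ** 2

trials = Trials()

# first run: 20 evaluations recorded in `trials`
fmin(objective, space=hp.uniform('x', -10, 10), algo=tpe.suggest,
     max_evals=20, trials=trials)

# second run re-uses the same Trials object, so tpe.suggest is conditioned
# on the first 20 points; only 20 additional evaluations are run
best = fmin(objective, space=hp.uniform('x', -10, 10), algo=tpe.suggest,
            max_evals=40, trials=trials)
print(best)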

abhishek-ghose commented 5 years ago

I looked at this a bit more and I think the function generate_trials_to_calculate() helps make initialization work with a Trials object. It is available from v0.1.1 onwards. I really don't want to do away with Trials, which seemingly I would need to if I used points_to_evaluate (see my previous comment); it's very handy for understanding what the optimizer does.

Here's some sample code that shows how to use generate_trials_to_calculate(). My objective function objective() just squares a scalar input, which we want to minimize with the input constrained to the range [LOW, HIGH]. The function bootstrap() takes as input init_size - the number of points you want the optimizer to evaluate before it begins to call suggest(). An optional boolean parameter plot_bootstrap_points decides whether to plot these bootstrap points on the final curve; the plot gets cluttered, so I prefer setting this flag to False.

The bootstrap points are uniformly sampled in the range [LOW, HIGH].

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt.fmin import generate_trials_to_calculate
from matplotlib import pyplot as plt
import seaborn as sns; sns.set()
LOW, HIGH = -10, 10

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        }

def bootstrap(init_size, plot_bootstrap_points=False):
    trials = Trials()
    if init_size > 0:
        # as initial values pick points uniformly on the x-axis; we need this as a dict
        init_vals = [{'x': i} for i in np.linspace(LOW, HIGH, init_size)]

        # this generates a trial object that can be used to bootstrap computation for the optimizer
        trials = generate_trials_to_calculate(init_vals)

    # since we've a bootstrapped trials object, the number of function evals would be init_size + max_evals
    best = fmin(objective,
                space=hp.uniform('x', LOW, HIGH),
                algo=tpe.suggest,
                max_evals=100,
                trials=trials)
    print "Best x: %0.04f" % (best["x"],)

    # get the bootstrap data for plotting
    bootstrap_points = [t['misc']['vals']['x'][0] for t in trials][:init_size]
    bootstrap_losses = trials.losses()[:init_size]
    bootstrap_plot_data = np.asarray(sorted(zip(bootstrap_points, bootstrap_losses)))

    # get data reg points not belonging to the bootstrap
    tpe_suggest_points = [t['misc']['vals']['x'][0] for t in trials][init_size:]
    tpe_suggest_losses = trials.losses()[init_size:]
    tpe_suggest_plot_data = np.asarray(sorted(zip(tpe_suggest_points, tpe_suggest_losses)))

    fig = plt.figure()
    ax = fig.add_subplot(111)

    # plot the objective fn
    temp = np.linspace(LOW, HIGH, 1000)
    ax.plot(temp, [objective(x)['loss'] for x in temp])

    # plot the bootstrap values
    if plot_bootstrap_points and init_size > 0:
        ax.plot(bootstrap_plot_data[:, 0], bootstrap_plot_data[:, 1], marker='o', ls='', color='gray' )

    # plot the new trial values
    ax.plot(tpe_suggest_plot_data[:, 0], tpe_suggest_plot_data[:, 1], 'ro')

    # extend the y axis so that nothing gets cut off
    ax.set_ylim(bottom=-3)
    ax.set_title('bootstrap size=%d' % (init_size,))
    plt.show()

if __name__ == "__main__":
    bootstrap(10, False)

A sample call to bootstrap() is shown.

One way to see that this works is to try different values of init_size: if the bootstrap sample is small, we would expect the optimizer to work almost normally, exploring most of the space. However, if the sample size is large, we would expect the optimizer to know which regions are most promising and to limit its search to those regions.

The output plot shows the points the optimizer tests in red. I tried init_size values of 10, 100, 1000. max_evals was held fixed at 100. We see that larger values of init_size do indeed focus the optimizer search near the minima.

[Plots of the optimizer's evaluated points for init_size = 10, 100, and 1000.]

Many thanks to Ethan Brown for introducing me to generate_trials_to_calculate() in this thread.

trevorwelch commented 5 years ago

Thanks for this detailed solution @abhishek-ghose , it looks like you've set out to solve a problem I've been thinking about for a while now!

That said, I'm unable to get it working myself; I wonder if you might have a thought as to why. I suspect I'm not understanding the type of input that generate_trials_to_calculate should be receiving.

I'm hoping to pass a list of dicts of good "starting point" parameters for tpe.suggest to work from.

Here are the relevant bits of my code:

params = {
    'param_1': hp.quniform('param_1', 20, 160, 20),
    'param_2': hp.quniform('param_2', 5, 100, 5),
    'param_3': hp.quniform('param_3', 1, 3, 1),
    'param_4': hp.quniform('param_4', 0.2, 3.0, 0.2),
    'param_5': hp.choice('param_5', [0,1]),
    'param_6': hp.choice('param_6', [0,1]),
    'param_7': hp.choice('param_7', [0,1]),
    'param_8': hp.choice('param_8', [0,1]),
    'param_9': hp.choice('param_9', [0.6, 0.7, 0.8])
}

print(init_vals[0]) # init_vals is a list of dicts of hyper parameters I like, see output below
trials = generate_trials_to_calculate(init_vals)
print(trials)

best = fmin(
            fn=run_the_strategy, 
            space=params, 
            algo=tpe.suggest, 
            max_evals=100, 
            trials=trials,
            points_to_evaluate=init_vals
            )

And my output/error:

{'param_1': 140.0, 'param_9': 0.6, 'param_8': 0, 'param_5': 1, 'param_6': 1, 'param_7': 0, 'param_2': 90.0, 'param_3': 1.0, 'param_4': 2.0}

<hyperopt.base.Trials object at 0x113175828>

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/pyll/base.py", line 868, in rec_eval
    int(switch_i)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'type'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "optimizing-the-strategy.py", line 609, in <module>
    points_to_evaluate=init_vals
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 367, in fmin
    return_argmin=return_argmin,
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/base.py", line 635, in fmin
    return_argmin=return_argmin)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 385, in fmin
    rval.exhaust()
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 244, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 218, in run
    self.serial_evaluate()
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 137, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/base.py", line 839, in evaluate
    print_node_on_error=self.rec_eval_print_node_on_error)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/pyll/base.py", line 870, in rec_eval
    raise TypeError('switch argument was', switch_i)
TypeError: ('switch argument was', <class 'hyperopt.pyll.base.GarbageCollected'>)

abhishek-ghose commented 5 years ago

@trevorwelch You're welcome!

I think the errors you see are probably a result of:

  1. The way you've declared your space. If this is an optimization in 9 dimensions, you need a list instead of a dict.
  2. This bit is weird - I think initializing a choice space works only with the index and not with the choice values!

Here's a modified version of your code:

from hyperopt import fmin, tpe, hp, STATUS_OK
from hyperopt.fmin import generate_trials_to_calculate

def run_the_strategy(args):
    print "Current arguments:", args
    return {
        'loss': sum([i*i for i in args]),
        'status': STATUS_OK
    }

params = [
    hp.quniform('param_1', 20, 160, 20),
    hp.quniform('param_2', 5, 100, 5),
    hp.quniform('param_3', 1, 3, 1),
    hp.quniform('param_4', 0.2, 3.0, 0.2),
    hp.choice('param_5', [0,1]),
    hp.choice('param_6', [0,1]),
    hp.choice('param_7', [0,1]),
    hp.choice('param_8', [0,1]),
    hp.choice('param_9', [0.6, 0.7, 0.8])
]

init_vals = [{'param_1': 140.0, 'param_2': 90.0, 'param_3': 1.0,
              'param_4': 2.0, 'param_5': 1, 'param_6': 1,
              'param_7': 0, 'param_8': 0, 'param_9': 0}]

trials = generate_trials_to_calculate(init_vals)

best = fmin(
            fn=run_the_strategy,
            space=params,
            algo=tpe.suggest,
            max_evals=10,
            trials=trials
            )

This runs, and gives me the following output (from the objective function):

Current arguments: (140.0, 90.0, 1.0, 2.0, 1, 1, 0, 0, 0.6)
Current arguments: (80.0, 70.0, 3.0, 3.0, 0, 1, 1, 0, 0.7)
Current arguments: (120.0, 90.0, 2.0, 0.6000000000000001, 1, 1, 1, 1, 0.8)
Current arguments: (140.0, 20.0, 2.0, 2.4000000000000004, 0, 0, 1, 0, 0.8)
Current arguments: (120.0, 65.0, 1.0, 0.2, 1, 0, 0, 0, 0.7)
Current arguments: (20.0, 45.0, 1.0, 0.6000000000000001, 1, 1, 1, 1, 0.8)
Current arguments: (40.0, 10.0, 2.0, 2.4000000000000004, 0, 1, 1, 1, 0.8)
Current arguments: (40.0, 25.0, 2.0, 0.6000000000000001, 1, 1, 1, 1, 0.8)
Current arguments: (60.0, 10.0, 2.0, 3.0, 1, 0, 0, 0, 0.6)
Current arguments: (120.0, 55.0, 2.0, 2.8000000000000003, 1, 1, 1, 1, 0.7)
Current arguments: (160.0, 10.0, 3.0, 1.8, 1, 1, 0, 0, 0.8)

As you would note:

  1. There are 11 lines printed although max_evals=10. The first line shows the values we initialized with (except param_9 - see next point). So we know the initial values in trials are indeed being used.
  2. This is the weird bit I mentioned - you'll see that for param_9 I am initializing with 0, which is not a legal value; however, the first printed line shows param_9 as 0.6! It seems initialization works only with the index. If you try initializing with 'param_9': 0.6, the code throws an error.

Btw you don't need points_to_evaluate since you're initializing trials. I accidentally left that in my code - I'll remove it [done].

trevorwelch commented 5 years ago

Thanks again for your very helpful response. With some tweaks I got it running on my actual code 🎉

Another interesting thing to note: when you bootstrap a parameter space like this, the parameters you pass to the function you're hyperopt'ing will now be a tuple instead of a dict (with my use of hyperopt, it's always been a dict, although perhaps this isn't always the case?). For example, where previously I could pass params['param_1'] internally to my function in order to access the parameter value 160, now I need to keep track of indices and param names and use params[0] to access the parameter value at param_1.
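
If I understand it correctly, this happens because the space is now a list rather than a dict, so the values arrive positionally. A sketch of one workaround (param names from the snippet above; the loss is a dummy):

PARAM_NAMES = ['param_1', 'param_2', 'param_3', 'param_4',
               'param_5', 'param_6', 'param_7', 'param_8', 'param_9']

def run_the_strategy(args):
    # rebuild dict-style access from the positional tuple
    params = dict(zip(PARAM_NAMES, args))
    return params['param_1'] ** 2  # dummy loss for illustration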

This bit is weird - I think initializing a choice space works only with the index and not with the choice values!

Yes indeed, in fact, another error occurs related to use of choice when hyperopt "switches over" from the initialized params to the search space in your evals. My script was throwing this error after the bootstrap evals had all run:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 390, in fmin
    show_progressbar=show_progressbar,
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/base.py", line 639, in fmin
    show_progressbar=show_progressbar)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 409, in fmin
    rval.exhaust()
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 262, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/fmin.py", line 211, in run
    self.rstate.randint(2 ** 31 - 1))
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/tpe.py", line 900, in suggest
    print_node_on_error=False)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/pyll/base.py", line 913, in rec_eval
    rval = scope._impls[node.name](*args, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/hyperopt/pyll/base.py", line 1076, in bincount
    return np.bincount(x, weights, minlength)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

I solved this by swapping all of my choice parameters for quniform. That fixes the error and you end up with the same results. It seems like the best thing is to avoid using choice if you're going to initialize a parameter space via generate_trials_to_calculate.
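
The swap looked something like this (note quniform returns floats, so 0 and 1 come back as 0.0 and 1.0):

from hyperopt import hp

params = [
    hp.quniform('param_5', 0, 1, 1),        # was hp.choice('param_5', [0, 1])
    hp.quniform('param_6', 0, 1, 1),
    hp.quniform('param_7', 0, 1, 1),
    hp.quniform('param_8', 0, 1, 1),
    hp.quniform('param_9', 0.6, 0.8, 0.1),  # was hp.choice('param_9', [0.6, 0.7, 0.8])
]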

abhishek-ghose commented 5 years ago

@trevorwelch Great that it works for you now! I have been using Python 2.7 - I forgot to mention it before - but I am not sure if that changes the argument passing semantics.

On a different note, choice and quniform satisfy quite different distributional needs IMO. quniform is ideal when the input comes from a discrete space where smoothness assumptions still hold, e.g. you could assume that in the space [0, 1, 2, ..., 98, 99, 100], the function doing well at {98, 99} implies it is likely to do well at 97 too. With choice you find the utility of each value in your space independently over multiple evals - in theory, both could give you the same solution, but for a somewhat smooth function choice would take longer to reach optimality.
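
To make that concrete, a toy contrast (the parameter names here are made up):

from hyperopt import hp

# ordered, roughly "smooth" integer range: neighbouring values are expected
# to behave similarly, so the optimizer can exploit that structure
n_estimators = hp.quniform('n_estimators', 50, 500, 50)

# unordered categories: each option is modelled independently, with no notion
# of one choice being "close" to another
kernel = hp.choice('kernel', ['rbf', 'linear', 'poly'])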

pangjac commented 2 years ago

For anyone who hit the same issue as @trevorwelch and I did (TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'): this error typically occurs when you initialize a parameter defined with choice using a float such as 0.5. For a choice parameter, the given init value is treated as an index into the list of possible values (here [2, 5]), so it can only be an int (an index is, of course, an integer). The proper way to set an initial value for a choice parameter is to give the desired index, for example init_vals = [{ .... 'param_8': 1}]. In this way the initial value of param_8 will be set to [2,5][1], which is 5.

Another minor issue is that you should ensure the names match: for example, if in params you define 'num_estimators': hp.randint('num_estimators', 1000) + 500, you should use exactly the same name in init_vals; you cannot use, say, n_estimators when initializing it.

Hope this helps.

Example

params = [
    hp.quniform('param_1', 20, 160, 20), ...
    hp.choice('param_8', [2,5]) # note: here param_8 is defined with `choice`
]

init_vals = [{'param_1': 140.0, ..., 
                  'param_8': 0.5,  # ----> error thrown here; the value should be an int (the index)
    }]

trials = generate_trials_to_calculate(init_vals)
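
A corrected sketch of the snippet above (index 1 picks the value 5 out of [2, 5]):

from hyperopt import hp
from hyperopt.fmin import generate_trials_to_calculate

params = [
    hp.quniform('param_1', 20, 160, 20),
    hp.choice('param_8', [2, 5]),
]

# for choice parameters the init value is the *index* into the option list
init_vals = [{'param_1': 140.0, 'param_8': 1}]  # index 1 -> value 5

trials = generate_trials_to_calculate(init_vals)
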
github-actions[bot] commented 2 months ago

This issue has been marked as stale because it has been open 120 days with no activity. Remove the stale label or comment or this will be closed in 30 days.