paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License

hyperopt #176

Open kegl opened 5 years ago

kegl commented 5 years ago

We'll be starting to add a hyperopt/automl feature to ramp-workflow. The goal is to make it easy to convert a submission into a template with hyperparameter placeholders, and to add a hyper-config file that defines the values to try for each hyperparameter. Then rampwf will interface with various hyperopt engines (implementations of grid search, random search, Bayesian optimization), run the optimization, and output 1) a table of score(s) vs. hyperparameter value combinations and 2) a valid submission where the placeholders are replaced by the best hyperparameter values. We are planning to use jinja, which means that, for example, the python code

class Classifier(BaseEstimator):
    def __init__(self):
        self.clf = Pipeline([
            ('imputer', Imputer(strategy='median')),
            ('classifier', LogisticRegression(C=1.0))
        ])

will be replaced by

class Classifier(BaseEstimator):
    def __init__(self):
        self.clf = Pipeline([
            ('imputer', Imputer(strategy='{{ impute_strategy }}')),
            ('classifier', LogisticRegression(C={{ logistic_C }}))
        ])

and the config json file will specify the values [median, mean] for impute_strategy and [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0] for logistic_C. The user will then call e.g. ramp_hyperopt --submission ... --strategy random. In addition, for each placeholder we will specify a default, so if ramp_test --submission is called on a templatized submission, it will use these default values.
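
The exact schema of the config file is not fixed yet; a minimal sketch, assuming a JSON object keyed by placeholder name with default and values fields (those field names are just an assumption), could be:

{
    "impute_strategy": {
        "default": "median",
        "values": ["median", "mean"]
    },
    "logistic_C": {
        "default": 1.0,
        "values": [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0]
    }
}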

agramfort commented 5 years ago

The problem is that this makes the .py file not a valid python file, which is not easy to debug. I would much prefer to have as a contract that all parameters are defined in the constructor, with get_params / set_params like sklearn.

my 2c

kegl commented 5 years ago

Yes, that's an option. The problem is that it adds a constraint on what constitutes a submission. Some submissions have several classes, and we even had workflow elements that are standalone functions. It also makes the approach python-specific: the hyperopt engine would have to know what's in the submission files, whereas with the jinja template it can remain agnostic. The last remark, about default values, is meant to make it easy to convert the template back into a valid python file (if needed, e.g. for debugging).

The typical workflow, in my mind, is that I first develop a submission, test it with ramp_test, then specify the hypers to optimize. The validity of the templatized submission can be checked by running ramp_test on the templatized files (with the defaults substituted) before a possibly heavy hyperopt run. Reporting errors on some hyper combinations can also be part of the output.
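
Substituting the defaults to get a testable file is straightforward with jinja; a rough sketch, where the file paths and the defaults dict are purely illustrative (not rampwf names), might be:

from jinja2 import Template

# Illustrative paths; nothing here is defined by rampwf.
TEMPLATE_PATH = 'submissions/starting_kit/classifier.py.template'
OUTPUT_PATH = 'submissions/starting_kit/classifier.py'

# Default values for every placeholder in the template (assumed known).
defaults = {'impute_strategy': 'median', 'logistic_C': 1.0}

with open(TEMPLATE_PATH) as f:
    template = Template(f.read())

# Render a valid python file that ramp_test can run as-is.
with open(OUTPUT_PATH, 'w') as f:
    f.write(template.render(**defaults))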

kegl commented 5 years ago

Another option: we take a valid python submission and textually mark the hypers (with [line_start, column_start, line_end, column_end]) in the config file. The advantage is that this keeps the submission files unchanged. The disadvantage is that if someone slightly modifies the submission files, the text markers have to be rewritten.
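
For concreteness, such a marker-based config could look something like the sketch below; the file/span/values field names and the positions are pure illustration, nothing is decided:

{
    "impute_strategy": {
        "file": "classifier.py",
        "span": [5, 43, 5, 51],
        "values": ["median", "mean"]
    },
    "logistic_C": {
        "file": "classifier.py",
        "span": [6, 46, 6, 49],
        "values": [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0]
    }
}

where span is [line_start, column_start, line_end, column_end] pointing at the literal to replace.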

agramfort commented 5 years ago

make people use a Bunch object where they write all their params at the top of the file

params = Bunch(param1='bla', ...)

...
... params.param1 ...

I would really stick to valid Python code

kegl commented 5 years ago

OK, can you write the (pseudocode of the) hyperopt loop using this version? Below are the steps to fill (a rough sketch follows the list). What we want to reuse is ramp_test (or assert_submission), which reads valid python submissions with the hyper values specified, inserts them into the workflow, and runs the CV specified in problem.py. In my mind you'll still need template handling; it'd just be more complex since we'd need to parse the python files.

1. Read the config file (or the python files with Bunches), set the hyperparameter grid.
2. Choose a set of hyper values (delegated to the hyperopt engine) given the saved scores.
3. Create a valid submission (with hyper values replacing placeholders).
4. Run  `assert_submission`, save scores or error.
5. If we have more resources, GOTO 2.
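
In python-ish pseudocode, with hypothetical helper names (read_hyper_grid, make_submission) and an indicative engine/assert_submission call signature standing in for whatever we end up implementing, the loop above might look like:

def hyperopt_loop(submission, engine, n_iter):
    # 1. Read the config (or the python files) and build the hyper grid.
    grid = read_hyper_grid(submission)          # hypothetical helper
    results = []                                # (values, scores or error) pairs
    for _ in range(n_iter):                     # 5. loop while we have resources
        # 2. The engine proposes hyper values given the scores saved so far.
        values = engine.suggest(grid, results)
        # 3. Materialize a valid submission with these values plugged in.
        candidate = make_submission(submission, values)  # hypothetical helper
        # 4. Run the usual testing machinery and record the outcome.
        try:
            scores = assert_submission(submission=candidate)
            results.append((values, scores))
        except Exception as error:
            results.append((values, error))
    return results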

In particular, how would you do 3. with Bunches?

kegl commented 5 years ago

OK, my attempt. We could do this with dictionaries:

params = {
    'impute_strategy': 'median',
    'logistic_C': 1.0
}

class Classifier(BaseEstimator):
    def __init__(self):
        self.clf = Pipeline([
            ('imputer', Imputer(strategy=params['impute_strategy'])),
            ('classifier', LogisticRegression(C=params['logistic_C']))
        ])

This would be a valid submission that can be tested by ramp_test. Now, in the hyperopt loop we'd need to recognize where in the file the dictionary is defined (parse it?) and replace it with another dictionary containing the new hyper values. This is where I'm stuck: how would you do this?

agramfort commented 5 years ago

make sure it's always called params and the problem is solved
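
With that naming convention, one possible, purely illustrative way to do the replacement is to locate the top-level params assignment with the ast module and rewrite those lines; this is only a sketch of the idea, not anything implemented in rampwf:

import ast

def replace_params(source, new_params):
    """Replace the top-level `params = {...}` assignment in a submission file
    with new hyper values, relying on the convention that the dictionary is
    always named `params`. Uses `end_lineno`, available in python 3.8+.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == 'params'):
            lines = source.splitlines()
            start, end = node.lineno - 1, node.end_lineno
            new_line = 'params = ' + repr(new_params)
            return '\n'.join(lines[:start] + [new_line] + lines[end:])
    raise ValueError('no top-level `params` dictionary found')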

amsehili commented 5 years ago

I've started testing an implementation of a solution like the one suggested by @agramfort. All you need to do is make the hyper-parameters explicitly appear in your estimator's __init__.
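
Such an estimator could look like the snippet below, a hypothetical sketch reusing the Imputer/LogisticRegression example from earlier in this thread (Imputer being the pre-0.22 scikit-learn class used throughout):

from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer

class Classifier(BaseEstimator):
    # Hyper-parameters to optimize appear explicitly as keyword arguments.
    def __init__(self, impute_strategy='median', logistic_C=1.0):
        self.impute_strategy = impute_strategy
        self.logistic_C = logistic_C
        self.clf = Pipeline([
            ('imputer', Imputer(strategy=impute_strategy)),
            ('classifier', LogisticRegression(C=logistic_C)),
        ])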

To make a submission out of the best parameters, one can ask the CLI tool for hyper-parameter tuning to pickle the best model and/or the best config. We can add an option to the submission script to set the hyper-parameter values from a config file.

Forcing the use of explicit keyword arguments for hyper-parameters to be optimized would only improve the readability of the code.

jorisvandenbossche commented 5 years ago

I agree with @agramfort that it is good to keep it a python file. I also think it would be nice to re-use the sklearn pattern of setting parameters; however, that would require reworking the ramp-workflow testing machinery to be able to work with parameters. So I understand that an approach that generates new submission files and just uses the existing ramp_test_submission machinery to evaluate them might indeed be the best way to go about it.

And about a specified syntax that ramp can recognize and adapt: I am not sure there is any added value in using a python dict params compared to simply defining variables in a specified location in the file, e.g. something like:

# RAMP START HYPERPARAMETERS
impute_strategy = 'median'
logistic_C = 1.0
# RAMP END

I think something like this is easier to parse and modify than the dictionary, and also very explicit (the naming of the comments above is of course just an idea; it can be whatever we want).

We could even combine the parameter defaults with the hyperopt range, if we want:

# RAMP START HYPERPARAMETERS
impute_strategy = 'median'  # opt: ['median', 'mean']
logistic_C = 1.0  # opt: [0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 1.0]
# RAMP END

agramfort commented 5 years ago

that's neat too

albertcthomas commented 5 years ago

+1 to keep it as a python file for debugging

glemaitre commented 5 years ago

Technically, I would prefer to modify the Workflow and the cross-validation loop. I think that whenever possible it should fall back on scikit-learn to avoid reinventing the wheel. However, it would be costly in development time and would most probably break some backward compatibility.

So if engineering time is an issue, I am +1 for the solution proposed by @jorisvandenbossche

albertcthomas commented 5 years ago

In case it can be useful: in the sacred package you can define configuration parameters either by decorating a config function (which basically collects all variables of the local scope of the function), by a dictionary, or by a config file (JSON or YAML). (This might be overly complicated for what we want to achieve here.)

kegl commented 5 years ago

One more argument for @jorisvandenbossche's solution. The list of values is very informative since it tells the other data scientists what hypers were tried. In case of transfer (trying the model on another data set), hyperopt could be rerun on the same list of values without modifying anything. And since the list is in the submission file, it will be submitted to the server with the submission. If we put the list into a separate file, that file will not be automatically submitted to the server (unless we require it explicitly, but that would make no sense for submissions that didn't use hyperopt), so the list of values would be lost.

kegl commented 5 years ago

I started a new branch, hyperopt. For now I just added a titanic test kit with @jorisvandenbossche's interface. We can add other interface candidates, and even implement these alternatives to see which gets traction. This would actually force us to properly modularize the hyperopt module (separate the interface, the generic hyperopt loop, and the hyperopt engines we'll hook up).

https://github.com/paris-saclay-cds/ramp-workflow/pull/177

kegl commented 5 years ago

I have a first random engine working. You can pull from the hyperopt branch, and also pull the hyperopt branch in your titanic kit. You should run `python setup.py develop` in your rampwf library so it installs the new script, then run

ramp-hyperopt --submission starting_kit_h --n_iter 10

in titanic. The run will create a new submission, starting_kit_h_hyperopt (so we don't overwrite starting_kit_h), where the best model will be saved as a submission. In addition, a summary table (with the tried hyper values, scores, and score stds across folds) will be saved in submissions/starting_kit_h_hyperopt/hyperopt_output/summary.csv.

Please check whether you are happy with the interface (how to specify hypers, the command line script, and the output of hyperopt).

Here are some details about the interface and implementation:

1. Hyperparameters are specified like this:

    from rampwf.hyperopt import Hyperparameter

    # RAMP START HYPERPARAMETERS
    logreg_C = Hyperparameter(
        dtype='float', default=1.0, values=[0.01, 0.1, 0.9, 1.0])
    imputer_strategy = Hyperparameter(
        dtype='object', default='median', values=['mean', 'median'])
    # RAMP END HYPERPARAMETERS

We need to define the type and the default value, as well as the values to try.

2. The experiment will run the setup in `problem.py`. At this point it is a random search with memory, that is, it will not try the same value combination twice as long as not all combinations have been tried (so it covers the full search space, like grid search, if `n_iter` is bigger than the number of value combinations, 16 in the titanic example). The engine has the decision granularity of running experiments on folds, which means we can develop sophisticated strategies, for example trying smaller folds first, not running all the folds on all the value combinations, etc. At this time the random search runs all the folds when a value combination is tried, so the number of folds in `scores.csv` is always a multiple of 8 for each value combination in the titanic example.
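
Stripped of all rampwf specifics, the "random search with memory" idea boils down to sampling combinations without replacement; a self-contained sketch of that idea (not the actual engine code) is:

import itertools
import random

def sample_with_memory(grid, tried):
    """Pick a random hyper value combination that has not been tried yet.

    `grid` maps hyperparameter names to lists of values; `tried` is a set of
    value tuples already evaluated. Once every combination has been tried,
    repeats are allowed again.
    """
    all_combos = list(itertools.product(*grid.values()))
    untried = [combo for combo in all_combos if combo not in tried]
    combo = random.choice(untried if untried else all_combos)
    return dict(zip(grid, combo))

# Hypothetical usage, mirroring the hyperparameters in the snippet above:
grid = {'logreg_C': [0.01, 0.1, 0.9, 1.0],
        'imputer_strategy': ['mean', 'median']}
tried = set()
values = sample_with_memory(grid, tried)
tried.add(tuple(values[name] for name in grid))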

There are of course a lot of tests and documentation still to be done, but please try the beta and let us know, especially if you have comments on the usage.