hyperopt / hyperopt-sklearn

Hyper-parameter optimization for sklearn
hyperopt.github.io/hyperopt-sklearn

GaussianRandomProjection and SparseRandomProjection #114

Closed adodge closed 5 years ago

adodge commented 5 years ago

Noticed a TODO and I happened to want to use random projection anyway.

I wish there was a way to automatically avoid testing values of n_components that are larger than the input space, but I don't see an elegant way to do it. Changing the parameter from a count to a [0,1] ratio would do it, but then we're deviating from the interface to the sklearn object, which might be confusing.
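For illustration only, here is a rough sketch of what the ratio idea could look like. The helper name `n_components_from_ratio` is hypothetical and not part of hyperopt-sklearn or sklearn; it also assumes the ratio is taken from (0, 1] rather than [0, 1], since zero components would not be valid.

```python
import math


def n_components_from_ratio(ratio, n_features):
    """Map a ratio in (0, 1] to an integer n_components in [1, n_features].

    Hypothetical helper: sklearn's random projection classes take a count,
    so the conversion has to happen once the input width is known.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    # Round up so small ratios still yield at least one component,
    # then clamp to the width of the input space.
    return max(1, min(n_features, math.ceil(ratio * n_features)))


if __name__ == "__main__":
    for r in (0.01, 0.5, 1.0):
        print(r, n_components_from_ratio(r, 8))
```

The catch is the one mentioned above: the search space would then be defined over a ratio, while the sklearn object still expects a count.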

bjkomer commented 5 years ago

Looks good! The only thing I would change is the naming, to be more consistent with the other functions, e.g. use sparse_random_projection instead of sparserandomprojection.

I'm not sure of the best way of limiting n_components either. I feel like in the most common use case you would know the size of your input space and can pass your own distribution. If users don't know how to do that, hopefully hyperopt will figure out quickly enough that values larger than the input dimension produce bad results and stop searching that part of the space (it will waste some evals though).
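As a minimal sketch of "passing your own distribution" when the input width is known: the helper name `bounded_n_components` is made up for this example, while `hp.quniform` and `scope.int` are the hyperopt calls it wraps.

```python
def bounded_n_components(label, n_features):
    """Build a hyperopt expression for integer n_components in [1, n_features].

    Hypothetical helper, not part of hyperopt-sklearn.
    """
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    # Imported here so the argument check above runs even without hyperopt.
    from hyperopt import hp
    from hyperopt.pyll import scope

    # quniform with q=1 yields whole-number floats; scope.int casts them to int.
    return scope.int(hp.quniform(label, low=1, high=n_features, q=1))
```

With an input array X, the call would be something like `bounded_n_components('preprocessing.n_components', X.shape[1])`, and the result can be passed as the n_components argument of the preprocessing component.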

Here's an example test I put together that only chooses n_components within the correct range:

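# Module-level imports this test assumes
import numpy as np
from hyperopt import hp, rand
from hyperopt.pyll import scope
from hpsklearn import components
from hpsklearn.estimator import hyperopt_estimator

def test_gaussian_random_projection(self):
    # Restrict n_components to integers in [1, 8], the width of the input below
    n_components = scope.int(hp.quniform(
        'preprocessing.n_components', low=1, high=8, q=1
    ))
    model = hyperopt_estimator(
        classifier=components.gaussian_nb('classifier'),
        preprocessing=[
            components.gaussian_random_projection(
                'preprocessing',
                n_components=n_components,
            )
        ],
        algo=rand.suggest,
        trial_timeout=5.0,
        max_evals=5,
    )

    # Random 8-feature data; the label is the sign of the first feature
    X_train = np.random.randn(1000, 8)
    Y_train = (X_train[:, 0] > 0).astype('int')
    X_test = np.random.randn(1000, 8)
    Y_test = (X_test[:, 0] > 0).astype('int')

    model.fit(X_train, Y_train)
    model.score(X_test, Y_test)

adodge commented 5 years ago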

Whoops! Good points.