hyperopt / hyperopt-sklearn

Hyper-parameter optimization for sklearn
hyperopt.github.io/hyperopt-sklearn
Other
1.58k stars 271 forks source link

Fix to crossvalidation and test #113

Closed adodge closed 5 years ago

adodge commented 5 years ago

Using the CV options without use_partial_fit causes an error because cv_n_iters gets filled with None, which can't be passed to max(). This is a patch that fixes this, along with a test that used to fail, but now doesn't.

adodge commented 5 years ago

Relates to issue #92

bjkomer commented 5 years ago

Thanks for the PR! I tried running this on my machine, and the test seems to pass with and without the change. Is there something else I'm missing that will allow me to reproduce the error? I've been using sklearn version 0.19, upgraded to 0.20 to see if that made a difference, but it still passes. I've also tried a few different algorithms as the classifier, but no luck.

Also, if the error is related to partial_fit, I think it would make more sense to test it with an algorithm that supports partial_fit, such as sgd instead of random_forest. That algo also seems to run faster, which is nice when running a full suite of tests.

adodge commented 5 years ago

Hmm interesting! Switching to sgd is a good point. Not sure how that ended up being a random forest. The test wasn't meant to be specific to any particular classifier. The error is happening when we try to call .max() on a numpy array with Nones in, so the version of numpy might be important. Here's my pip list:

atomicwrites (1.2.1)
attrs (18.2.0)
decorator (4.3.0)
future (0.16.0)
hpsklearn (0.0.3, /home/adodge/dev/oss/hyperopt/hyperopt-sklearn)
hyperopt (0.1.1)
joblib (0.12.5)
lockfile (0.12.2)
more-itertools (4.3.0)
networkx (2.2)
nose (1.3.7)
numpy (1.15.1)
pandas (0.23.4)
pip (9.0.1)
pluggy (0.7.1)
py (1.6.0)
pymongo (3.7.1)
pytest (3.8.0)
python-dateutil (2.7.3)
pytz (2018.5)
scikit-learn (0.19.2)
scipy (1.1.0)
setuptools (28.8.0)
six (1.11.0)
skdata (0.0.4, /home/adodge/dev/oss/hyperopt/skdata/build/py3k)
wheel (0.29.0)

I changed the classifier to SGD and ran test_estimator.py with and without the fix:

With (https://github.com/adodge/hyperopt-sklearn/commit/ed38c602de55facf045d7eb64f1f83aaa0d533d4): test_results2.txt

Without (https://github.com/adodge/hyperopt-sklearn/commit/1612c0d85defd15fe0d6462d6b651ed009dc5bdf): test_results.txt

(Which unfortunately doesn't actually show the traceback to the line in question, but I found it by bodging in calls to traceback inside the subprocess.)

I'll try this with different versions of numpy when I get a chance.

adodge commented 5 years ago

Different versions of numpy don't seem to matter. I tried all the major versions back to 1.11 and they all fail the test for me without the fix. How odd!

bjkomer commented 5 years ago

Just tried making a fresh virtualenv, and now its working correctly (i.e. old code fails, new code doesn't). Must be something weird with my environment.

Great work! Going to merge this now