microsoft / FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
https://microsoft.github.io/FLAML/
MIT License

Error when ensemble=true #133

Closed stepthom closed 3 years ago

stepthom commented 3 years ago

After upgrading to the newest version of FLAML, I am running into the following error when I set ensemble=True:

Traceback (most recent call last):
  File "search.py", line 229, in <module>
    main()
  File "search.py", line 225, in main
    data_sheet = run_data_sheet(data_sheet, target_col, id_col, data_dir, out_dir, eval_metric)
  File "search.py", line 180, in run_data_sheet
    pipe.fit(X_train, y_train, **automl_settings)
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/flaml/automl.py", line 962, in fit
    self._search()
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/flaml/automl.py", line 1232, in _search
    **self._state.fit_kwargs)
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/sklearn/ensemble/_stacking.py", line 441, in fit
    return super().fit(X, self._le.transform(y), sample_weight)
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/sklearn/ensemble/_stacking.py", line 149, in fit
    for est in all_estimators if est != 'drop'
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/joblib/parallel.py", line 1054, in __call__
    self.retrieve()
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/joblib/parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/global/home/hpc3552/autotext/flaml_env/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/opt/python/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/python/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: __init__() got an unexpected keyword argument '_estimator_type'

My call to FLAML:

automl_settings = {
    "time_budget": search_time,
    "task": 'classification',
    "log_file_name": "{}/flaml-{}.log".format(out_dir, runname),
    "n_jobs": 10,
    "estimator_list": ['lgbm', 'xgboost', 'rf', 'extra_tree', 'catboost'],
    "model_history": True,
    "eval_method": "cv",
    "n_splits": 3,
    "metric": eval_metric,
    "log_training_metric": True,
    "verbose": 1,
    "ensemble": True,
}

pipe = AutoML()
pipe.fit(X_train, y_train, **automl_settings)

The error goes away if I change ensemble to False.

Here are my environment details:

$ pip list
Package            Version
------------------ --------
catboost           0.26
ConfigSpace        0.4.19
cycler             0.10.0
Cython             0.29.23
FLAML              0.5.6
graphviz           0.16
importlib-metadata 4.6.1
joblib             1.0.1
jsonpickle         2.0.0
kiwisolver         1.3.1
lightgbm           3.2.1
matplotlib         3.3.4
numpy              1.19.5
pandas             1.1.5
Pillow             8.3.1
pip                21.1.3
plotly             5.1.0
pyparsing          2.4.7
python-dateutil    2.8.1
pytz               2021.1
scikit-learn       0.24.2
scipy              1.5.4
setuptools         40.6.2
six                1.16.0
tenacity           8.0.0
threadpoolctl      2.1.0
typing-extensions  3.10.0.0
wheel              0.36.2
xgboost            1.4.2
zipp               3.5.0
$ python --version
Python 3.6.8 :: Anaconda custom (64-bit)
stepthom commented 3 years ago

Note: After downgrading to FLAML version 0.4.1, this error no longer occurs.

stepthom commented 3 years ago

Note: I did a little more testing.

sonichi commented 3 years ago

Thanks for finding this bug. The cause is that _estimator_type gets passed to the estimator's constructor and stored in self.params, and is then replayed as a constructor argument. The fix is to delete self.params['_estimator_type'].
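To illustrate the failure mode described above, here is a minimal sketch (hypothetical wrapper class, not FLAML's actual code): a wrapper that stores all init kwargs in self.params and later replays them into a scikit-learn constructor will raise the TypeError from the traceback unless bookkeeping keys such as '_estimator_type' are dropped first.

```python
# Hypothetical sketch of the bug/fix described above; FLAML's real
# estimator wrapper is more involved.
from sklearn.ensemble import RandomForestClassifier


class EstimatorWrapper:
    def __init__(self, task="classification", **params):
        # Everything passed in lands in self.params, including any
        # bookkeeping key the ensemble code may have added, such as
        # '_estimator_type'.
        self.params = params

    def fit(self, X, y):
        # Replaying self.params verbatim into the sklearn constructor
        # would raise:
        #   TypeError: __init__() got an unexpected keyword argument
        #   '_estimator_type'
        # so the fix is to drop that key before constructing the model.
        params = {k: v for k, v in self.params.items()
                  if k != "_estimator_type"}
        self.model = RandomForestClassifier(**params)
        self.model.fit(X, y)
        return self


# '_estimator_type' sneaks in via **params, as in the reported bug.
wrapper = EstimatorWrapper(n_estimators=10, _estimator_type="classifier")
```

With the key filtered out, fit() succeeds; without the filter, the constructor call reproduces the exact TypeError from the traceback.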