cdeil closed this issue 3 years ago.
Could you follow https://github.com/microsoft/FLAML/issues/223#issuecomment-932356442? We haven't been able to reproduce the problem, but this debugging mode should help identify the cause.
I get this:
In [2]: from flaml import AutoML
...: from sklearn.datasets import load_boston
...: # Initialize an AutoML instance
...: automl = AutoML()
...: # Specify automl goal and constraint
...: automl_settings = {
...: "time_budget": 10, # in seconds
...: "metric": 'r2',
...: "task": 'regression',
...: "log_file_name": "boston.log",
...: "verbose": 4,
...: }
...: X_train, y_train = load_boston(return_X_y=True)
...: # Train with labeled input data
...: automl.fit(X_train=X_train, y_train=y_train,
...: **automl_settings)
...: # Predict
...: print(automl.predict(X_train))
...: # Export the best model
...: print(automl.model)
/Users/cdeil/opt/anaconda3/envs/cement/lib/python3.9/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.
The Boston housing prices dataset has an ethical problem. You can refer to
the documentation of this function for further details.
The scikit-learn maintainers therefore strongly discourage the use of this
dataset unless the purpose of the code is to study and educate about
ethical issues in data science and machine learning.
In this case special case, you can fetch the dataset from the original
source::
import pandas as pd
import numpy as np
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
Alternative datasets include the California housing dataset (i.e.
func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
dataset. You can load the datasets as follows:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
for the California housing dataset and:
from sklearn.datasets import fetch_openml
housing = fetch_openml(name="house_prices", as_frame=True)
for the Ames housing dataset.
warnings.warn(msg, category=FutureWarning)
[flaml.automl: 10-05 21:58:52] {1457} INFO - Data split method: uniform
[flaml.automl: 10-05 21:58:52] {1461} INFO - Evaluation method: cv
[flaml.automl: 10-05 21:58:52] {1509} INFO - Minimizing error metric: 1-r2
[flaml.automl: 10-05 21:58:52] {1546} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree']
[flaml.automl: 10-05 21:58:52] {1776} INFO - iteration 0, current learner lgbm
[flaml.tune.tune: 10-05 21:58:52] {392} INFO - trial 1 config: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.09999999999999995, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}
[flaml.automl: 10-05 21:58:52] {98} DEBUG - flaml.model - LGBMRegressor(learning_rate=0.09999999999999995, max_bin=255, n_estimators=1,
num_leaves=4, reg_alpha=0.0009765625, reg_lambda=1.0, verbose=-1) fit started
zsh: segmentation fault ipython
Maybe you'll see the error if you add a MacOS conda build using the env I gave above to your CI?
Could it be related to the ethical problems with the Boston dataset? Or it could simply be that after fitting that over and over again in the past years, my CPU got super bored and finally couldn't take it any more and said enough is enough. segfault. Those are the hardest bugs to reproduce. Good luck!
The error happened while fitting this model:
LGBMRegressor(learning_rate=0.09999999999999995, max_bin=255, n_estimators=1, num_leaves=4, reg_alpha=0.0009765625, reg_lambda=1.0, verbose=-1)
Does this model's fit() work in your env without flaml?
Yes, the issue is just in lightgbm, independent of flaml:
import lightgbm as lgb
from sklearn.datasets import fetch_california_housing
X_train, y_train = fetch_california_housing(return_X_y=True)
model = lgb.LGBMRegressor()
model.fit(X_train, y_train)
gives:
% python crash2.py
zsh: segmentation fault python crash2.py
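To get a bit more context than "segmentation fault", Python's standard-library faulthandler module can print the Python traceback at the moment of the crash. Running `python -X faulthandler crash2.py` does this without code changes. The sketch below does the same from inside a script. The helper name run_with_faulthandler is hypothetical, and it writes to sys.__stderr__ so the dump still reaches the terminal even if sys.stderr has been redirected.

```python
import faulthandler
import sys


def run_with_faulthandler(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) with faulthandler enabled.

    If the process receives a fatal signal (SIGSEGV, SIGFPE, SIGABRT,
    SIGBUS or SIGILL) during the call, the Python traceback of every
    thread is written to the real stderr before the process dies.
    """
    was_enabled = faulthandler.is_enabled()
    faulthandler.enable(file=sys.__stderr__, all_threads=True)
    try:
        return fn(*args, **kwargs)
    finally:
        # If faulthandler was off before the call, switch it back off.
        if not was_enabled:
            faulthandler.disable()


# Hypothetical usage with the reproducer above:
#   model = lgb.LGBMRegressor()
#   run_with_faulthandler(model.fit, X_train, y_train)
```

For a crash inside LightGBM's compiled code, the traceback will most likely end at the model.fit call. That still confirms which Python call triggered the segfault.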
@sonichi, maybe you could point a lightgbm dev to this issue so they can check whether it's reproducible?
Or do you want me to close this issue here and re-file it over in their issue tracker?
@cdeil Thanks for confirming that. I tried to create an issue in https://github.com/microsoft/LightGBM/issues/new?assignees=&labels=&template=BUG_REPORT.md but it requires details that I don't know. Could you please create an issue there?
Reported here: https://github.com/microsoft/LightGBM/issues/4666
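Collecting the environment details that an upstream bug report asks for can be scripted. Below is a minimal sketch that uses only the standard library. The function name environment_report and the default package list are assumptions, not something from the LightGBM template. For a conda env, the output of `conda list` is worth attaching as well.

```python
import platform
from importlib import metadata


def environment_report(packages=("lightgbm", "scikit-learn", "numpy", "flaml")):
    """Return OS, CPU architecture, Python version and installed package versions."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Version as recorded in the installed distribution's metadata.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```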
I'm using this:
and
and then, when trying to execute the hello-world regression example from the README, I get a hang in JupyterLab or a segfault in IPython:
I didn't try to debug this or to test other versions; it worked a month ago on a Mac with Python 3.8.
Maybe you could extend your CI to also test with conda and try to reproduce / fix the issue?
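In CI, a segfault kills the test runner itself. One way to turn it into a clear test result is to run the reproducer in a separate interpreter and inspect the exit status. The sketch below assumes a POSIX runner (Linux or macOS), where subprocess reports a child killed by signal N as return code -N. The function name run_reproducer and the 600-second default timeout are assumptions. The timeout also catches hangs like the JupyterLab one.

```python
import signal
import subprocess
import sys


def run_reproducer(code, timeout=600.0):
    """Run `code` in a fresh Python interpreter and classify the outcome.

    Returns "ok", "segfault", "hang", or "error (exit code N)".
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after exceeding the timeout.
        return "hang"
    if result.returncode == 0:
        return "ok"
    # On POSIX, a child terminated by signal N has returncode == -N.
    if result.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"error (exit code {result.returncode})"
```

For example, passing the five lines of crash2.py as a string should return "segfault" in an affected macOS conda environment and "ok" in a healthy one. A CI job could then assert on that result.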