Hang or segfault with Python 3.9 MacOS conda

cdeil commented 3 years ago

I'm using this:

 % cat case_study/environment.yml 
name: cement

channels:
- conda-forge
- nodefaults

dependencies:
- python==3.9
- jupyterlab==3.1
- pandas==1.3
- scikit-learn==1.0
- missingno
- matplotlib
- seaborn==0.11
- dtale==1.56
- pandas-profiling
- sweetviz==2.1
- pip
- pip:
    - flaml

and

conda env create -f environment.yml

and then, when I try to run the hello-world regression example from the README, I get a hang in JupyterLab or a segfault in ipython:

[flaml.automl: 10-05 19:28:26] {1432} INFO - Evaluation method: cv
[flaml.automl: 10-05 19:28:26] {1478} INFO - Minimizing error metric: 1-r2
[flaml.automl: 10-05 19:28:26] {1515} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree']
[flaml.automl: 10-05 19:28:26] {1748} INFO - iteration 0, current learner lgbm
zsh: segmentation fault  ipython

I haven't tried to debug this or to test other versions. It worked a month ago on a Mac with Python 3.8.

Maybe you could extend your CI to also test with conda and try to reproduce / fix the issue?

sonichi commented 3 years ago

Could you follow https://github.com/microsoft/FLAML/issues/223#issuecomment-932356442? We haven't been able to reproduce the problem, but this debugging mode should help identify the cause.

cdeil commented 3 years ago

I get this:

In [2]: from flaml import AutoML
   ...: from sklearn.datasets import load_boston
   ...: # Initialize an AutoML instance
   ...: automl = AutoML()
   ...: # Specify automl goal and constraint
   ...: automl_settings = {
   ...:     "time_budget": 10,  # in seconds
   ...:     "metric": 'r2',
   ...:     "task": 'regression',
   ...:     "log_file_name": "boston.log",
   ...:     "verbose": 4,
   ...: }
   ...: X_train, y_train = load_boston(return_X_y=True)
   ...: # Train with labeled input data
   ...: automl.fit(X_train=X_train, y_train=y_train,
   ...:            **automl_settings)
   ...: # Predict
   ...: print(automl.predict(X_train))
   ...: # Export the best model
   ...: print(automl.model)
/Users/cdeil/opt/anaconda3/envs/cement/lib/python3.9/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.

    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this case special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows:

        from sklearn.datasets import fetch_california_housing
        housing = fetch_california_housing()

    for the California housing dataset and:

        from sklearn.datasets import fetch_openml
        housing = fetch_openml(name="house_prices", as_frame=True)

    for the Ames housing dataset.

  warnings.warn(msg, category=FutureWarning)
[flaml.automl: 10-05 21:58:52] {1457} INFO - Data split method: uniform
[flaml.automl: 10-05 21:58:52] {1461} INFO - Evaluation method: cv
[flaml.automl: 10-05 21:58:52] {1509} INFO - Minimizing error metric: 1-r2
[flaml.automl: 10-05 21:58:52] {1546} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'catboost', 'xgboost', 'extra_tree']
[flaml.automl: 10-05 21:58:52] {1776} INFO - iteration 0, current learner lgbm
[flaml.tune.tune: 10-05 21:58:52] {392} INFO - trial 1 config: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.09999999999999995, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}
[flaml.automl: 10-05 21:58:52] {98} DEBUG - flaml.model - LGBMRegressor(learning_rate=0.09999999999999995, max_bin=255, n_estimators=1,
              num_leaves=4, reg_alpha=0.0009765625, reg_lambda=1.0, verbose=-1) fit started
zsh: segmentation fault  ipython

cdeil commented 3 years ago

Maybe you'll see the error if you add a MacOS conda build using the env I gave above to your CI?

cdeil commented 3 years ago

Could it be related to the ethical problems with the Boston dataset? Or it could simply be that after fitting that over and over again in the past years, my CPU got super bored and finally couldn't take it any more and said enough is enough. segfault. Those are the hardest bugs to reproduce. Good luck!

sonichi commented 3 years ago

The error happened while fitting the model LGBMRegressor(learning_rate=0.09999999999999995, max_bin=255, n_estimators=1, num_leaves=4, reg_alpha=0.0009765625, reg_lambda=1.0, verbose=-1). Does this model's fit() work in your env without running flaml?
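
One way to check this without taking down the interactive session is to run the fit in a child interpreter with Python's standard `faulthandler` enabled, so a crash leaves a Python-level traceback on stderr. The following is only a sketch: the helper name `run_isolated` and the timeout value are assumptions made for illustration, not part of flaml or lightgbm, and the signal detection relies on POSIX return-code conventions (macOS, Linux).

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run `code` in a child interpreter with faulthandler enabled.

    Hypothetical helper for illustration. Returns a tuple
    (returncode, signal_name, stdout, stderr). On POSIX, a negative
    return code means the child was killed by that signal number.
    A hang surfaces as subprocess.TimeoutExpired after `timeout` seconds.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    signal_name = None
    if proc.returncode < 0:
        # e.g. -11 maps to SIGSEGV
        signal_name = signal.Signals(-proc.returncode).name
    return proc.returncode, signal_name, proc.stdout, proc.stderr
```

Passing the model construction and `fit()` call as the `code` string would then report `SIGSEGV` as the signal name if the child segfaults, and the captured stderr should contain the traceback written by `faulthandler`, which shows the Python frame that was active when the crash happened.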

cdeil commented 3 years ago

It crashes there too, so the issue is in lightgbm itself, independent of flaml:

import lightgbm as lgb
from sklearn.datasets import fetch_california_housing
X_train, y_train = fetch_california_housing(return_X_y=True)
model = lgb.LGBMRegressor()
model.fit(X_train, y_train)

gives:

% python crash2.py
zsh: segmentation fault  python crash2.py

@sonichi - maybe you could point a lightgbm dev to this issue so they can have a look and see whether it's reproducible? Or do you want me to close this issue here and re-file it in their issue tracker?

sonichi commented 3 years ago

@cdeil Thanks for confirming that. I tried to create an issue in https://github.com/microsoft/LightGBM/issues/new?assignees=&labels=&template=BUG_REPORT.md but it requires details that I don't know. Could you please create an issue there?
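
For reference, bug reports of this kind usually need the platform and the installed package versions. A minimal sketch for collecting them from inside the affected environment is below. The function name `environment_report` and the package list are assumptions chosen for illustration; the exact fields the LightGBM template asks for are not shown in this thread.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect interpreter, platform and package versions for a bug report.

    Hypothetical helper for illustration; `packages` is a list of
    distribution names as pip knows them.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the key so the report shows the package was checked
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    # Package list is an assumption based on the environment in this issue
    wanted = ["lightgbm", "scikit-learn", "numpy", "flaml"]
    for key, value in environment_report(wanted).items():
        print(f"{key}: {value}")
```

Running this inside the `cement` conda env and pasting the output next to the environment.yml and the crashing snippet should cover most of what a maintainer needs to attempt a reproduction.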

cdeil commented 3 years ago

Reported here: https://github.com/microsoft/LightGBM/issues/4666

microsoft / FLAML

Hang or segfault with Python 3.9 MacOS conda #243