microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Interaction Constraints erroring when running distributed #5680

Open Chrisjb opened 1 year ago

Chrisjb commented 1 year ago

Description

Interaction constraints do not seem to work correctly when LightGBM is run in a distributed context.

An issue was originally raised on the lightgbm-ray GitHub repository, where it was suggested that this is likely a problem in LightGBM itself, since it can also be reproduced with LightGBM's Dask interface.

The constraints work as expected when running LightGBM without distributed data.
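
One way to check that constraints "work as expected" is to walk the trees of a trained model and confirm that no branch mixes features from different constraint groups. The sketch below assumes the nested-dict layout returned by `Booster.dump_model()` (a `tree_info` list whose entries hold a `tree_structure` with `split_feature`, `left_child` and `right_child` keys). The helper names and the hand-built trees are hypothetical and only illustrate the check; they are not output from the models in this report.

```python
from typing import List, Sequence, Set


def paths_respect_constraints(node: dict, groups: List[Set[int]],
                              used: frozenset = frozenset()) -> bool:
    """True if every root-to-leaf path below `node` stays inside one group."""
    if "split_feature" not in node:
        # Leaf node: nothing more to check on this path.
        return True
    used = used | {node["split_feature"]}
    # The features seen so far on this path must all fit in a single group.
    if not any(used <= group for group in groups):
        return False
    return (paths_respect_constraints(node["left_child"], groups, used)
            and paths_respect_constraints(node["right_child"], groups, used))


def model_respects_constraints(dumped: dict,
                               constraints: Sequence[Sequence[int]]) -> bool:
    """Check every tree in a dump_model()-style dict against the constraints."""
    groups = [set(group) for group in constraints]
    return all(
        paths_respect_constraints(tree["tree_structure"], groups)
        for tree in dumped["tree_info"]
    )


# Hand-built stand-ins for a dumped model (hypothetical data).
leaf = {"leaf_value": 0.0}
ok_model = {"tree_info": [{"tree_structure": {
    "split_feature": 1,
    "left_child": leaf,
    "right_child": {"split_feature": 2, "left_child": leaf, "right_child": leaf},
}}]}
bad_model = {"tree_info": [{"tree_structure": {
    "split_feature": 0,
    "left_child": leaf,
    "right_child": {"split_feature": 2, "left_child": leaf, "right_child": leaf},
}}]}

constraints = [[0], [1, 2, 3]]
ok_result = model_respects_constraints(ok_model, constraints)
bad_result = model_respects_constraints(bad_model, constraints)
```

With a real model, the same check could be run on `model.booster_.dump_model()` after a non-distributed fit to confirm the baseline behaviour.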

Reproducible example

Dask:

from distributed import Client, LocalCluster
import dask.dataframe as dd
import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires scikit-learn < 1.2
import lightgbm as lgb

if __name__ == "__main__":
    print("loading data")
    boston = load_boston()
    X, y = boston.data, boston.target

    params = {
        'boosting_type': 'goss',
        'objective': 'regression',
        'metric': 'rmse',
        'num_leaves': 10,
        'max_depth': 4,
        'learning_rate': 0.05,
        'verbose': 10
    }

    print("initializing a Dask cluster")

    cluster = LocalCluster()
    client = Client(cluster)

    print("created a Dask LocalCluster")

    print("distributing training data on the Dask cluster")
    df = pd.DataFrame(X, columns=boston.feature_names)
    dX = dd.from_pandas(df, npartitions=4)
    dy = dd.from_pandas(pd.Series(y, name="target"), npartitions=4)

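    # Isolate AGE in its own group so it cannot interact with any other feature.
    constrained_feature = 'AGE'
    other_features = [
        i for i, col in enumerate(df.columns) if col != constrained_feature
    ]
    constrained_feature_idx = [
        i for i, col in enumerate(df.columns) if col == constrained_feature
    ]

    # Two disjoint groups of column indices: [[AGE], [everything else]].
    constraint = [constrained_feature_idx, other_features]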

    print("beginning training")

    dask_model = lgb.DaskLGBMRegressor(
        interaction_constraints=constraint, **params)
    dask_model.fit(dX, dy)
    assert dask_model.fitted_

    print("done training")
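
For reference, the constraint list built in the Dask example above can be reproduced without loading any data. This is a minimal sketch: `build_constraints` is a hypothetical helper, and the column list is the standard Boston housing feature order, which is assumed to match `boston.feature_names`.

```python
from typing import List, Sequence


def build_constraints(columns: Sequence[str], isolated: str) -> List[List[int]]:
    """Put `isolated` in its own group and every other column in a second group."""
    if isolated not in columns:
        raise ValueError(f"{isolated!r} is not one of the columns")
    isolated_idx = [i for i, col in enumerate(columns) if col == isolated]
    other_idx = [i for i, col in enumerate(columns) if col != isolated]
    return [isolated_idx, other_idx]


# Assumed column order of the Boston housing dataset.
boston_columns = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]

constraint = build_constraints(boston_columns, "AGE")

# Sanity check: the two groups are disjoint and together cover every column.
flat = sorted(i for group in constraint for i in group)
covers_all_columns = flat == list(range(len(boston_columns)))
```

Under these assumptions the value passed as `interaction_constraints` is `[[6], [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12]]`, meaning AGE may never share a branch with any other feature.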

Also present in Ray: https://github.com/ray-project/lightgbm_ray/issues/41

Environment info

LightGBM version or commit hash:

Command(s) you used to install LightGBM

pip install lightgbm==3.3.2
pip install lightgbm_ray

Additional info

On Ray, I receive the error:

(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) SIGSEGV received at time=1673976819 on cpu 3
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) PC: @ 0x7fedc74926f7 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) @ 0x7fedc750d420 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: SIGSEGV received at time=1673976819 on cpu 3
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: PC: @ 0x7fedc74926f7 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) [2023-01-17 09:33:39,994 E 266 292] logging.cc:361: @ 0x7fedc750d420 (unknown) (unknown)
(_RemoteRayLightGBMActor pid=266, ip=10.99.13.194) Fatal Python error: Segmentation fault

jameslamb commented 1 year ago

Thanks for the report! What operating system are you on? That was omitted here and in the original Ray issue.

That's generally useful, but also specifically relevant because of issues like #4229.

Chrisjb commented 1 year ago

Thanks @jameslamb, the OS is Ubuntu 18.04.

J-GRIFF1 commented 1 year ago

We have seen this problem too. Is there any indication of whether a fix is coming soon?