microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Monotone constraint with Quantile distribution does not work correctly #3371

Closed maurever closed 4 years ago

maurever commented 4 years ago

I tried the quantile objective while a monotone constraint is set, and it looks like the constraint is not enforced correctly. The resulting prediction is not monotone; see the example and image below.

How are you using LightGBM?

Environment info

Operating System: Linux Debian 10.5 (x86-64)

Python version: Python 3.7.3rc1

LightGBM version or commit hash: lightgbm==2.3.1

Reproducible example

# prepare data
import numpy as np
np.random.seed(1)

def f(x):
    """The function to predict."""
    return x * np.sin(x)

X = np.atleast_2d(np.random.uniform(0, 10.0, size=100)).T
X = X.astype(np.float32)

y = f(X).ravel()

dy = 1.5 + 1.0 * np.random.random(y.shape)
noise = np.random.normal(0, dy)
y += noise
y = y.astype(np.float32)

xx = np.atleast_2d(np.linspace(0, 10, 100)).T
xx = xx.astype(np.float32)

# prepare plot function
import matplotlib.pyplot as plt

def plot_prediction_quantile(xx, fxx, xx_label, X, y, y_pred,ylim, title, y_upper=None, y_lower=None, 
                             confidence_label=None):
    fig = plt.figure()
    plt.plot(xx, fxx, 'g:', label=xx_label)
    if (X is not None) and (y is not None):
        plt.plot(X, y, 'b.', markersize=10, label=u'Observations')
    plt.plot(xx, y_pred, 'r-', label=u'Prediction')
    if (y_upper is not None) and (y_lower is not None):
        plt.plot(xx, y_upper, 'k-')
        plt.plot(xx, y_lower, 'k-')
        plt.fill(np.concatenate([xx, xx[::-1]]),
                 np.concatenate([y_upper, y_lower[::-1]]),
                 alpha=.5, fc='b', ec='None', label=confidence_label)
    plt.xlabel('$x$')
    plt.ylabel('$f(x)$')
    plt.ylim(ylim)
    plt.legend(loc='upper left')
    plt.title(title)
    plt.show()

# prepare lightgbm
from lightgbm import LGBMRegressor

lgb_params = {
    'n_jobs': 1,
    'max_depth': 5,
    'min_data_in_leaf': 3,
    'n_estimators': 100,
    'learning_rate': 0.1,
    'colsample_bytree': 0.9,
    'boosting_type': 'gbdt',
    'monotone_constraints': -1
}

lgb_no_monotonicity = LGBMRegressor(objective='quantile', alpha=0.4, **lgb_params)
lgb_no_monotonicity.fit(X, y)
y_no_monotonicity_lgb = lgb_no_monotonicity.predict(xx)

plot_prediction_quantile(xx, f(xx), r'$f(x)$', X, y, y_no_monotonicity_lgb, [min(y)-0.2*min(y), max(y)+0.2*max(y)], "LightGBM Quantile (quantile_alpha=0.4) with monotone constraint -1 - Monotonicity constraint violated!")
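A quick way to verify the violation programmatically (this check is not part of the original report) is to test that the predictions over the sorted grid `xx` never increase, since the constraint is -1 (decreasing):

```python
import numpy as np

def is_decreasing(y_pred, tol=1e-12):
    """Return True if consecutive predictions never increase (within tol)."""
    return bool(np.all(np.diff(y_pred) <= tol))

# With the constraint honored this would print True for y_no_monotonicity_lgb;
# the report shows it does not hold for the quantile objective.
print(is_decreasing(np.array([3.0, 2.5, 2.5, 1.0])))  # → True
print(is_decreasing(np.array([3.0, 2.5, 2.7, 1.0])))  # → False
```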

Result plot

(image "lightgbm": prediction plot showing the monotone constraint violated)

Steps to reproduce

  1. run from the console using

python example.py

  2. see the resulting image
guolinke commented 4 years ago

Yes. For objective functions with RenewTreeOutput, like quantile, MAE, etc., the monotone constraint will be broken. There is no good solution so far, as this line-search process optimizes each leaf independently.
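To illustrate the mechanism described above (a simplified sketch, not LightGBM internals): the tree structure and initial leaf values come from gradients, but objectives like quantile then renew each leaf's output independently with a per-leaf quantile. Two leaves whose mean outputs respect a decreasing constraint can have their order flipped by that renewal:

```python
import numpy as np

# Two adjacent leaves under a decreasing constraint (-1): the leaf covering
# smaller x values must end up with the larger output.
leaf_small_x = np.array([1.0, 1.0, 10.0])  # targets landing in the left leaf
leaf_large_x = np.array([2.0, 2.0, 2.0])   # targets landing in the right leaf

# Mean-based outputs respect the constraint: 4.0 > 2.0 (decreasing, OK)...
print(leaf_small_x.mean(), leaf_large_x.mean())

# ...but renewing each leaf independently with the 0.4-quantile flips the
# order: 1.0 < 2.0, so the decreasing constraint is violated.
q = 0.4
print(np.quantile(leaf_small_x, q), np.quantile(leaf_large_x, q))
```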

guolinke commented 4 years ago

I don't think this is a bug. I will create a PR to warn the user when these are used together. Also, pinging @CharlesAuguste for possible future solutions.

CharlesAuguste commented 4 years ago

I am not familiar with how the quantile objective function works, but I will take a look!

CharlesAuguste commented 4 years ago

I gave this some thought, and one way I could see this working would be:

I don't have any theoretical guarantee that this would work, but it seems like a reasonable procedure to me. Any thoughts @guolinke ?

guolinke commented 4 years ago

@CharlesAuguste thank you so much! Can't we just update the leaf outputs after RenewTreeOutput?

Theoretically, the post-fix solution cannot learn the "optimal" tree structure, since we don't consider monotone constraints during tree growth. But RenewTreeOutput is also a post-fix solution, since the tree structure is learned with a different objective.
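The post-fix idea discussed here could be sketched as follows (a hypothetical illustration, not LightGBM's API; a naive clamp rather than an optimal pooling, and it assumes the affected leaves can be walked in feature order):

```python
import numpy as np

def enforce_decreasing(leaf_outputs):
    """Hypothetical post-fix: after RenewTreeOutput, walk leaves in feature
    order and clamp each output so it never exceeds its left neighbour,
    restoring a decreasing (-1) monotone constraint."""
    fixed = np.asarray(leaf_outputs, dtype=float).copy()
    for i in range(1, len(fixed)):
        fixed[i] = min(fixed[i], fixed[i - 1])
    return fixed

print(enforce_decreasing([3.0, 3.5, 2.0, 2.4]))  # → [3. 3. 2. 2.]
```

As noted above, this only repairs the leaf values; the tree structure itself was still grown without the constraint in mind, so the result is not guaranteed to be the tree the constrained objective would have chosen.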

CharlesAuguste commented 4 years ago

> @CharlesAuguste thank you so much! Can't we just update the leaf outputs after RenewTreeOutput?
>
> Theoretically, the post-fix solution cannot learn the "optimal" tree structure, since we don't consider monotone constraints during tree growth. But RenewTreeOutput is also a post-fix solution, since the tree structure is learned with a different objective.

@guolinke yes updating leaf outputs after RenewTreeOutput should work the same as far as I understand it. I can give that a try in the coming days, and we'll see how that works!

cah-autoit commented 3 years ago

> @guolinke yes updating leaf outputs after RenewTreeOutput should work the same as far as I understand it. I can give that a try in the coming days, and we'll see how that works!

@CharlesAuguste was this ever fixed?

CharlesAuguste commented 3 years ago

Unfortunately I haven't fixed it, and I am not able to spend time on it right now. I am sorry about that.

alisoltanisobh commented 2 years ago

was this ever fixed?

jameslamb commented 1 year ago

> was this ever fixed?

Thanks @alisoltanisobh! Looking at #3380, the PR that resulted in this issue being automatically closed, I don't think so.

I've renamed this to "support monotone constraints with quantile distribution" and added it to #2302, where we track other feature requests for this project.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.