microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License
Custom metrics by group #4995

Open skwskwskwskw opened 2 years ago

skwskwskwskw commented 2 years ago

Is there a way to create a custom metric that relies on another column (e.g., an ID column)?

For instance, the average Pearson correlation by ID: you would compute the Pearson correlation between the target and the predicted values within each ID, then average those correlations across IDs. Just wondering whether there is an existing example of this.

Thanks.

jmoralez commented 2 years ago

Hi @similang, thank you for your interest in LightGBM. There's currently no supported way to do that. However, you could achieve it by adding an attribute to the dataset and using that attribute in your custom metric. Here's an example in Python:

import lightgbm as lgb
import numpy as np
import pandas as pd

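def avg_mse_by_id(y_pred, ds):
    # Custom metric: MSE computed per id, then averaged across ids.
    y_true = ds.get_label()
    sq_errs = (y_true - y_pred)**2
    # ds.id is the extra attribute attached to the Dataset below
    avg_per_id = pd.Series(sq_errs).groupby(ds.id).mean()
    print(avg_per_id)  # debugging only
    # (metric name, value, is_higher_better)
    return 'avg_mse_by_id', avg_per_id.mean(), False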

rng = np.random.RandomState(0)
id = rng.choice([1, 2], 100)
X = np.linspace(-1, 1, 100).reshape(-1, 1)
y = X[:, 0] ** 2 + id * rng.rand(100)  # id 2 has more noise
ds = lgb.Dataset(X, y)
ds.id = id  # this is the extra attribute
bst = lgb.train(
    {'num_leaves': 3, 'verbose': -1, 'metric': 'None'},
    ds,
    num_boost_round=1,
    valid_sets=[ds],
    feval=avg_mse_by_id,
    callbacks=[lgb.log_evaluation(1)],
)

You should see something like the following:

1    0.185718
2    0.470320
dtype: float64
[1] training's avg_mse_by_id: 0.328019

The first part is the series of MSEs by id (printed only for debugging), and the last line is the average of those per-id MSEs.

Keep in mind that this is a bit of a hack to make it work in the meantime. If this is what you're looking for, we could take it as a feature request.
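For the average Pearson correlation mentioned in the question, the same trick could be sketched roughly as follows. This is only an illustration under a few assumptions: `mean_pearson_by_group` and `avg_pearson_by_id` are hypothetical names, `ds.id` is the ad-hoc attribute from the example above (not a LightGBM API), and groups where either the labels or the predictions are constant are skipped because the correlation is undefined there.

```python
import numpy as np


def mean_pearson_by_group(y_true, y_pred, groups):
    """Average the per-group Pearson correlation between y_true and y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    corrs = []
    for g in np.unique(groups):
        mask = groups == g
        t, p = y_true[mask], y_pred[mask]
        # Assumption: skip groups with constant labels or predictions,
        # where the Pearson correlation is undefined.
        if t.std() == 0 or p.std() == 0:
            continue
        corrs.append(np.corrcoef(t, p)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")


def avg_pearson_by_id(y_pred, ds):
    # Same (preds, dataset) signature as avg_mse_by_id above.
    # ds.id is the extra Python attribute set by hand, as in the example.
    value = mean_pearson_by_group(ds.get_label(), y_pred, ds.id)
    # Higher correlation is better, so the last element is True.
    return 'avg_pearson_by_id', value, True
```

It would be passed as `feval=avg_pearson_by_id` in the same `lgb.train` call. Note that in the first boosting rounds the predictions within a small group can be constant, in which case that group is simply left out of the average.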

skwskwskwskw commented 2 years ago

Thanks for the quick response. Yes, let me give it a try. It would definitely be great if this were added as a feature =)

ComeTr-2097 commented 5 months ago

ds.id = id  # this is the extra attribute

I wonder whether 'id' is one of the features (like 'X') used to build the LightGBM model?

jmoralez commented 5 months ago

id is just an attribute on a Python object; LightGBM doesn't even know it exists. I think the class-based example in https://github.com/microsoft/LightGBM/issues/5917#issuecomment-1583807569 is better since it doesn't rely on this kind of thing. Please use that instead if possible.
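As a rough idea of what a class-based metric can look like (this is a sketch, not the code from the linked comment; `GroupedMSE` is a hypothetical name), the ids can live on a callable object instead of on the Dataset:

```python
import numpy as np


class GroupedMSE:
    """Callable metric that carries its own group ids (hypothetical helper)."""

    def __init__(self, ids):
        # Assumption: ids are in the same row order as the evaluated dataset.
        self.ids = np.asarray(ids)

    def __call__(self, y_pred, ds):
        y_true = ds.get_label()
        sq_errs = (y_true - y_pred) ** 2
        # MSE within each id, then the plain average across ids.
        per_id = [sq_errs[self.ids == g].mean() for g in np.unique(self.ids)]
        # (metric name, value, is_higher_better)
        return 'avg_mse_by_id', float(np.mean(per_id)), False
```

With the earlier example this would be used as `feval=GroupedMSE(id)`, with no `ds.id = id` line needed. One instance only matches one dataset, so evaluating on several validation sets would need some way to pick the right ids for each.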