skwskwskwskw opened this issue 2 years ago
Hi @similang, thank you for your interest in LightGBM. Currently there's no supported way to do that; however, you could achieve it by adding an attribute to the Dataset and using it in your custom metric. Here's an example in Python:
```python
import lightgbm as lgb
import numpy as np
import pandas as pd


def avg_mse_by_id(y_pred, ds):
    y_true = ds.get_label()
    sq_errs = (y_true - y_pred) ** 2
    # group the squared errors by the extra `id` attribute attached to the Dataset
    avg_per_id = pd.Series(sq_errs).groupby(ds.id).mean()
    print(avg_per_id)  # per-id MSEs, just for debugging
    # feval returns (eval_name, eval_result, is_higher_better)
    return 'avg_mse_by_id', avg_per_id.mean(), False


rng = np.random.RandomState(0)
id = rng.choice([1, 2], 100)
X = np.linspace(-1, 1, 100).reshape(-1, 1)
y = X[:, 0] ** 2 + id * rng.rand(100)  # id 2 has more noise
ds = lgb.Dataset(X, y)
ds.id = id  # this is the extra attribute
bst = lgb.train(
    # 'metric': 'None' disables the built-in metric so only the custom one is reported
    {'num_leaves': 3, 'verbose': -1, 'metric': 'None'},
    ds,
    num_boost_round=1,
    valid_sets=[ds],
    feval=avg_mse_by_id,
    callbacks=[lgb.log_evaluation(1)],
)
```
You should see something like the following:
```
1 0.185718
2 0.470320
dtype: float64
[1] training's avg_mse_by_id: 0.328019
```
The first part is the Series of MSEs per id (printed only for debugging), and the last line is the average of those per-id MSEs.
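In other words, every id gets equal weight regardless of how many rows it has, so the value is generally not the same as the plain MSE over all rows. A small NumPy check (the helper `mean_of_group_means` and the toy numbers are just for illustration):

```python
import numpy as np


def mean_of_group_means(values, groups):
    """Unweighted average of per-group means: every id counts equally."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    return float(np.mean([values[groups == g].mean() for g in np.unique(groups)]))


# The two per-id MSEs printed above; the metric is simply their mean.
per_id_mse = np.array([0.185718, 0.470320])
print(round(float(per_id_mse.mean()), 6))  # 0.328019

# Toy squared errors: three rows for id 1, one row for id 2.
sq_errs = np.array([1.0, 1.0, 1.0, 4.0])
ids = np.array([1, 1, 1, 2])
print(mean_of_group_means(sq_errs, ids))  # 2.5  = (1.0 + 4.0) / 2
print(sq_errs.mean())                     # 1.75 = 7.0 / 4, the pooled MSE
```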
Keep in mind that this is a bit of a hack to make it work in the meantime. If this is what you're looking for, we could take it as a feature request.
Thanks for the quick response. Yes, let me give it a try. It would definitely be great if this were added as a feature =)
> ds.id = id # this is the extra attribute

I wonder whether `id` is one of the features (like `X`) used to build the LightGBM model?
`id` is just an attribute on a Python object that LightGBM doesn't even know exists, so it isn't used as a feature. I think the example in https://github.com/microsoft/LightGBM/issues/5917#issuecomment-1583807569 with a class is better since it doesn't rely on this kind of trick; please use that instead if possible.
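A minimal sketch of the class-based idea, where the metric keeps the ids itself instead of reading them from the Dataset. It assumes a single evaluation Dataset whose rows line up with `ids`. The class name `AvgMSEById` is hypothetical, and this is my own sketch, not necessarily the code in the linked comment:

```python
import numpy as np


class AvgMSEById:
    """Hypothetical callable custom metric that stores the ids itself.

    Assumes it is only evaluated on one Dataset whose rows are aligned with `ids`.
    """

    def __init__(self, ids):
        self.ids = np.asarray(ids)

    def __call__(self, y_pred, eval_data):
        y_true = np.asarray(eval_data.get_label(), dtype=float)
        if self.ids.shape[0] != y_true.shape[0]:
            raise ValueError("ids must have one entry per row of the evaluated Dataset")
        sq_errs = (y_true - np.asarray(y_pred, dtype=float)) ** 2
        # mean squared error within each id, then an unweighted mean over ids
        per_id = [sq_errs[self.ids == g].mean() for g in np.unique(self.ids)]
        # same (name, value, is_higher_better) tuple as avg_mse_by_id
        return "avg_mse_by_id", float(np.mean(per_id)), False
```

It would be passed as `feval=AvgMSEById(id)` to `lgb.train` in place of `avg_mse_by_id`, with no `ds.id` attribute needed. Since the same instance receives every Dataset in `valid_sets`, using several validation sets would need one ids array per set, which this sketch does not handle.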
Is there a way to create custom metrics that rely on another column (e.g. an ID column)?
For instance, the average Pearson correlation by ID: you would need to calculate the Pearson correlation between the target and the predicted values within each ID, and then average over the IDs. I'm just wondering whether there's an existing case or example like this.
Thanks.
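Following the same custom-metric pattern, the average Pearson correlation by ID could look roughly like this. It is a minimal NumPy sketch, assuming one evaluation Dataset whose rows are aligned with `ids`. `AvgPearsonById` is a hypothetical name. Skipping ids with fewer than two rows or constant values, where the correlation is undefined, is my own design choice:

```python
import numpy as np


class AvgPearsonById:
    """Hypothetical custom metric: Pearson correlation between label and
    prediction within each id, averaged (unweighted) over ids.

    Assumes it is used with a single evaluation Dataset aligned with `ids`.
    """

    def __init__(self, ids):
        self.ids = np.asarray(ids)

    def __call__(self, y_pred, eval_data):
        y_true = np.asarray(eval_data.get_label(), dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        if self.ids.shape[0] != y_true.shape[0]:
            raise ValueError("ids must have one entry per row of the evaluated Dataset")
        corrs = []
        for g in np.unique(self.ids):
            mask = self.ids == g
            t, p = y_true[mask], y_pred[mask]
            # correlation is undefined for < 2 rows or a constant vector: skip that id
            if t.size < 2 or t.std() == 0 or p.std() == 0:
                continue
            corrs.append(np.corrcoef(t, p)[0, 1])
        value = float(np.mean(corrs)) if corrs else float("nan")
        # a larger correlation is better, so is_higher_better=True
        return "avg_pearson_by_id", value, True
```

It would be used like the MSE version, e.g. `feval=AvgPearsonById(id)` in the `lgb.train` call. Returning `is_higher_better=True` matters if the metric is combined with early stopping.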