Hi there,
I've been using vaex for some ML work and it's been incredibly useful, so thanks for that! I was wondering whether it would be possible to support sample weights in the LightGBM wrapper?
By the looks of it, the most straightforward way would be to add the weights to the dtrain dataset: LightGBM datasets have a 'weight' parameter that is taken into account during training, but vaex doesn't currently use it (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Dataset.html#lightgbm.Dataset). The vaex.ml.lightgbm.LightGBMModel.fit method could be changed to something like the snippet below, with a weight_col parameter that is passed on to lightgbm.Dataset:
def fit(self, df, valid_sets=None, valid_names=None, early_stopping_rounds=None, evals_result=None, verbose_eval=None, weight_col=None, **kwargs):
    if weight_col is not None:
        dtrain = lightgbm.Dataset(df[self.features].values, df[self.target].to_numpy(), weight=df[weight_col].to_numpy())
    else:
        dtrain = lightgbm.Dataset(df[self.features].values, df[self.target].to_numpy())
    ...
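Since the weight argument of lightgbm.Dataset already defaults to None, the two branches above could probably collapse into a single call. Here is a rough sketch of that idea, not vaex's actual code: resolve_weight and make_dataset are hypothetical names, and a plain dict of column name to values stands in for the DataFrame so the snippet doesn't depend on vaex internals.

```python
def resolve_weight(columns, weight_col=None):
    """Return the values of `weight_col`, or None when no weight column is requested.

    `columns` is a stand-in for the DataFrame: a dict mapping column names to values.
    """
    if weight_col is None:
        return None
    if weight_col not in columns:
        raise KeyError(f"weight column {weight_col!r} not found")
    return columns[weight_col]


def make_dataset(columns, features, target, weight_col=None):
    """Build a lightgbm.Dataset, passing weights only if a weight column is given."""
    # Imported here so resolve_weight stays usable without these packages installed.
    import numpy as np
    import lightgbm

    # Stack the feature columns into a 2-D array of shape (n_rows, n_features).
    data = np.column_stack([columns[name] for name in features])
    label = np.asarray(columns[target])
    # weight=None is LightGBM's default, so no if/else branch is needed.
    weight = resolve_weight(columns, weight_col)
    return lightgbm.Dataset(data, label=label, weight=weight)
```

With this shape, calling make_dataset(columns, ["x1", "x2"], "y") and make_dataset(columns, ["x1", "x2"], "y", weight_col="w") go through the same code path, and a mistyped weight column name fails early with a KeyError.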
Or, I guess, the weight column could be defined at the same time as self.features and self.target; I'm not sure which you'd prefer! Hope this makes sense. If it's already been discussed or brought up, I apologise; I couldn't find an existing issue for it, but I might just be blind. Thanks very much 😄