Open Haizhuolaojisite opened 1 year ago
All tunable LightGBM parameters are listed here: https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst#core-parameters @svotaw, @imatiach-msft, could you comment further here? :)
As a wrapper around LightGBM, SynapseML supports all LightGBM parameters (at least those that make sense in distributed Spark mode). If a parameter isn't exposed explicitly, you can pass it yourself with passThroughArgs. For advice on LightGBM-specific functionality, I'd suggest asking the LightGBM team directly at microsoft/lightgbm. They can give more advice on how they handle things like nulls and unbalanced datasets.
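To illustrate the passThroughArgs suggestion, here is a minimal sketch of building such a string. It assumes passThroughArgs takes space-separated `key=value` pairs in LightGBM's own parameter syntax; the helper `build_pass_through_args` is hypothetical, and `use_missing` / `zero_as_missing` are LightGBM parameters picked only as examples related to null handling. Please check the separator and parameter names against the SynapseML and LightGBM docs for your version.

```python
def build_pass_through_args(params):
    """Join LightGBM parameters into one 'key=value key=value' string.

    Hypothetical helper; the space-separated format is an assumption.
    """
    parts = []
    for key, value in params.items():
        # LightGBM's text parameter format uses lowercase booleans.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return " ".join(parts)


# Example: missing-value related parameters from the LightGBM parameter list.
args = build_pass_through_args({"use_missing": True, "zero_as_missing": False})
print(args)

# Possible usage, assuming synapseml is installed (not run here):
# from synapse.ml.lightgbm import LightGBMClassifier
# clf = LightGBMClassifier(objective="multiclass", passThroughArgs=args)
```

Keeping the parameters in a dict until the last moment makes it easier to review what is being overridden, since anything passed this way bypasses the explicit setters.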
Is your feature request related to a problem? Please describe. I am trying to train a LightGBM model for a multiclass classification problem, but I couldn't find a parameter to balance the dataset (oversampling, downsampling, or class weights). There is one boolean parameter called isUnbalance, but it only applies to the binary classification scenario.
isUnbalance ([bool](https://docs.python.org/3/library/functions.html#bool)) – Set to true if training data is unbalanced in binary classification scenario
Describe the solution you'd like I'd like a class-weights parameter to balance the data for each class, or a boolean flag like isUnbalance that works for multiclass classification and handles the imbalanced dataset automatically.
Additional context The LightGBM model accepts null values in the dataset, though I don't understand how it handles them. Will null values affect how the dataset is balanced? Is there a parameter that controls null value processing? It would be awesome to have some official examples of multiclass classification with LightGBM on an imbalanced dataset that has both categorical and numerical features with missing values.
Thank you very much!!
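As a possible workaround for the class-weights request above, per-row sample weights can stand in for a class-weight parameter. The sketch below is an assumption, not an official SynapseML or LightGBM recipe: it uses the common "balanced" heuristic `n_samples / (n_classes * class_count)`, and the helper name `balanced_class_weights` is hypothetical. The resulting weights would then go into a DataFrame column referenced by the classifier's weight column setting (believed to be `weightCol`; please verify for your version).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} so that each class contributes equally overall.

    Uses the heuristic n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy imbalanced multiclass labels: class 0 is twice as frequent as 1 and 2.
labels = [0, 0, 0, 0, 1, 1, 2, 2]
weights = balanced_class_weights(labels)

# One weight per row; in Spark this would become a numeric weight column.
row_weights = [weights[y] for y in labels]
print(weights)
```

With this weighting, rare classes get proportionally larger weights while the total weight still equals the number of rows, so the overall scale of the loss is unchanged.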