microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

[LightGBM] Weight column in LightGBM classifier is not working as per expectation #1965

Open coolcoder001 opened 1 year ago

coolcoder001 commented 1 year ago

SynapseML version

2.12:0.9.5

Describe the problem

Hi, I am using LightGBMClassifier for a skewed binary classification problem. I have several features (A, B, C, and so on). I group by these features and compute weights for class 0 and class 1.

However, for the testing data I set all weights to 1.
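To make the weighting setup concrete, here is a minimal pure-Python sketch of one way such weights could be computed. The exact formula is not stated in this issue, so the scheme below (inverse class frequency within each feature group, normalized so both classes carry equal total weight per group) is an assumption, and the function name `class_weights_by_group` is hypothetical. In a Spark job the same counts would come from a groupBy on the feature columns and the label.

```python
from collections import Counter

def class_weights_by_group(rows):
    """Return {(group_key, label): weight} for rows of (group_key, label).

    Assumed scheme (not taken from the issue):
        weight = n_group / (n_classes_in_group * n_group_label)
    so each class contributes the same total weight within a group.
    """
    group_counts = Counter(group for group, _ in rows)
    pair_counts = Counter(rows)
    # Number of distinct labels observed in each group.
    classes_per_group = Counter(group for group, _ in pair_counts)
    return {
        (group, label): group_counts[group] / (classes_per_group[group] * n)
        for (group, label), n in pair_counts.items()
    }

# Toy training rows: (feature group, label). Group "A" is skewed 3:1.
train = [("A", 0), ("A", 0), ("A", 0), ("A", 1), ("B", 0), ("B", 1)]
weights = class_weights_by_group(train)

# Per-row training weights, as they would be stored in the 'weight' column.
train_weights = [weights[row] for row in train]

# Testing rows get a constant weight of 1, as described above.
test_weights = [1.0] * 3
```

With this scheme the minority class in group "A" receives a larger per-row weight than the majority class, while the balanced group "B" keeps a weight of 1 for both classes.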

I can see that the loss on my testing data is not converging. Is this the correct way to use the weightCol feature?
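When comparing training and testing loss, it may help to keep in mind how per-row weights enter a loss value. The sketch below is the standard weighted binary log loss, written in plain Python for illustration only; it is not SynapseML's or LightGBM's internal metric code, and the helper name `weighted_log_loss` is hypothetical. It shows that with all weights equal to 1 the weighted loss reduces to the ordinary mean log loss, so a weighted training loss and a unit-weight testing loss are not measured on the same scale.

```python
import math

def weighted_log_loss(labels, probs, weights=None, eps=1e-15):
    """Weighted binary log loss: sum(w_i * loss_i) / sum(w_i)."""
    if weights is None:
        weights = [1.0] * len(labels)
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        # Clip probabilities to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)

labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]

# Unit weights: identical to the unweighted mean log loss.
loss_unit = weighted_log_loss(labels, probs, [1.0, 1.0, 1.0])
loss_plain = weighted_log_loss(labels, probs)

# Up-weighting the third (least confident) row increases the reported loss.
loss_weighted = weighted_log_loss(labels, probs, [1.0, 1.0, 3.0])
```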

One more observation: at inference time, if I set isUnbalance to True, the model gives random predictions and the AUC drops to 50%. So I had to set isUnbalance to False for inference. Please let me know if this is the correct behavior.

Code to reproduce issue

from synapse.ml.lightgbm import LightGBMClassifier

# Hyperparameters. Note that 'isUnbalance' and 'parallelism' are listed here
# but are not passed to the classifier below (those arguments are commented out).
params = {
    'baggingFraction': 0.8156468375795559,
    'featureFraction': 0.8609557255311693,
    'featuresCol': 'features',
    'labelCol': 'label',
    'learningRate': 0.1449558170049662,
    'maxDepth': 29,
    'minSumHessianInLeaf': 0.03753901648224433,
    'numIterations': 80,
    'numLeaves': 133,
    'weightCol': 'weight',
    'objective': 'binary',
    'useSingleDatasetMode': True,
    'isUnbalance': False,
    'useBarrierExecutionMode': True,
    'parallelism': 'voting_parallel',
    'metric': 'auc',
}

lgb = LightGBMClassifier(
    numIterations=params['numIterations'],
    numLeaves=params['numLeaves'],
    maxDepth=params['maxDepth'],
    baggingFraction=params['baggingFraction'],
    featureFraction=params['featureFraction'],
    minSumHessianInLeaf=params['minSumHessianInLeaf'],
    learningRate=params['learningRate'],
    objective=params['objective'],
    labelCol=params['labelCol'],
    featuresCol=params['featuresCol'],
    weightCol=params['weightCol'],  # per-row weights read from the 'weight' column
    useSingleDatasetMode=True,
    # isUnbalance=False,
    useBarrierExecutionMode=True,
    # parallelism="voting_parallel",
    metric=params['metric'],
)

Other info / logs

No response

github-actions[bot] commented 1 year ago

Hey @coolcoder001 :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.

svotaw commented 1 year ago

We have released 0.11.2, which has newer features. We are no longer really supporting 0.9.5, and will release the official 1.0 version soon.