mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Incorrect AUC value in CatBoost chart with sample_weight #538

Open offchan42 opened 2 years ago

offchan42 commented 2 years ago

I trained models on a dataset with sample weights using three algorithms: LightGBM, XGBoost, and CatBoost. The learning curve chart for CatBoost does not take the sample weights into account, although the score in the table does. Maybe sample_weight is not being passed when the CatBoost charts are generated? I also see a problem in the ROC curve chart, but there the behavior is the same across all models. Also, could this affect the training result, e.g. training terminating at the wrong iteration? I ask because I saw the model train for many iterations.

(two screenshots attached)
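To illustrate why a chart and a table can disagree when only one of them uses the weights, here is a minimal sketch of a pairwise (Mann-Whitney) AUC with optional sample weights. The function name `weighted_auc` and the toy data are hypothetical and are not taken from mljar-supervised or CatBoost. To my understanding, the result should agree with scikit-learn's `roc_auc_score(y_true, y_score, sample_weight=...)`, but that is not verified here.

```python
def weighted_auc(y_true, y_score, sample_weight=None):
    """Pairwise AUC: each positive/negative pair counts with weight w_pos * w_neg.

    Hypothetical helper for illustration only; O(n_pos * n_neg), so it is
    meant for small examples, not for real validation sets.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)  # unweighted case
    pos = [(s, w) for y, s, w in zip(y_true, y_score, sample_weight) if y == 1]
    neg = [(s, w) for y, s, w in zip(y_true, y_score, sample_weight) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    num = 0.0  # weight of correctly ordered pairs (ties count half)
    den = 0.0  # total weight of all positive/negative pairs
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            pair_weight = w_pos * w_neg
            den += pair_weight
            if s_pos > s_neg:
                num += pair_weight
            elif s_pos == s_neg:
                num += 0.5 * pair_weight
    return num / den


# Toy data (made up): the one misordered pair carries the largest weights.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.6, 0.4, 0.9]
weights = [1.0, 3.0, 3.0, 1.0]

unweighted = weighted_auc(y_true, y_score)
weighted = weighted_auc(y_true, y_score, weights)
print(unweighted, weighted)
```

On this toy data the two values differ substantially, so a learning curve computed without weights and a table score computed with weights would not be expected to match. If the stopping criterion also uses the unweighted metric, it could pick a different iteration than the weighted one would.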

pplonski commented 2 years ago

@off99555 thank you for reporting. Is this a problem only for CatBoost?

offchan42 commented 2 years ago

@pplonski Yes, I think so. At least LightGBM and XGBoost don't seem to have this problem (wrong metric in the learning curve chart).

alencn1024 commented 2 years ago

May I ask whether anyone is working on this issue? If not, I'd be glad to take it on, @pplonski.

pplonski commented 2 years ago

@alencn1024 thanks for looking into it!