sb-ai-lab / LightAutoML

Fast and customizable framework for automatic ML model creation (AutoML)
https://developers.sber.ru/portal/products/lightautoml
Apache License 2.0
1.18k stars 51 forks

5-fold CV causes different number of classes in train/test #64

Open dev-rinchin opened 1 year ago

dev-rinchin commented 1 year ago

🐛 Bug

If the training set contains fewer than 5 instances of any class, one or more of the 5 CV folds fail with a "y_true and y_pred contain different number of classes" error, because the validation part of such a fold is missing that class.
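A minimal sketch of why this happens, in plain Python. The round-robin splitter below is a simplified stand-in for a stratified 5-fold split, not LightAutoML's actual fold iterator, and the toy `labels` list is hypothetical. With only 3 instances of a class, at least 2 of the 5 validation folds cannot contain it.

```python
from collections import defaultdict


def stratified_fold_classes(labels, n_folds=5):
    """Assign samples to validation folds class by class (round-robin)
    and return the set of classes present in each validation fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_classes = [set() for _ in range(n_folds)]
    for label, indices in by_class.items():
        # The i-th sample of a class goes to validation fold i % n_folds.
        for pos, _idx in enumerate(indices):
            fold_classes[pos % n_folds].add(label)
    return fold_classes


# Hypothetical toy target: class 6 has only 3 instances.
labels = [0] * 20 + [1] * 20 + [6] * 3
fold_classes = stratified_fold_classes(labels)

# Classes absent from each validation fold.
missing = [sorted(set(labels) - present) for present in fold_classes]
print(missing)
```

Any fold whose entry in `missing` is non-empty is validated on fewer classes than the model was trained on, which is the situation the metric complains about.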

To Reproduce

Run the default AutoML on the wine-quality-white task:

y_true and y_pred contain different number of classes 6, 7. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2 3 4 5]
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/utils.py", line 66, in tune_and_fit_predict
    preds = ml_algo.fit_predict(train_valid)
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/base.py", line 273, in fit_predict
    model, pred = self.fit_predict_single_fold(train, valid)
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/linear_sklearn.py", line 140, in fit_predict_single_fold
    valid.weights,
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/torch_based/linear_model.py", line 406, in fit
    score = self.metric(y_val, val_pred, weights_val)
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/tasks/losses/base.py", line 42, in __call__
    val = self.metric_func(y_true, y_pred, sample_weight=sample_weight)
  File "/usr/local/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2430, in log_loss
    transformed_labels.shape[1], y_pred.shape[1], lb.classes_
ValueError: y_true and y_pred contain different number of classes 6, 7. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2 3 4 5]

Traceback (most recent call last):
  File "experiments/run_tabular.py", line 75, in <module>
    main(dataset_name=args.dataset, cpu_limit=args.cpu_limit, memory_limit=args.memory_limit)
  File "experiments/run_tabular.py", line 38, in main
    oof_predictions = automl.fit_predict(train, roles={"target": "class"}, verbose=10)
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/automl/presets/tabular_presets.py", line 549, in fit_predict
    oof_pred = super().fit_predict(train, roles=roles, cv_iter=cv_iter, valid_data=valid_data, verbose=verbose)
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/automl/presets/base.py", line 212, in fit_predict
    verbose=verbose,
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/automl/base.py", line 212, in fit_predict
    pipe_pred = ml_pipe.fit_predict(train_valid)
  File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/pipelines/ml/base.py", line 136, in fit_predict
    ), "Pipeline finished with 0 models for some reason.\nProbably one or more models failed"
AssertionError: Pipeline finished with 0 models for some reason.
Probably one or more models failed
Process failed, exit code 1
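The ValueError in the traceback can be reproduced with scikit-learn alone. The sketch below is an illustration of the `log_loss` behaviour, not LightAutoML's fix: `y_val` and `val_pred` reuse names from the traceback, while `all_classes` and the uniform probabilities are assumptions made up for the example. Passing the full class list through the `labels` argument, as the error message suggests, lets the metric be computed even when the validation fold lacks a class.

```python
import numpy as np
from sklearn.metrics import log_loss

# Assumption: the model was trained on 7 classes (0..6).
all_classes = np.arange(7)

# Validation fold that happens to contain no instance of class 6.
y_val = np.array([0, 1, 2, 3, 4, 5])

# Uniform predicted probabilities over all 7 classes, one row per sample.
val_pred = np.full((len(y_val), len(all_classes)), 1.0 / len(all_classes))

# Without labels, log_loss infers 6 classes from y_val but sees 7 columns.
try:
    log_loss(y_val, val_pred)
    raised = False
except ValueError:
    raised = True

# With the full class list, the column count matches and the score is defined.
score = log_loss(y_val, val_pred, labels=all_classes)
```

Whether the metric wrapper in `lightautoml/tasks/losses/base.py` can forward such a `labels` argument is not confirmed by this thread; the snippet only shows that the underlying scikit-learn call accepts it.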

github-actions[bot] commented 1 year ago

Stale issue message

kudep commented 1 year ago

Hopefully it will be fixed soon. We had the same issue.