mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

provide labels for true classes #34

Closed Shoeboxam closed 4 years ago

Shoeboxam commented 4 years ago

When working with imbalanced datasets, a class may be so underrepresented that y_true and y_pred nearly always contain a different number of classes (for example, one class is missing from the predicted values). As a result, mljar often cannot be used on imbalanced datasets.

I have attached the error below:

MLJAR AutoML:   0%|          | 0/80 [00:00<?, ?model/s]
Traceback (most recent call last):
...
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/automl.py", line 256, in fit
    self.not_so_random_step(X, y)
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/automl.py", line 207, in not_so_random_step
    m = self.train_model(params, X, y)
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/automl.py", line 164, in train_model
    il.train({"train": {"X": X, "y": y}})
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/iterative_learner_framework.py", line 75, in train
    self.predictions(learner, train_data, validation_data),
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/callbacks/callback_list.py", line 23, in on_iteration_end
    cb.on_iteration_end(logs, predictions)
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/callbacks/early_stopping.py", line 59, in on_iteration_end
    predictions.get("y_train_true"), predictions.get("y_train_predicted")
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/metric.py", line 58, in __call__
    return self.metric(y_true, y_predicted)
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/supervised/metric.py", line 24, in logloss
    ll = log_loss(y_true, y_predicted)
  File "/home/shoe/.virtualenvs/2ravens/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 1809, in log_loss
    lb.classes_))
ValueError: y_true and y_pred contain different number of classes 3, 2. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2]
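
For reference, the `labels` argument mentioned in the error message can be sketched as follows. This is a minimal illustration against scikit-learn's `log_loss` only, not mljar's code; the arrays and the `all_labels` name are made up for the example. It covers the case where a class is absent from `y_true` while the model still outputs one probability column per class.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical data: the task has 3 classes, but class 2 never
# appears in this batch of true labels.
all_labels = [0, 1, 2]
y_true = np.array([0, 1, 0, 1])
y_pred = np.array([
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
])

# Without `labels`, scikit-learn infers only 2 classes from y_true
# and rejects the 3 probability columns.
try:
    log_loss(y_true, y_pred)
    raised_without_labels = False
except ValueError:
    raised_without_labels = True

# Passing the full label set tells log_loss which class each column is.
loss = log_loss(y_true, y_pred, labels=all_labels)
print(raised_without_labels, loss)
```

Note that this only helps when `y_pred` already has one column per class. If the predictions have fewer columns than there are classes in `y_true`, as in the traceback above (3 classes against 2), passing `labels` does not resolve the mismatch.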

Shoeboxam commented 4 years ago

After working on a patch, I found that this is actually an issue with multi-class classification, which is not supported. The code already clamps the data to convert it to a two-class problem, so my third class is not supported. Closing as a duplicate of #18.