mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Predictions of Boolean labels automatically converted to integer #442

Open PGijsbers opened 3 years ago

PGijsbers commented 3 years ago

I encountered an issue while using mljar-supervised with a boolean series as the target. The predictions it produces are in [0, 1] rather than [False, True].

import numpy as np
import pandas as pd
from supervised.automl import AutoML

# 150 rows of random features with a boolean target (75 True, then 75 False)
x = pd.DataFrame(np.random.random(size=(150, 4)))
y = pd.Series([True] * 75 + [False] * 75)

automl = AutoML(ml_task="binary_classification", total_time_limit=60)
automl.fit(x, y)
automl.predict(x)
>>> array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0,
       1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0])

As you can see, the predictions produced by mljar-supervised are not in the original domain. The same happens with the label column returned by predict_all, and the probability columns are correspondingly named prediction_0 and prediction_1.
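Until this is handled inside the library, the original domain could be restored on the caller's side. The following is only a minimal sketch and not part of mljar-supervised: the helper names `restore_bool_predictions` and `restore_bool_frame` are hypothetical, the sample arrays are stand-ins rather than real model output, and it assumes that 0 corresponds to False and 1 to True and that the predict_all frame has the columns prediction_0, prediction_1 and label as described above.

```python
import numpy as np
import pandas as pd


def restore_bool_predictions(predictions):
    """Cast 0/1 predictions to booleans (assumes 0 -> False, 1 -> True)."""
    return np.asarray(predictions).astype(bool)


def restore_bool_frame(pred_frame):
    """Return a new predict_all-style frame with boolean labels and column names."""
    # rename() returns a new DataFrame, so the input frame is left untouched.
    restored = pred_frame.rename(
        columns={
            "prediction_0": "prediction_False",
            "prediction_1": "prediction_True",
        }
    )
    restored["label"] = restored["label"].astype(bool)
    return restored


# Stand-in data shaped like the outputs described above (not real model output).
int_predictions = np.array([1, 1, 0, 1, 0])
frame = pd.DataFrame(
    {
        "prediction_0": [0.2, 0.1, 0.9, 0.3, 0.6],
        "prediction_1": [0.8, 0.9, 0.1, 0.7, 0.4],
        "label": int_predictions,
    }
)

bool_predictions = restore_bool_predictions(int_predictions)
bool_frame = restore_bool_frame(frame)
```

In practice the same helpers would be applied to the results of automl.predict(x) and automl.predict_all(x). The cast is only safe if the 0/1 encoding really follows the False/True order, which is worth checking against the training labels first.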

Edit: tested with versions 0.10.3 and 0.10.6.

pplonski commented 3 years ago

@PGijsbers thank you for reporting!