Open: yiannis-gkoufas opened this issue 2 months ago
Hi @yiannis-gkoufas,
I understand that you were able to train ML models with AutoML and that the problem occurs only when computing predictions. Could you please provide the code you are using to compute predictions?
Hi @pplonski!
I use the same constructor for AutoML and pass a dataframe.
automl = AutoML(results_path=str(model_directory),
                mode="Compete",
                total_time_limit=600 * 600,
                golden_features=True,
                features_selection=True,
                ml_task="binary_classification")
Could it be an issue with the ensemble model?
Thank you, @yiannis-gkoufas, for the response. It looks like a bug in computing predictions for the Stacked Ensemble. Could you share the full code and data needed to reproduce the issue?
This code:
from sklearn.model_selection import train_test_split
from supervised import AutoML
import pandas as pd

if __name__ == '__main__':
    df = pd.read_csv(
        "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv",
        skipinitialspace=True,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        df[df.columns[:-1]], df["income"], test_size=0.25
    )
    automl = AutoML(results_path="./model",
                    mode="Compete",
                    total_time_limit=600 * 600,
                    golden_features=True,
                    features_selection=True,
                    ml_task="binary_classification")
    automl.fit(X_train, y_train)
    predictions = automl.predict(X_test)
    print(predictions)
reproduced the issue for me, because the stacked ensemble is selected as the best model. It takes a while to run, of course. The message I got:
Traceback (most recent call last):
File "/Users/prezi/Code/mljar_issue/mljar_issue/main.py", line 23, in <module>
predictions = automl.predict(X_test)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/automl.py", line 451, in predict
return self._predict(X)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/base_automl.py", line 1503, in _predict
predictions = self._base_predict(X)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/base_automl.py", line 1465, in _base_predict
predictions = model.predict(X, X_stacked)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/ensemble.py", line 434, in predict
y_predicted_from_model = model.predict(X_stacked)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/model_framework.py", line 448, in predict
y_p = learner.predict(X_data)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/supervised/algorithms/sklearn.py", line 66, in predict
return self.model.predict_proba(X)[:, 1]
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/ensemble/_forest.py", line 947, in predict_proba
X = self._validate_X_predict(X)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/ensemble/_forest.py", line 641, in _validate_X_predict
X = self._validate_data(
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/base.py", line 608, in _validate_data
self._check_feature_names(X, reset=reset)
File "/Users/prezi/Library/Caches/pypoetry/virtualenvs/mljar-issue-kQcsGfQC-py3.10/lib/python3.10/site-packages/sklearn/base.py", line 535, in _check_feature_names
raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- 100_NearestNeighbors_prediction
- 101_NearestNeighbors_prediction
- 102_Xgboost_BoostOnErrors_prediction
- 102_Xgboost_prediction
- 103_Xgboost_prediction
- ...
Feature names seen at fit time, yet now missing:
- 100_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K
- 101_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K
- 102_Xgboost_BoostOnErrors_prediction_0_for_<=50K_1_for_>50K
- 102_Xgboost_prediction_0_for_<=50K_1_for_>50K
- 103_Xgboost_prediction_0_for_<=50K_1_for_>50K
- ...
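Reading the two lists in the error side by side, the names differ only by a trailing class-label suffix: the stacked columns built at predict time end in `_prediction`, while the columns seen at fit time carry an extra `_0_for_<=50K_1_for_>50K`. A minimal sketch that makes this explicit, using only the five names visible in the traceback (the rest are elided there). The helpers `diff_feature_names` and `extra_suffixes` are hypothetical names for illustration and are not part of mljar-supervised or scikit-learn.

```python
# Names copied from the traceback; the error message elides the rest with "...".
PREDICT_NAMES = [
    "100_NearestNeighbors_prediction",
    "101_NearestNeighbors_prediction",
    "102_Xgboost_BoostOnErrors_prediction",
    "102_Xgboost_prediction",
    "103_Xgboost_prediction",
]
FIT_NAMES = [
    "100_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K",
    "101_NearestNeighbors_prediction_0_for_<=50K_1_for_>50K",
    "102_Xgboost_BoostOnErrors_prediction_0_for_<=50K_1_for_>50K",
    "102_Xgboost_prediction_0_for_<=50K_1_for_>50K",
    "103_Xgboost_prediction_0_for_<=50K_1_for_>50K",
]


def diff_feature_names(fit_names, predict_names):
    """Return (unseen_at_fit, missing_at_predict), like the two lists in the error."""
    fit_set, predict_set = set(fit_names), set(predict_names)
    unseen_at_fit = sorted(predict_set - fit_set)
    missing_at_predict = sorted(fit_set - predict_set)
    return unseen_at_fit, missing_at_predict


def extra_suffixes(unseen_at_fit, missing_at_predict):
    """Collect the suffixes that turn a predict-time name into a fit-time name."""
    suffixes = set()
    for short in unseen_at_fit:
        for long in missing_at_predict:
            # Require the "_" separator so "102_Xgboost_prediction" does not
            # match "102_Xgboost_BoostOnErrors_prediction...".
            if long.startswith(short + "_"):
                suffixes.add(long[len(short):])
    return suffixes


unseen, missing = diff_feature_names(FIT_NAMES, PREDICT_NAMES)
print(extra_suffixes(unseen, missing))
```

For the names above this yields a single suffix, `_0_for_<=50K_1_for_>50K`, which suggests the stacked prediction columns are named differently during training and during prediction rather than being genuinely different features.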
I have the same issue.
Hi!
I want to use mljar for binary classification (category1+category2). The parameters I am passing to AutoML are the following:
In the params.json I see
"best_model": "Ensemble_Stacked"
When I try to run a prediction I get:
Any help would be appreciated!
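For anyone checking whether they are affected, the `params.json` lookup mentioned in this post can be scripted. This is a minimal sketch that assumes `params.json` sits directly inside the directory passed as `results_path` and contains a top-level `"best_model"` key, as quoted above. The function names `read_best_model` and `is_stacked_ensemble` are hypothetical helpers, not part of mljar-supervised.

```python
import json
from pathlib import Path


def read_best_model(results_path):
    """Return the "best_model" entry from params.json, or None if the key is absent.

    Assumption: params.json lives directly under results_path.
    """
    params_file = Path(results_path) / "params.json"
    with params_file.open(encoding="utf-8") as f:
        params = json.load(f)
    return params.get("best_model")


def is_stacked_ensemble(results_path):
    """True when the selected best model is the stacked ensemble named in this issue."""
    return read_best_model(results_path) == "Ensemble_Stacked"
```

Calling `is_stacked_ensemble("./model")` after training would indicate whether `predict` is going to route through the stacked ensemble, which is the code path that fails in the traceback in this thread.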