mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

'TfidfVectorizer' object has no attribute 'get_feature_names_out' error #486

Open. FahriBilici opened this issue 2 years ago.

FahriBilici commented 2 years ago

Hello, I just started using mljar. I am trying to build a model, but error.md shows the error 'TfidfVectorizer' object has no attribute 'get_feature_names_out'. I am using scikit-learn version 1.0.0. How can I fix this?
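For context: scikit-learn added get_feature_names_out to its vectorizers in version 1.0, and deprecated the older get_feature_names in 1.0 before removing it in 1.2. An AttributeError on get_feature_names_out therefore usually means the running Python process imports a scikit-learn older than 1.0, regardless of what was installed elsewhere. Code that must run on both sides of that change can use whichever method exists. A minimal sketch; feature_names is a hypothetical helper, not part of mljar-supervised or scikit-learn:

```python
def feature_names(vectorizer):
    """Return the vocabulary of a fitted vectorizer as a list of strings.

    Hypothetical helper: prefers get_feature_names_out (scikit-learn >= 1.0)
    and falls back to get_feature_names (available before scikit-learn 1.2).
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())
    # Older scikit-learn: only the pre-1.0 method exists.
    return list(vectorizer.get_feature_names())
```

With scikit-learn installed, feature_names(TfidfVectorizer().fit(["a cat", "a dog"])) returns the learned terms under either API.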

pplonski commented 2 years ago

@FahriBilici thank you for reporting the issue. Could you please provide code and data to reproduce the problem?

From your description, it looks like a problem with a column that has text values. If possible, try excluding that column from the training data. If that is not possible, please provide the steps to reproduce the problem and I will fix it.
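Excluding a text column only means passing AutoML.fit a feature frame without it, which makes sense when other feature columns remain. A minimal sketch, assuming the data is a pandas DataFrame and the user already knows which columns hold free text; features_without_text and the example columns are hypothetical:

```python
import pandas as pd


def features_without_text(df, target, text_columns):
    """Split df into (X, y), leaving out the target and the given text columns.

    text_columns is chosen by the user; nothing here inspects column contents.
    """
    X = df.drop(columns=[target, *text_columns])
    y = df[target]
    return X, y


# Hypothetical example data: one numeric feature, one free-text column, one target.
df = pd.DataFrame({
    "likes": [3, 10, 0, 7],
    "commentText": ["great recipe", "too salty", "loved it", "will try"],
    "Labels": ["pos", "neg", "pos", "pos"],
})
X, y = features_without_text(df, target="Labels", text_columns=["commentText"])
# X now holds only "likes" and can be passed to AutoML().fit(X, y) as usual.
```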

hunaidkhan2000 commented 2 years ago

I am also facing the same error. I have tried updating scikit-learn to 1.0.2. To reproduce the problem, please use this dataset and the Kabitakitchen.csv file. It is a multiclass text classification problem.

pplonski commented 2 years ago

@MaciekEO could you please look into this on Monday?

pplonski commented 2 years ago

I've updated requirements.txt to require scikit-learn 1.0.0 or newer.
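A minimum version in requirements.txt is checked by pip at install time; it does not stop a script from running in another environment that still holds an older scikit-learn. A runtime guard can fail early with a clear message instead. A minimal sketch using only the standard library; require_minimum and its simplified version parsing (numeric release segments only, suffixes such as rc1 ignored) are assumptions, not mljar-supervised code:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Return the leading numeric release segment, e.g. '1.0.2' -> (1, 0, 2).

    Simplified parser: stops at the first non-numeric part, so '1.1rc1' -> (1, 1).
    """
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # a suffix such as 'rc1' or 'post1' ends the release segment
    return tuple(parts)


def require_minimum(package, minimum):
    """Raise ImportError if `package` is missing or older than `minimum` (a tuple)."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        raise ImportError(f"{package} is not installed") from None
    release = parse_release(installed)
    # Pad both tuples so that (1, 0) compares equal to (1, 0, 0).
    width = max(len(release), len(minimum))
    padded = release + (0,) * (width - len(release))
    target = tuple(minimum) + (0,) * (width - len(minimum))
    if padded < target:
        raise ImportError(
            f"{package} {installed} found, but >= {'.'.join(map(str, minimum))} is required"
        )
    return installed
```

Calling require_minimum("scikit-learn", (1, 0, 0)) at the top of a script turns a later AttributeError into an explicit version error.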

@hunaidkhan2000 with scikit-learn 1.0.2 I can't reproduce the issue (it also works without error for @MaciekEO).

The code that I've used:

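import pandas as pd
from supervised.automl import AutoML

# Load the Kabitakitchen comments file shared in this issue
df = pd.read_csv("~/Downloads/Cooking Data/kabitakitchen.csv", encoding="unicode_escape")
print(df)

X = df[["commentText"]]  # single free-text feature, kept as a DataFrame
y = df["Labels"]         # multiclass target

automl = AutoML()  # default settings
automl.fit(X, y)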

My packages:

Package             Version
------------------- -------
alembic             1.7.6
attrs               21.4.0
autopage            0.5.0
catboost            1.0.4
category-encoders   2.3.0
cliff               3.10.1
cmaes               0.8.2
cmd2                2.4.0
colorlog            6.6.0
colour              0.1.5
cycler              0.11.0
dtreeviz            1.3.3
fonttools           4.29.1
graphviz            0.19.1
greenlet            1.1.2
importlib-metadata  4.11.2
importlib-resources 5.4.0
iniconfig           1.1.1
joblib              1.1.0
kiwisolver          1.3.2
lightgbm            3.3.2
llvmlite            0.38.0
Mako                1.1.6
Markdown            3.3.6
MarkupSafe          2.1.0
matplotlib          3.5.1
numba               0.55.1
numpy               1.21.5
optuna              2.10.0
packaging           21.3
pandas              1.4.1
patsy               0.5.2
pbr                 5.8.1
Pillow              9.0.1
pip                 22.0.3
pkg_resources       0.0.0
plotly              5.6.0
pluggy              1.0.0
prettytable         3.1.1
py                  1.11.0
pyparsing           3.0.7
pyperclip           1.8.2
pytest              7.0.1
python-dateutil     2.8.2
pytz                2021.3
PyYAML              6.0
scikit-learn        1.0.2
scikit-plot         0.3.7
scipy               1.8.0
seaborn             0.11.2
setuptools          60.9.3
shap                0.36.0
six                 1.16.0
slicer              0.0.7
SQLAlchemy          1.4.31
statsmodels         0.13.2
stevedore           3.5.0
tabulate            0.8.9
tenacity            8.0.1
threadpoolctl       3.1.0
tomli               2.0.1
tqdm                4.63.0
typing_extensions   4.1.1
wcwidth             0.2.5
wheel               0.37.1
wordcloud           1.8.1
xgboost             1.5.2
zipp                3.7.0
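When one machine reproduces an error and another does not, the interpreter path and the versions that interpreter actually sees are often more telling than a full package list, because a notebook kernel or IDE can run a different environment from the shell where packages were upgraded. A minimal sketch; the package selection is an assumption, and mljar-supervised is the distribution name used on PyPI:

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distributions relevant to this issue; adjust as needed.
PACKAGES = ["mljar-supervised", "scikit-learn", "pandas", "numpy"]


def environment_report(packages=PACKAGES):
    """Return the running interpreter and the installed versions it can see."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:<18} {value}")
```

Running this from the same notebook or script that raises the error shows which scikit-learn that process really uses.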
hunaidkhan2000 commented 2 years ago

Sure, let me check and update you.

In reply to Piotr's comment of Tue, 1 Mar 2022 at 17:07.

-- Thanks and Regards, Hunaidkhan Pathan