pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

[BUG]: `test_data` in classification `setup()` does not work #4083

Open diara3 opened 2 weeks ago

diara3 commented 2 weeks ago

pycaret version checks

Issue Description

If I pass `test_data=...` to `clf.setup()`, I get the error shown below.

Reproducible Example

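```python
from pycaret.classification import ClassificationExperiment

# data_df and test_df are the reporter's training and test DataFrames (not shown in the issue)
clf = ClassificationExperiment()
clf.setup(data=data_df, target='label', test_data=test_df)
```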

Expected Behavior

`setup()` should successfully preprocess the test data and encode its features, given that `test_df` has exactly the same structure as `data_df`.
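Because the expected behavior depends on `test_df` having exactly the same structure as `data_df`, it can help to rule out structural differences before calling `setup()`. The sketch below is a hypothetical helper (`structure_mismatches` is not part of PyCaret) that compares column names, column order and dtypes of two pandas DataFrames; the toy frames are made up for illustration.

```python
import pandas as pd


def structure_mismatches(train, test):
    """Return a list of differences in columns, column order and dtypes (hypothetical helper)."""
    problems = []
    # Column names and their order must match exactly.
    if list(train.columns) != list(test.columns):
        problems.append(f"columns differ: {list(train.columns)} vs {list(test.columns)}")
    # For columns present in both frames, the dtypes must match.
    for col in train.columns.intersection(test.columns):
        if train[col].dtype != test[col].dtype:
            problems.append(f"{col}: dtype {train[col].dtype} vs {test[col].dtype}")
    return problems


# Made-up example frames with an ordinal-style string column.
train = pd.DataFrame({"answer": ["Agree", "Disagree"], "label": [1, 0]})
test = pd.DataFrame({"answer": ["Agree", "Neutral"], "label": [1, 0]})
print(structure_mismatches(train, test))  # []
```

An empty list only means the two frames line up column by column; it does not check whether the test set contains category levels that never appear in the training set.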

Actual Results

```
  File "./venv/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-2bc6b3f4eb23>", line 1, in <module>
    clf_exp.setup(data=data_df
  File "./venv/lib/python3.10/site-packages/pycaret/classification/oop.py", line 890, in setup
    self.pipeline.fit(self.X_train, self.y_train)
  File "./venv/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 273, in fit
    X, y, _ = self._fit(X, y, routed_params)
  File "./venv/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 249, in _fit
    fitted_transformer = self._memory_fit(
  File "./venv/lib/python3.10/site-packages/joblib/memory.py", line 655, in __call__
    return self._cached_call(args, kwargs)[0]
  File "./venv/lib/python3.10/site-packages/pycaret/internal/memory.py", line 392, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/pycaret/internal/memory.py", line 308, in call
    output = self.func(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/pycaret/internal/pipeline.py", line 69, in _fit_one
    transformer.fit(*args)
  File "./venv/lib/python3.10/site-packages/pycaret/internal/preprocess/transformers.py", line 229, in fit
    self.transformer.fit(*args, **fit_params)
  File "./venv/lib/python3.10/site-packages/sklearn/impute/_iterative.py", line 880, in fit
    self.fit_transform(X)
  File "./venv/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "./venv/lib/python3.10/site-packages/pycaret/internal/preprocess/iterative_imputer.py", line 339, in fit_transform
    X, Xt, mask_missing_values, complete_mask = self._initial_imputation(
  File "./venv/lib/python3.10/site-packages/pycaret/internal/preprocess/iterative_imputer.py", line 189, in _initial_imputation
    X = self._validate_data(
  File "./venv/lib/python3.10/site-packages/sklearn/base.py", line 633, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "./venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 997, in check_array
    array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
  File "./venv/lib/python3.10/site-packages/sklearn/utils/_array_api.py", line 521, in _asarray_with_order
    array = numpy.asarray(array, order=order, dtype=dtype)
  File "./venv/lib/python3.10/site-packages/pandas/core/generic.py", line 2084, in __array__
    arr = np.asarray(values, dtype=dtype)
ValueError: could not convert string to float: 'Agree'
```
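The last frames show the iterative imputer's input validation (`check_array`) calling `numpy.asarray(..., dtype=dtype)` on data that still contains the raw string `'Agree'`. This suggests, though the issue does not confirm it, that a string-valued column reaches the imputer before it has been encoded to numbers. The same error message can be reproduced in plain Python; `try_float` is a hypothetical helper used only for illustration.

```python
def try_float(value):
    """Return float(value), or the ValueError message if the cast fails."""
    try:
        return float(value)
    except ValueError as exc:
        # Same message as the final line of the traceback for a string category.
        return str(exc)


print(try_float("3"))      # 3.0
print(try_float("Agree"))  # could not convert string to float: 'Agree'
```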

Installed Versions

```
pip: 24.2
setuptools: 70.0.0
pycaret: 3.4.0
IPython: 8.18.0
ipywidgets: 8.1.3
tqdm: 4.66.4
numpy: 1.26.4
pandas: 2.2.2
jinja2: 3.1.4
scipy: 1.11.4
joblib: 1.3.2
sklearn: 1.4.2
pyod: 2.0.1
imblearn: 0.12.3
category_encoders: 2.6.3
lightgbm: 4.4.0
numba: 0.60.0
requests: 2.32.3
matplotlib: 3.7.5
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.22.0
plotly-resampler: Not installed
kaleido: 0.2.1
schemdraw: 0.15
statsmodels: 0.14.2
sktime: 0.32.1
tbats: 1.1.3
pmdarima: 2.0.4
psutil: 6.0.0
markupsafe: 2.1.5
pickle5: Not installed
cloudpickle: 3.0.0
deprecation: 2.1.0
xxhash: 3.4.1
wurlitzer: 3.1.1

PyCaret optional dependencies:
shap: 0.46.0
interpret: Not installed
umap: Not installed
ydata_profiling: Not installed
explainerdashboard: Not installed
autoviz: Not installed
fairlearn: Not installed
deepchecks: Not installed
xgboost: 2.1.0
catboost: Not installed
kmodes: Not installed
mlxtend: Not installed
statsforecast: Not installed
tune_sklearn: Not installed
ray: Not installed
hyperopt: Not installed
optuna: 3.6.1
skopt: 0.10.2
mlflow: 2.15.1
gradio: Not installed
fastapi: Not installed
uvicorn: 0.30.5
m2cgen: Not installed
evidently: Not installed
fugue: Not installed
streamlit: 1.37.1
```
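To compare another environment against the versions listed above, a comparable report can be collected with the standard library alone. This is a minimal sketch using `importlib.metadata`; the helper name `collect_versions` is hypothetical, and it looks packages up by distribution name, which can differ from the import names used in the listing (for example `scikit-learn` is imported as `sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'Not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "Not installed"
    return report


# A subset of the packages from the listing, by distribution name.
for name, ver in collect_versions(["pycaret", "scikit-learn", "pandas", "numpy"]).items():
    print(f"{name}: {ver}")
```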
diara3 commented 2 weeks ago

I found that the issue only occurs when the DataFrame includes ordinal features.
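One possible workaround, not confirmed anywhere in this issue, is to encode the ordinal columns to integer ranks yourself before calling `setup()`, applying the identical mapping to both `data_df` and `test_df`, so that no string-valued ordinal column is left for PyCaret to handle. In the sketch below, `encode_ordinal` is a hypothetical helper, and the column name `answer` and its level order are made up for illustration. Unknown or missing values become `NaN` so that PyCaret's imputation can still deal with them.

```python
import pandas as pd


def encode_ordinal(df, ordinal_levels):
    """Return a copy of df with each ordinal column mapped to integer ranks.

    ordinal_levels maps column name -> list of levels from lowest to highest.
    Values not in the list (and missing values) become NaN.
    """
    out = df.copy()
    for col, levels in ordinal_levels.items():
        rank = {level: i for i, level in enumerate(levels)}
        out[col] = out[col].map(rank)
    return out


# Hypothetical column and level order; use the real ones from your data.
levels = {"answer": ["Disagree", "Neutral", "Agree"]}
train = pd.DataFrame({"answer": ["Agree", "Neutral", "Disagree"], "label": [1, 0, 0]})
test = pd.DataFrame({"answer": ["Neutral", "Agree"], "label": [0, 1]})

# The same mapping is applied to both frames, so the codes stay consistent.
train_enc = encode_ordinal(train, levels)
test_enc = encode_ordinal(test, levels)
print(train_enc["answer"].tolist())  # [2, 1, 0]
```

The encoded frames would then be passed as `clf.setup(data=train_enc, target='label', test_data=test_enc)`. Whether this sidesteps the error reported here has not been tested.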