error 'numpy.float64' object is not iterable

smuff11 commented 1 year ago

How can I solve this error? [Screenshot: "Screen capture 2023-06-21 144720"]

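import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML  # mljar-supervised

# df is the user's DataFrame; loading it from train_eng.csv (attached later in this thread) is an assumption
df = pd.read_csv("train_eng.csv")

# features: every column except the first and the last; target: "Price"
X_train, X_test, y_train, y_test = train_test_split(
    df[df.columns[1:-1]],
    df['Price'],
    test_size=0.25,
    random_state=123,
)

# train models with AutoML
automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)

# compute the MAE on test data
predictions = automl.predict(X_test)
print("Test MAE:", mean_absolute_error(y_test, predictions))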

This is the output:

Linear algorithm was disabled.
AutoML directory: AutoML_8
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline rmse 35.216024 trained in 0.38 seconds
Exception while producing SHAP explanations. module 'numpy' has no attribute 'bool'.
np.bool was a deprecated alias for the builtin bool. To avoid this error in existing code, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Continuing ...
2_DecisionTree rmse 19.749834 trained in 1.46 seconds
* Step default_algorithms will try to check up to 3 models
Exception while producing SHAP explanations. module 'numpy' has no attribute 'int'.
np.int was a deprecated alias for the builtin int. To avoid this error in existing code, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Continuing ...
3_Default_Xgboost rmse 9.812939 trained in 6.66 seconds
There was an error during 3_Default_Xgboost training.
Please check AutoML_8\errors.md for details.
4_Default_NeuralNetwork rmse 13.374119 trained in 4.76 seconds
There was an error during 4_Default_NeuralNetwork training.
...
Ensemble rmse 9.812939 trained in 0.14 seconds
AutoML fit time: 19.64 seconds
AutoML best model: 3_Default_Xgboost
Test MAE: 6.264868275003538
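The SHAP messages in this log are separate from the errors.md failures: they come from code in the explanation path that still uses np.bool and np.int, aliases that NumPy deprecated in 1.20 and removed in 1.24. The log message itself names the replacements; a minimal sketch of them follows (the arrays are illustrative only, not taken from the user's data).

```python
import numpy as np

values = np.array([0, 1, 2])

# The builtin bool replaces the removed np.bool alias.
as_bool = values.astype(bool)

# An explicit NumPy integer type replaces np.int when the precision matters
# (the builtin int would also work); float-to-int casting truncates.
as_int = np.array([1.7, 2.2]).astype(np.int64)

# np.bool_ is the NumPy boolean scalar type the message points to.
flag = np.bool_(True)

print(as_bool.tolist(), as_int.tolist(), bool(flag))
```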

And this is errors.md:

Error for 3_Default_Xgboost

'numpy.float64' object is not iterable
Traceback (most recent call last):
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in save
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in <listcomp>
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\preprocessing\preprocessing.py", line 582, in to_json
    preprocessing_params["scale_y"] = self._scale_y.to_json()
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\preprocessing\scale.py", line 76, in to_json
    data_json["X_min_values"] = list(self.X_min_values)
TypeError: 'numpy.float64' object is not iterable

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new

Error for 4_Default_NeuralNetwork

'numpy.float64' object is not iterable
Traceback (most recent call last):
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in save
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in <listcomp>
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\preprocessing\preprocessing.py", line 582, in to_json
    preprocessing_params["scale_y"] = self._scale_y.to_json()
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\preprocessing\scale.py", line 76, in to_json
    data_json["X_min_values"] = list(self.X_min_values)
TypeError: 'numpy.float64' object is not iterable

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new

Error for 5_Default_RandomForest

'numpy.float64' object is not iterable
Traceback (most recent call last):
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in save
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in <listcomp>
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\preprocessing\preprocessing.py", line 582, in to_json
    preprocessing_params["scale_y"] = self._scale_y.to_json()
  File "c:\Users\user\anaconda3\envs\mljar\lib\site-packages\supervised\preprocessing\scale.py", line 76, in to_json
    data_json["X_min_values"] = list(self.X_min_values)
TypeError: 'numpy.float64' object is not iterable

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new
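The last frame of each traceback points at list(self.X_min_values) in scale.py, reached while serializing the target scaler (scale_y). list() needs an iterable, and a numpy.float64 is a scalar, so the call fails whenever the stored minimum is a single NumPy value instead of an array. The sketch below reproduces that failure with plain NumPy and shows one defensive way to serialize either form. to_json_list is a hypothetical helper, not part of mljar-supervised, and this is not necessarily how the package itself resolved the issue.

```python
import numpy as np

# A NumPy scalar such as np.float64 is not iterable, which is exactly
# what the traceback reports for list(self.X_min_values).
scalar_min = np.float64(0.5)
try:
    list(scalar_min)
except TypeError as exc:
    print("TypeError:", exc)


def to_json_list(values):
    """Hypothetical helper: serialize a scalar or an array as a plain list.

    np.atleast_1d turns a 0-d value into a 1-element array and leaves
    1-d arrays unchanged; .tolist() converts NumPy scalars to Python floats,
    which the json module can serialize.
    """
    return np.atleast_1d(values).tolist()


print(to_json_list(scalar_min))
print(to_json_list(np.array([1.0, 2.0, 3.0])))
```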

pplonski commented 1 year ago

Hi @smuff11,

Thank you for reporting the issue. Could you please attach the dataset to reproduce the issue?

smuff11 commented 1 year ago

train_eng.csv

Thanks for the fast reply! Here is the dataset.

ghost commented 1 year ago

Hi @pplonski,

Today I encountered the same error.

Here is the code for reproducing the errors.

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

data = fetch_california_housing()

X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)

The following are the errors:

Linear algorithm was disabled.
AutoML directory: AutoML_4
The task is regression with evaluation metric rmse
AutoML will use algorithms: ['Baseline', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble']
* Step simple_algorithms will try to check up to 2 models
1_Baseline rmse 1.157438 trained in 0.35 seconds
2_DecisionTree rmse 0.799943 trained in 11.73 seconds
* Step default_algorithms will try to check up to 3 models
3_Default_Xgboost rmse 0.465512 trained in 5.11 seconds
There was an error during 3_Default_Xgboost training.
Please check AutoML_4\errors.md for details.
4_Default_NeuralNetwork rmse 0.536247 trained in 1.37 seconds
There was an error during 4_Default_NeuralNetwork training.
Please check AutoML_4\errors.md for details.
5_Default_RandomForest rmse 0.717335 trained in 2.89 seconds
There was an error during 5_Default_RandomForest training.
Please check AutoML_4\errors.md for details.
* Step ensemble will try to check up to 1 model
Ensemble rmse 0.463419 trained in 0.2 seconds
AutoML fit time: 25.17 seconds
AutoML best model: Ensemble

## Error for 3_Default_Xgboost

'numpy.float64' object is not iterable
Traceback (most recent call last):
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in save
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in <listcomp>
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\preprocessing\preprocessing.py", line 582, in to_json
    preprocessing_params["scale_y"] = self._scale_y.to_json()
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\preprocessing\scale.py", line 76, in to_json
    data_json["X_min_values"] = list(self.X_min_values)
TypeError: 'numpy.float64' object is not iterable

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new

## Error for 4_Default_NeuralNetwork

'numpy.float64' object is not iterable
Traceback (most recent call last):
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in save
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in <listcomp>
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\preprocessing\preprocessing.py", line 582, in to_json
    preprocessing_params["scale_y"] = self._scale_y.to_json()
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\preprocessing\scale.py", line 76, in to_json
    data_json["X_min_values"] = list(self.X_min_values)
TypeError: 'numpy.float64' object is not iterable

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new

## Error for 5_Default_RandomForest

'numpy.float64' object is not iterable
Traceback (most recent call last):
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\base_automl.py", line 1095, in _fit
    trained = self.train_model(params)
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\base_automl.py", line 386, in train_model
    mf.save(results_path, model_subpath)
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in save
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\model_framework.py", line 490, in <listcomp>
    preprocessing = [p.to_json() for p in self.preprocessings]
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\preprocessing\preprocessing.py", line 582, in to_json
    preprocessing_params["scale_y"] = self._scale_y.to_json()
  File "D:\miniconda3_windows\envs\mljar\lib\site-packages\supervised\preprocessing\scale.py", line 76, in to_json
    data_json["X_min_values"] = list(self.X_min_values)
TypeError: 'numpy.float64' object is not iterable

Please set a GitHub issue with above error message at: https://github.com/mljar/mljar-supervised/issues/new

pplonski commented 1 year ago

Thank you @phamtronglam for reporting!

ghost commented 1 year ago

@pplonski Thank you for your quick response. Could you tell me how I should approach fixing this error, please?

pplonski commented 1 year ago

Please try to provide a unit test that reproduces the problem. It looks like the issue is in the scaling functionality of the preprocessing.
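A unit test for this can be small. The sketch below is hypothetical and self-contained: ScaleStub is not an mljar-supervised class, it only mirrors the failing line data_json["X_min_values"] = list(self.X_min_values) from scale.py. A real regression test would exercise the package's own scaler with data that makes it store a single numpy.float64; which inputs trigger that is not established in this thread.

```python
import unittest

import numpy as np


class ScaleStub:
    """Hypothetical stand-in mirroring only the serialization line from scale.py."""

    def __init__(self, X_min_values):
        self.X_min_values = X_min_values

    def to_json(self):
        # Same expression as line 76 of scale.py in the traceback.
        return {"X_min_values": list(self.X_min_values)}


class TestScaleToJson(unittest.TestCase):
    def test_array_min_values_serialize(self):
        # One minimum per column: list() works on a 1-d array.
        stub = ScaleStub(np.array([0.5, 1.5]))
        self.assertEqual(stub.to_json()["X_min_values"], [0.5, 1.5])

    def test_scalar_min_values_reproduce_error(self):
        # Assumed scenario: the target scaler (scale_y) ends up holding a
        # single np.float64, which reproduces the reported TypeError.
        stub = ScaleStub(np.float64(0.5))
        with self.assertRaises(TypeError):
            stub.to_json()
```

Saved in a file such as test_scale_stub.py (hypothetical name), it can be run with python -m unittest test_scale_stub; once the serialization handles scalars, the second test would be flipped to assert the serialized value instead.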

ghost commented 1 year ago

Hi @pplonski,

Thank you for your reply. I'm sorry, but I don't know how to write a unit test for this. I used the same code from the README file for the California housing regression dataset and got the same errors. Could you help me check, please?

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from supervised.automl import AutoML # mljar-supervised

# Load the data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(housing.data, columns=housing.feature_names),
    housing.target,
    test_size=0.25,
    random_state=123,
)

# train models with AutoML
automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)

# compute the MSE on test data
predictions = automl.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, predictions))

pplonski commented 1 year ago

@phamtronglam I've just tried the example code in a fresh environment with Python 3.8, and it works fine (except for a few SHAP warnings).

Please try updating the mljar-supervised package:

pip install -U mljar-supervised

If you still have a problem, please send me your Python version and the package versions from the pip freeze command.
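The full pip freeze output can be long. As an alternative sketch, the snippet below prints the Python version together with only the packages that appear in the logs above. The package list is an assumption, and the code relies on the standard-library importlib.metadata module (Python 3.8+).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Packages that show up in the logs and tracebacks above (assumed list).
PACKAGES = ["mljar-supervised", "numpy", "pandas", "scikit-learn", "xgboost", "shap"]


def environment_report(packages=PACKAGES):
    """Return the Python version and one 'name==version' line per package."""
    lines = [f"Python {sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


print(environment_report())
```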

ghost commented 1 year ago

@pplonski Thank you for checking.

Thank you very much. After updating in a new conda environment, it is working fine now.

Thanks a lot for a very good tool.

pplonski commented 1 year ago

@smuff11, could you check whether it works for you after the package update?

NayanKanaparthi commented 1 year ago

Hey, I want to work on this issue. What happened is that, in the dataset, a few columns contain strings, such as brand, model name, etc. We have to convert them into numerical form, and we can use one-hot encoding for that. After cleaning the data, we can fit the model and it will work.
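A minimal sketch of the one-hot encoding step suggested above, using pandas.get_dummies. The toy frame and its brand, model_name and mileage columns are hypothetical stand-ins for the string columns in the attached dataset; only Price comes from the original code. The thread does not confirm that this encoding is required, or that it addresses the scale.py error.

```python
import pandas as pd

# Hypothetical toy data; "Price" is the target name used in the original code.
df = pd.DataFrame({
    "brand": ["A", "B", "A"],
    "model_name": ["x1", "y2", "x1"],
    "mileage": [10_000, 25_000, 5_000],
    "Price": [12.5, 9.0, 15.0],
})

# One-hot encode the string columns; the numeric column passes through unchanged
# and the new indicator columns are appended after it.
X = pd.get_dummies(df.drop(columns="Price"), columns=["brand", "model_name"], dtype=int)
y = df["Price"]

print(X.columns.tolist())
```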

pplonski commented 1 year ago

Sure @NayanKanaparthi, thank you. Are you able to provide unit tests to reproduce the issue?
