pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

Getting error: "Input contains infinity or a value too large for dtype('float32')" while using predict_model function #1048

Closed aniketkumar430 closed 3 years ago

aniketkumar430 commented 3 years ago

Hi Team,

While predicting on new data, I am getting this error:

# for object dtype data, we only check for NaNs (GH-13254)
ValueError: Input contains infinity or a value too large for dtype('float32').

I have already imputed the numerical/categorical features within the setup function.

I believe this was fixed in an earlier release, but I am getting it with pycaret 2.2.3 as well. The earlier issue is here: https://github.com/pycaret/pycaret/issues/290.

Any help on how to fix this would be appreciated.
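For context, a rough sketch of my workflow with stand-in data (placeholder columns and imputation settings, not the real internal dataset):

from pycaret.classification import setup, create_model, predict_model
import pandas as pd
import numpy as np

# Stand-in data only; the real data has many more numeric and categorical columns.
np.random.seed(0)
train = pd.DataFrame({
    'num_col': np.random.randn(100),
    'cat_col': np.random.choice(['a', 'b', 'c'], size=100),
    'target': np.random.choice(['Y', 'N'], size=100),
})
new_data = train.drop('target', axis=1).head(10)

setup(
    data=train,
    target='target',
    numeric_imputation='mean',            # imputation handled inside setup
    categorical_imputation='constant',
    session_id=123,
    silent=True,
    verbose=False,
)
model = create_model('lr')
preds = predict_model(model, data=new_data)  # the ValueError is raised at this step on my data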

Yard1 commented 3 years ago

Is it possible for you to share the dataset?

aniketkumar430 commented 3 years ago

Sorry, I wish I could, but it's internal data, so I will not be able to share it.

Yard1 commented 3 years ago

Can you upload the logs.log file that is automatically generated in the folder pycaret is run from?

aniketkumar430 commented 3 years ago

Hey, I was able to fix the issue after applying normalization.
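For reference, the change was essentially just enabling normalization in the same setup call (a sketch continuing the snippet above, not my exact code):

setup(
    data=train,
    target='target',
    numeric_imputation='mean',
    categorical_imputation='constant',
    normalize=True,                       # adding this made predict_model work for me
    session_id=123,
    silent=True,
    verbose=False,
)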

Scoodood commented 3 years ago

I had the same issue, but setting normalize=False in the setup() fixed the problem in my case.

Scoodood commented 3 years ago

hi @Yard1,

I am running Python 3.8 + Pycaret 2.2 + Win10. The error can be reproduced on my system with the sample code below. I hope you can reproduce the same thing on your end as well.

from pycaret.classification import setup
import pandas as pd
import numpy as np

x = np.array([ 
    3.88823539e-01,  3.92927796e-01,  2.98611104e-01,  1.45454541e-01,
    3.93287897e-01,  3.06536227e-01,  4.58397925e-01,  3.46689314e-01,
    9.27706584e-02,  1.66124269e-01,  4.18312103e-01,  3.75978529e-01,
    3.81876916e-01,  3.81964803e-01, -4.60996240e-01,  2.82670468e-01,
    4.33387399e-01,  1.92691535e-01,  5.16472697e-01,  4.95327115e-01,
    5.04636049e-01,  2.67145514e-01,  1.99887961e-01,  2.06762537e-01,
    3.58059108e-01,  2.12162361e-01,  3.38421494e-01,  3.40539455e-01,
    3.43417168e-01,  2.40292147e-01,  3.75947833e-01,  3.46175045e-01,
    2.40784839e-01,  2.44629219e-01,  2.56736457e-01,  3.06967199e-01,
    3.03990245e-01,  3.71131212e-01,  3.76697600e-01,  7.49158263e-02,
    1.30779341e-01,  1.98675290e-01,  3.50690275e-01,  3.68143469e-01,
    3.58908772e-01,  1.88099176e-01,  3.09562832e-01,  3.53744358e-01,
    3.30147058e-01,  4.32091355e-01,  4.73342061e-01,  4.02529031e-01,
    3.95830899e-01,  4.08205032e-01,  3.03054273e-01,  3.13991755e-01,
    4.54941124e-01,  2.79870629e-01,  3.84567887e-01,  3.53054166e-01,
    3.89947355e-01,  3.41697007e-01,  3.92287225e-01,  4.39686209e-01,
    4.21999991e-01,  3.77559274e-01,  3.07799846e-01,  4.80750799e-01,
    7.19498424e-03, -9.66454223e-02,  3.41324776e-01,  3.48039210e-01,
    3.52605551e-01,  5.78875184e-01,  2.80002415e-01,  2.31201172e+05,
    4.10117149e-01,  2.84535408e-01,  4.51396853e-01,  4.01419282e-01,
    4.21781093e-01,  4.13252383e-01,  4.48696792e-01,  2.89131910e-01,
    3.28029275e-01,  2.95504212e-01,  1.04279131e-01, -3.43457031e+05,
   -5.58490574e-01,  5.86900949e-01,  3.09354603e-01,  6.02193832e-01,
   -1.47712421e+00,  2.58185416e-01, -1.22726667e+00,  3.85695040e-01,
    4.32470560e-01,  4.96364772e-01,  4.36632335e-01,  2.94822194e-02,
    1.63721621e-01,  2.19358712e-01,  2.96442688e-01,  3.52912962e-01,
    3.08798224e-01,  2.20496356e-01,  2.80915618e-01,  2.55157828e-01,
    3.57474774e-01,  3.53905797e-01, -1.48139155e+00,  4.10179883e-01,
    6.29500812e-03,  1.48310661e-02,  4.55011964e-01,  2.19447702e-01,
    2.16419950e-01,  3.47419351e-01,  4.21552539e-01,  4.49999988e-01,
    5.19404590e-01,  0.00000000e+00,  0.00000000e+00,  2.76207983e-01,
    7.30088472e-01,  5.72765946e-01,  1.36000901e-01,  8.74999985e-02,
    3.35228503e-01,  2.81857818e-01,  3.48984361e-01,  4.05978799e-01,
    4.70830649e-01,  4.45659220e-01, -2.27870807e-01, -3.63451093e-01,
    4.85417366e-01,  4.69966292e-01,  3.03950816e-01,  2.77605176e-01,
    3.33973199e-01,  5.43402791e-01,  2.27142990e-01,  2.26787969e-01])
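# Note: most entries above are around 0.3, but a few are extreme outliers
# (e.g. 2.31e+05 and -3.43e+05). These appear to be the values that blow up
# (inf / beyond float32 range) after the power transformation and scaling
# applied during setup.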

np.random.seed(123)
target = np.random.choice(['Y', 'N'], size=len(x))
data = pd.DataFrame(dict(x=x, target=target))

setup(
    data=data,
    target='target',
    train_size=0.7,
    session_id=123,
    normalize=True,
    transformation=True,
    verbose=False,
    silent=True)

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b621de61d635> in <module>
     47 data = pd.DataFrame(dict(x=x, target=target))
     48 
---> 49 setup(
     50     data=data,
     51     target='target',

~\AppData\Roaming\Python\Python38\site-packages\pycaret\classification.py in setup(data, target, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs)
    578         log_plots = ["auc", "confusion_matrix", "feature"]
    579 
--> 580     return pycaret.internal.tabular.setup(
    581         ml_usecase="classification",
    582         available_plots=available_plots,

~\AppData\Roaming\Python\Python38\site-packages\pycaret\internal\tabular.py in setup(data, target, ml_usecase, available_plots, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, transform_target, transform_target_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs, display)
   1309         # workaround to also transform target
   1310         dtypes.final_training_columns.append(target)
-> 1311         test_data = prep_pipe.transform(test_data)
   1312 
   1313         X_train = train_data.drop(target, axis=1)

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\pipeline.py in _transform(self, X)
    547         Xt = X
    548         for _, _, transform in self._iter():
--> 549             Xt = transform.transform(Xt)
    550         return Xt
    551 

~\AppData\Roaming\Python\Python38\site-packages\pycaret\internal\preprocess.py in transform(self, dataset, y)
   1313         if len(self.numeric_features) > 0:
   1314             self.data_t = pd.DataFrame(
-> 1315                 self.scale_and_power.transform(data[self.numeric_features])
   1316             )
   1317             # we need to set the same index as original data

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\preprocessing\_data.py in transform(self, X)
   2863 
   2864         if self.standardize:
-> 2865             X = self._scaler.transform(X)
   2866 
   2867         return X

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\preprocessing\_data.py in transform(self, X, copy)
    789 
    790         copy = copy if copy is not None else self.copy
--> 791         X = self._validate_data(X, reset=False,
    792                                 accept_sparse='csr', copy=copy,
    793                                 estimator=self, dtype=FLOAT_DTYPES,

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    418                     f"requires y to be passed, but the target y is None."
    419                 )
--> 420             X = check_array(X, **check_params)
    421             out = X
    422         else:

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    642 
    643         if force_all_finite:
--> 644             _assert_all_finite(array,
    645                                allow_nan=force_all_finite == 'allow-nan')
    646 

c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     94                 not allow_nan and not np.isfinite(X).all()):
     95             type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 96             raise ValueError(
     97                     msg_err.format
     98                     (type_err,

ValueError: Input contains infinity or a value too large for dtype('float32').

Yard1 commented 3 years ago

I will look into it, thanks.

actcod01 commented 3 years ago

Pycaret: v2.2.3, Task: Regression


Hi,

I'm also getting the same issue with the Automobile dataset. @Yard1, you can use the same dataset from Pycaret to reproduce it.

How can we fix this issue?

Thanks,

actcod01 commented 3 years ago

> I had the same issue, but setting normalize=False in the setup() fixed the problem in my case.

But by default, it's always False?

Scoodood commented 3 years ago

> I had the same issue, but setting normalize=False in the setup() fixed the problem in my case.

> But by default, it's always False?

Yes, it is. I get the error when I set normalize=True.

s-bhatia commented 3 years ago

The error also occurs when transform_target=True is used. See my setup below:

ex = setup(
    data=trainData,
    target=targetName,
    session_id=int(sid),
    train_size=0.7,
    n_jobs=20,
    normalize=True,
    normalize_method='maxabs',
    pca=False,
    pca_components=3,
    ignore_features=ignoreCols,
    fold=5,
    fold_strategy='timeseries',
    polynomial_features=False,
    feature_ratio=True,
    feature_interaction=True,
    remove_multicollinearity=False,
    transformation=True,
    transform_target=True,
    transform_target_method='yeo-johnson',
    create_clusters=False,
    data_split_shuffle=False,
    silent=True,
    experiment_name='exp' + str(sid),
    verbose=False,
)

The error disappears if I set transform_target=False.

github-actions[bot] commented 3 years ago

Stale issue message: This issue will be automatically closed by GitHub Actions in 1 week if there is no further activity.

vemaparna commented 3 years ago

Pycaret version 2.3.1

Got the same issue while building a model, in spite of the data being cleared of null values and infinite values. When run with preprocess=False, the error disappears but strangely comes back during compare_models(). I tried all the options like normalize, transformation, etc., but none of them worked.
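For reference, the sequence I am describing looks roughly like this (illustrative stand-in data, not my actual dataset):

from pycaret.classification import setup, compare_models
import pandas as pd
import numpy as np

# Stand-in data only; my real data is already free of nulls and infinities.
np.random.seed(42)
df = pd.DataFrame({
    'f1': np.random.randn(300),
    'f2': np.random.randn(300),
    'target': np.random.choice(['Y', 'N'], size=300),
})

setup(data=df, target='target', preprocess=False, silent=True, verbose=False)
best = compare_models()  # on my data, the same ValueError resurfaces here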

ra67052 commented 2 years ago

Strange, I had the same issue. Earlier in the day I ran it with no problem, then afterwards I ran the notebook and got the same error. I changed the kernel from Python 3 to Python 3.8.2 and it worked :)

meliksahturker commented 2 years ago

I confirm this issue with my data. I cannot run setup with preprocess=True.

dvirginz commented 2 years ago

I have the same problem when normalizing the test set as well.

ngupta23 commented 2 years ago

@dvirginz Since this issue is closed, I would recommend submitting a new issue for your problem. When submitting the issue, it will ask for a few things from you so we can reproduce the problem and provide appropriate guidance. Thanks for your understanding.