Closed aniketkumar430 closed 3 years ago
Is it possible for you to share the dataset?
Sorry, I wish I could, but it's internal data, so I won't be able to share it.
Can you upload the logs.log file that is automatically generated in the folder pycaret is run from?
Hey I was able to fix the issue after applying normalization.
I had the same issue, but setting normalize=False in the setup() fixed the problem in my case.
hi @Yard1,
I am running Python 3.8 + Pycaret 2.2 + Win10. The error can be reproduced with this sample code on my system. I hope you can reproduce the same thing on your end as well.
from pycaret.classification import setup
import pandas as pd
import numpy as np
x = np.array([
3.88823539e-01, 3.92927796e-01, 2.98611104e-01, 1.45454541e-01,
3.93287897e-01, 3.06536227e-01, 4.58397925e-01, 3.46689314e-01,
9.27706584e-02, 1.66124269e-01, 4.18312103e-01, 3.75978529e-01,
3.81876916e-01, 3.81964803e-01, -4.60996240e-01, 2.82670468e-01,
4.33387399e-01, 1.92691535e-01, 5.16472697e-01, 4.95327115e-01,
5.04636049e-01, 2.67145514e-01, 1.99887961e-01, 2.06762537e-01,
3.58059108e-01, 2.12162361e-01, 3.38421494e-01, 3.40539455e-01,
3.43417168e-01, 2.40292147e-01, 3.75947833e-01, 3.46175045e-01,
2.40784839e-01, 2.44629219e-01, 2.56736457e-01, 3.06967199e-01,
3.03990245e-01, 3.71131212e-01, 3.76697600e-01, 7.49158263e-02,
1.30779341e-01, 1.98675290e-01, 3.50690275e-01, 3.68143469e-01,
3.58908772e-01, 1.88099176e-01, 3.09562832e-01, 3.53744358e-01,
3.30147058e-01, 4.32091355e-01, 4.73342061e-01, 4.02529031e-01,
3.95830899e-01, 4.08205032e-01, 3.03054273e-01, 3.13991755e-01,
4.54941124e-01, 2.79870629e-01, 3.84567887e-01, 3.53054166e-01,
3.89947355e-01, 3.41697007e-01, 3.92287225e-01, 4.39686209e-01,
4.21999991e-01, 3.77559274e-01, 3.07799846e-01, 4.80750799e-01,
7.19498424e-03, -9.66454223e-02, 3.41324776e-01, 3.48039210e-01,
3.52605551e-01, 5.78875184e-01, 2.80002415e-01, 2.31201172e+05,
4.10117149e-01, 2.84535408e-01, 4.51396853e-01, 4.01419282e-01,
4.21781093e-01, 4.13252383e-01, 4.48696792e-01, 2.89131910e-01,
3.28029275e-01, 2.95504212e-01, 1.04279131e-01, -3.43457031e+05,
-5.58490574e-01, 5.86900949e-01, 3.09354603e-01, 6.02193832e-01,
-1.47712421e+00, 2.58185416e-01, -1.22726667e+00, 3.85695040e-01,
4.32470560e-01, 4.96364772e-01, 4.36632335e-01, 2.94822194e-02,
1.63721621e-01, 2.19358712e-01, 2.96442688e-01, 3.52912962e-01,
3.08798224e-01, 2.20496356e-01, 2.80915618e-01, 2.55157828e-01,
3.57474774e-01, 3.53905797e-01, -1.48139155e+00, 4.10179883e-01,
6.29500812e-03, 1.48310661e-02, 4.55011964e-01, 2.19447702e-01,
2.16419950e-01, 3.47419351e-01, 4.21552539e-01, 4.49999988e-01,
5.19404590e-01, 0.00000000e+00, 0.00000000e+00, 2.76207983e-01,
7.30088472e-01, 5.72765946e-01, 1.36000901e-01, 8.74999985e-02,
3.35228503e-01, 2.81857818e-01, 3.48984361e-01, 4.05978799e-01,
4.70830649e-01, 4.45659220e-01, -2.27870807e-01, -3.63451093e-01,
4.85417366e-01, 4.69966292e-01, 3.03950816e-01, 2.77605176e-01,
3.33973199e-01, 5.43402791e-01, 2.27142990e-01, 2.26787969e-01])
np.random.seed(123)
target = np.random.choice(['Y', 'N'], size=len(x))
data = pd.DataFrame(dict(x=x, target=target))
setup(
    data=data,
    target='target',
    train_size=0.7,
    session_id=123,
    normalize=True,
    transformation=True,
    verbose=False,
    silent=True)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-b621de61d635> in <module>
47 data = pd.DataFrame(dict(x=x, target=target))
48
---> 49 setup(
50 data=data,
51 target='target',
~\AppData\Roaming\Python\Python38\site-packages\pycaret\classification.py in setup(data, target, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs)
578 log_plots = ["auc", "confusion_matrix", "feature"]
579
--> 580 return pycaret.internal.tabular.setup(
581 ml_usecase="classification",
582 available_plots=available_plots,
~\AppData\Roaming\Python\Python38\site-packages\pycaret\internal\tabular.py in setup(data, target, ml_usecase, available_plots, train_size, test_data, preprocess, imputation_type, iterative_imputation_iters, categorical_features, categorical_imputation, categorical_iterative_imputer, ordinal_features, high_cardinality_features, high_cardinality_method, numeric_features, numeric_imputation, numeric_iterative_imputer, date_features, ignore_features, normalize, normalize_method, transformation, transformation_method, handle_unknown_categorical, unknown_categorical_method, pca, pca_method, pca_components, ignore_low_variance, combine_rare_levels, rare_level_threshold, bin_numeric_features, remove_outliers, outliers_threshold, remove_multicollinearity, multicollinearity_threshold, remove_perfect_collinearity, create_clusters, cluster_iter, polynomial_features, polynomial_degree, trigonometry_features, polynomial_threshold, group_features, group_names, feature_selection, feature_selection_threshold, feature_selection_method, feature_interaction, feature_ratio, interaction_threshold, fix_imbalance, fix_imbalance_method, transform_target, transform_target_method, data_split_shuffle, data_split_stratify, fold_strategy, fold, fold_shuffle, fold_groups, n_jobs, use_gpu, custom_pipeline, html, session_id, log_experiment, experiment_name, log_plots, log_profile, log_data, silent, verbose, profile, profile_kwargs, display)
1309 # workaround to also transform target
1310 dtypes.final_training_columns.append(target)
-> 1311 test_data = prep_pipe.transform(test_data)
1312
1313 X_train = train_data.drop(target, axis=1)
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\pipeline.py in _transform(self, X)
547 Xt = X
548 for _, _, transform in self._iter():
--> 549 Xt = transform.transform(Xt)
550 return Xt
551
~\AppData\Roaming\Python\Python38\site-packages\pycaret\internal\preprocess.py in transform(self, dataset, y)
1313 if len(self.numeric_features) > 0:
1314 self.data_t = pd.DataFrame(
-> 1315 self.scale_and_power.transform(data[self.numeric_features])
1316 )
1317 # we need to set the same index as original data
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\preprocessing\_data.py in transform(self, X)
2863
2864 if self.standardize:
-> 2865 X = self._scaler.transform(X)
2866
2867 return X
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\preprocessing\_data.py in transform(self, X, copy)
789
790 copy = copy if copy is not None else self.copy
--> 791 X = self._validate_data(X, reset=False,
792 accept_sparse='csr', copy=copy,
793 estimator=self, dtype=FLOAT_DTYPES,
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
418 f"requires y to be passed, but the target y is None."
419 )
--> 420 X = check_array(X, **check_params)
421 out = X
422 else:
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
642
643 if force_all_finite:
--> 644 _assert_all_finite(array,
645 allow_nan=force_all_finite == 'allow-nan')
646
c:\users\userA\miniconda3\envs\py38\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
94 not allow_nan and not np.isfinite(X).all()):
95 type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 96 raise ValueError(
97 msg_err.format
98 (type_err,
ValueError: Input contains infinity or a value too large for dtype('float32').
I will look into it, thanks.
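A note for anyone triaging the sample above: most of x lies between roughly -1.5 and 0.75, but two entries (2.31201172e+05 and -3.43457031e+05) are about six orders of magnitude larger. Whether those two values are what trips the pipeline is not confirmed in this thread. A minimal sketch for surfacing such values with a median/MAD score follows; the helper name `flag_extreme`, the threshold, and the shortened sample are my own choices for illustration, not part of PyCaret.

```python
import numpy as np

def flag_extreme(values, threshold=1e3):
    """Return indices whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD)."""
    values = np.asarray(values, dtype=np.float64)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # Degenerate case: no spread to compare against, so flag nothing.
        return np.array([], dtype=int)
    score = np.abs(values - median) / mad
    return np.flatnonzero(score > threshold)

# Shortened sample in the spirit of the reproduction data (illustrative only).
sample = np.array([0.388, 0.392, 0.298, 0.145, 2.31201172e+05,
                   0.410, 0.284, -3.43457031e+05, -0.558, 0.586])
extreme_idx = flag_extreme(sample)
print(extreme_idx, sample[extreme_idx])
```

Running the same helper on the full x column should point at the same two entries, which makes it easy to test whether dropping or clipping them changes the outcome.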
Pycaret: v2.2.3 Task: Regression
Hi,
I'm also getting the same issue with the Automobile dataset. @Yard1, you can use that same dataset from Pycaret to reproduce it.
How can we fix this issue?
Thanks,
I had the same issue, but setting normalize=False in the setup() fixed the problem in my case.
But by default, it's always False?
I had the same issue, but setting normalize=False in the setup() fixed the problem in my case.
But by default, it's always False?
Yes, it is. I get the error when I set normalize=True.
The error also comes when we use transform_target = True. See my setup below:
ex = setup(
    data=trainData,
    target=targetName,
    session_id=int(sid),
    train_size=.7,
    n_jobs=20,
    normalize=True,
    normalize_method='maxabs',  # 'maxabs'
    pca=False,
    pca_components=3,
    ignore_features=ignoreCols,
    fold=5,
    fold_strategy='timeseries',
    polynomial_features=False,
    feature_ratio=True,
    feature_interaction=True,
    remove_multicollinearity=False,
    transformation=True,
    transform_target=True,
    transform_target_method='yeo-johnson',
    create_clusters=False,
    data_split_shuffle=False,
    silent=True,
    experiment_name='exp' + str(sid),
    verbose=False,
)
The error disappears if I set transform_target=False.
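One possible mechanism, which nobody in this thread has confirmed, is that a Yeo-Johnson power transform with a large exponent pushes large inputs past what float32 can hold. The sketch below implements the standard Yeo-Johnson formula by hand in NumPy to show the arithmetic. The function name `yeo_johnson` and the value lambda = 8.0 are assumptions chosen purely for illustration; they are not values taken from PyCaret or from a fitted transformer.

```python
import numpy as np

def yeo_johnson(x, lmbda):
    """Standard Yeo-Johnson transform for a fixed lambda (no fitting)."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative branch.
    if lmbda != 0:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    # Negative branch.
    if lmbda != 2:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

# A typical value and one of the extreme values from the sample data.
values = np.array([0.3, 2.31201172e+05])
transformed = yeo_johnson(values, lmbda=8.0)  # lambda is illustrative only

# True where the result still fits in float32, False where it would overflow.
fits_float32 = np.abs(transformed) <= np.finfo(np.float32).max
print(transformed, fits_float32)
```

With these illustrative numbers the small value stays small while the large one exceeds the float32 range, which would match the wording of the ValueError if a similar exponent were fitted on the training split.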
Pycaret version 2.3.1
Got the same issue while building a model, in spite of the data being cleared of null and infinite values. When run with preprocess=False the error disappears, but strangely it comes back during compare_models(). I tried all the options like normalize, transformation, etc., but none of them worked.
Strange, I had the same issue. Earlier in the day I ran it with no problem, but when I ran the notebook again later I got the same error. I changed the kernel from Python 3 to Python 3.8.2 and it worked :)
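For anyone who wants to double-check that the raw data really is free of NaN and infinite values, here is a minimal sketch using pandas and NumPy. The helper name `report_non_finite` and the demo frame are hypothetical. Note that it only inspects the input columns, so it cannot say anything about values produced inside the preprocessing pipeline.

```python
import numpy as np
import pandas as pd

def report_non_finite(df):
    """Per numeric column, count NaN, +/-inf, and finite values
    whose magnitude is too large for float32."""
    numeric = df.select_dtypes(include=[np.number]).astype(np.float64)
    f32_max = np.finfo(np.float32).max
    is_inf = np.isinf(numeric)
    return pd.DataFrame({
        "nan": numeric.isna().sum(),
        "inf": is_inf.sum(),
        "exceeds_float32": ((numeric.abs() > f32_max) & ~is_inf).sum(),
    })

# Hypothetical demo frame; the non-numeric column is ignored.
demo = pd.DataFrame({
    "a": [1.0, np.nan, np.inf],
    "b": [1e39, 2.0, -np.inf],
    "label": ["Y", "N", "Y"],
})
report = report_non_finite(demo)
print(report)
```

If every count is zero and the error still appears, the offending values are most likely being created by one of the transformation steps rather than being present in the input.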
I confirm this issue with my data. Cannot run setup with preprocess = True.
I have the same problem when normalizing the test as well.
@dvirginz Since this issue is closed, I would recommend submitting a new issue for your problem. When submitting the issue, it will ask for a few things from you so we can reproduce the problem and provide appropriate guidance. Thanks for your understanding.
Hi Team,
While predicting on new data, I am getting this error:
**# for object dtype data, we only check for NaNs (GH-13254)
ValueError: Input contains infinity or a value too large for dtype('float32').**
I have already imputed numerical/categorical factors within the setup function.
I believe this issue was fixed in an earlier release, but I am getting it with pycaret version 2.2.3 as well. The earlier issue is mentioned here: https://github.com/pycaret/pycaret/issues/290.
Any help on how to fix this soon would be appreciated.