sgujjaABM closed this issue 2 years ago.
@sgujjaABM I don't think this is related to AutoML.
It looks like your problem was set up as classification instead of regression. The classification module has no huber estimator, so excluding it fails, as the ValueError message indicates.
ValueError: Estimator Not Available huber. Please see docstring for list of available estimators.
Check the following line of your traceback: it points to classification.py, which suggests that your target variable y is categorical. Please check.
File "/home/ec2-user/anaconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/classification.py", line 771, in compare_models
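If you want to check the target before calling setup, a rough heuristic can help. The sketch below is a hypothetical helper in plain Python, not PyCaret's own inference logic. It treats the target as categorical when any value is non-numeric, or when the column has only a few distinct values. The `max_classes=10` cutoff is an arbitrary assumption.

```python
from numbers import Number


def looks_categorical(values, max_classes=10):
    """Hypothetical heuristic: guess whether a target column is categorical.

    Returns True if any value is non-numeric (e.g. strings or booleans), or if
    the column has at most `max_classes` distinct values. This is NOT how
    PyCaret decides the task; the threshold is an assumption for illustration.
    """
    distinct = set(values)
    # bool is a subclass of int in Python, so treat it explicitly as a label
    if any(isinstance(v, bool) or not isinstance(v, Number) for v in distinct):
        return True
    return len(distinct) <= max_classes


# Usage: a label column vs. a continuous column
print(looks_categorical(["low", "high", "low"]))
print(looks_categorical([0.12, 0.57, 0.93, 0.31] * 5, max_classes=3))
```

For a pandas column you could pass `df["y"].tolist()`, where `df` stands for your training DataFrame.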
You can run the following code to get the full list of supported regression models:
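# import the submodule explicitly; "import pycaret" alone may not load it
import pycaret.containers.models.regression

# minimal globals that the model containers read
globals_dict = {}
globals_dict["seed"] = 42
globals_dict["gpu_param"] = 0
globals_dict["n_jobs_param"] = -1
_all_models = {
    k: v
    for k, v in pycaret.containers.models.regression.get_all_model_containers(
        globals_dict, raise_errors=True
    ).items()
    if not v.is_special
}
# the keys are the model IDs, e.g. the values accepted by include/exclude
print(list(_all_models))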
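The ValueError in your traceback comes from passing an ID to exclude that the imported module does not know. The sketch below mimics that check in plain Python. `validate_exclude` is a hypothetical helper, not a PyCaret function, and the ID sets are small illustrative subsets (huber is a regression ID, while lr, knn, dt and rf exist in both modules), not the complete lists.

```python
# Illustrative subsets of PyCaret 2.x model IDs (assumed; not the complete lists)
CLASSIFICATION_IDS = {"lr", "knn", "nb", "dt", "rf"}
REGRESSION_IDS = {"lr", "knn", "dt", "rf", "huber"}


def validate_exclude(exclude, available_ids):
    """Hypothetical helper: fail fast if `exclude` names an unknown model ID.

    Mirrors the idea behind the ValueError above; the message format is
    only an approximation of PyCaret's.
    """
    unknown = [m for m in exclude if m not in available_ids]
    if unknown:
        raise ValueError(
            f"Estimator Not Available {', '.join(unknown)}. "
            f"Available IDs: {sorted(available_ids)}"
        )
    return list(exclude)


print(validate_exclude(["huber"], REGRESSION_IDS))  # ['huber']
try:
    validate_exclude(["huber"], CLASSIFICATION_IDS)
except ValueError as err:
    print(err)
```

In other words, the same exclude list can be valid for one module and invalid for the other.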
I hope this helps!
Thank you for the reply. I am running this as a classification problem, so initially I ran it without excluding 'huber' and got the error below. The target variable is categorical. Can you please let me know if I am missing something?
Initiated: 09:13:05
Status: Compiling Final Models
Estimator: Huber Regressor
Traceback (most recent call last):
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-10-249556cb224b>", line 1, in <module>
top5 = compare_models(n_select = 5)
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/regression.py", line 763, in compare_models
return pycaret.internal.tabular.compare_models(
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/internal/tabular.py", line 2283, in compare_models
model, model_fit_time = create_model_supervised(
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/internal/tabular.py", line 3026, in create_model_supervised
pipeline_with_model.fit(data_X, data_y, **fit_kwargs)
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/internal/pipeline.py", line 118, in fit
result = super().fit(X, y=y, **fit_kwargs)
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/imblearn/pipeline.py", line 281, in fit
self._final_estimator.fit(Xt, yt, **fit_params)
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/sklearn/linear_model/_huber.py", line 296, in fit
self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
File "/Users/sgujja/miniconda3/envs/my-rdkit-env/lib/python3.9/site-packages/sklearn/utils/optimize.py", line 243, in _check_optimize_result
).format(solver, result.status, result.message.decode("latin1"))
AttributeError: 'str' object has no attribute 'decode'
@sgujjaABM If you have imported the classification module, you don't need to pass exclude to compare_models. huber doesn't exist in the classification module, which is why you get this exception.
Thank you for the reply. I updated the code and it seems to be running now. However, it is not using all the cores, so it runs very slowly. Can you please suggest how to speed it up? Thank you.
## Set up the PyCaret environment (classification module, so 'huber' need not be excluded)
from pycaret.classification import *
classf = setup(data=train, target='y', session_id=123, fold_shuffle=True, numeric_features=features)  # , remove_multicollinearity=True, multicollinearity_threshold=0.95)
#
# compare all baseline models and select top 5
#best = compare_models()
top5 = compare_models(n_select = 5)
# tune models
tuned_top5 = [tune_model(i) for i in top5]
# ensemble models
bagged_top5 = [ensemble_model(i) for i in tuned_top5]
# blend models
blender = blend_models(estimator_list = top5)
# stack models
stacker = stack_models(estimator_list = top5)
# automl
best = automl()
print(best)
#analyze best model
#evaluate_model(best)
train_metrics = pull()
print("Train metrics:")
print(train_metrics)
train_metrics.to_csv(path_or_buf=out+"/"+basename+"_train_metrics.csv",index=False,quoting=3,sep=';')
# Deploy Model to generate predictions on hold out data
predict_model(best)
# pull
test_metrics = pull()
print("Test metrics:")
print(test_metrics)
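Conceptually, compare_models(n_select = 5) ranks the cross-validated results by one metric and returns the top five models, and automl() returns the best model trained so far in the session. The plain-Python sketch below only illustrates that ranking idea. The helpers `top_n` and `best_model`, the model names and the scores are all made up. The default metric (Accuracy for classification in PyCaret 2.x, as far as I know) should be checked against your version's docstring.

```python
def top_n(leaderboard, metric="Accuracy", n=5):
    """Hypothetical helper: return the names of the n highest-scoring models.

    `leaderboard` maps model name -> {metric name: cross-validated score}.
    Higher is assumed to be better, which holds for Accuracy/AUC/F1 but not
    for error metrics such as RMSE.
    """
    ranked = sorted(leaderboard, key=lambda name: leaderboard[name][metric], reverse=True)
    return ranked[:n]


def best_model(leaderboard, metric="Accuracy"):
    """Hypothetical helper: pick the single best model, i.e. top_n(..., n=1)."""
    return top_n(leaderboard, metric=metric, n=1)[0]


# Made-up scores purely for illustration
scores = {
    "model_a": {"Accuracy": 0.81},
    "model_b": {"Accuracy": 0.86},
    "model_c": {"Accuracy": 0.78},
    "tuned_model_b": {"Accuracy": 0.88},
}
print(top_n(scores, n=2))  # ['tuned_model_b', 'model_b']
print(best_model(scores))  # tuned_model_b
```

Because tuned, bagged, blended and stacked models all join the same pool, the model returned at the end can be any of them, not only one of the original top five.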
@sgujjaABM Glad this helped.
For the multi-core issue, please open a new issue with logs and a more detailed explanation of what you expect.
Describe the bug
To Reproduce
Hi, I am trying to run the PyCaret automl function, but I get an error with or without excluding 'huber' (see the error below). Can you please help with the code? Thank you.
Initiated: 02:33:45
Status: Preprocessing Data
Following data types have been inferred automatically, if they are correct press enter to continue or type 'quit' otherwise.
      Data Type
0       Numeric
1       Numeric
2       Numeric
3       Numeric
4       Numeric
...         ...
1029    Numeric
1030    Numeric
1031    Numeric
1032    Numeric
y         Label
[1034 rows x 1 columns]
Setup Succesfully Completed!
Traceback (most recent call last):
File "/home/ec2-user/sgujja/qsar_modeling/repos/fup/qsar_class_pycaret_ppb.py", line 185, in
top5 = compare_models(exclude = ['huber'],n_select = 5)
File "/home/ec2-user/anaconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/classification.py", line 771, in compare_models
return pycaret.internal.tabular.compare_models(
File "/home/ec2-user/anaconda3/envs/my-rdkit-env/lib/python3.9/site-packages/pycaret/internal/tabular.py", line 1910, in compare_models
raise ValueError(
ValueError: Estimator Not Available huber. Please see docstring for list of available estimators.
Expected behavior
Additional context
Versions