pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

[BUG]: get_config('X_train') doesn't return right dataframe #3131

Closed moezali1 closed 1 year ago

moezali1 commented 1 year ago

pycaret version checks

Issue Description

I am running an experiment with feature_selection. The setup output shows that the transformed dataset should have 2 features, but when I call get_config('X_train') I see more than 2 features. Why do the two not match?

See the reproducible notebook:

https://colab.research.google.com/drive/1TDP1RzhusUgZZzww4kKsVNbLGo5tmVV1?usp=sharing

Reproducible Example

see above

Expected Behavior

see above

Actual Results

see above

Installed Versions

master

tvdboom commented 1 year ago

We just merged a PR that changes this behavior: #3117

ngupta23 commented 1 year ago

@moezali1 With the latest change, you should try getting X_train_transformed instead of X_train. X_train refers to the training dataset before the transformation.

get_config('X_train_transformed')

moezali1 commented 1 year ago

@ngupta23 @tvdboom For the classification and regression modules, when I try to get the transformed dataset using get_config('X_train_transformed'), I get this:


ValueError                                Traceback (most recent call last)
in
----> 1 get_config('X_train_transformed')

2 frames
/usr/local/lib/python3.7/dist-packages/pycaret/internal/pycaret_experiment/pycaret_experiment.py in get_config(self, variable)
    281         if variable not in self.variables:
    282             raise ValueError(
--> 283                 f"Variable {variable} not found. Possible variables are: {list(self.variables)}"
    284             )
    285

ValueError: Variable X_train_transformed not found. Possible variables are: ['log_plots_param', 'master_model_container', 'fold_generator', '_ml_usecase', 'X', 'X_train', 'html_param', 'gpu_param', 'exp_id', 'idx', 'X_test', 'display_container', 'target_param', 'fold_shuffle_param', '_gpu_n_jobs_param', 'y', 'exp_name_log', 'USI', 'logging_param', 'fold_groups_param', 'y_test', 'variable_keys', '_all_models', 'n_jobs_param', 'pipeline', 'data', '_available_plots', 'y_train', '_all_models_internal', 'memory', 'fix_imbalance', 'seed', '_all_metrics', '_is_multiclass']
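
Until the name is exported, a caller can guard against this error by trying the preferred name first and falling back to one that is known to exist. The snippet below is only a rough sketch of that pattern: `first_available` and `fake_get_config` are hypothetical helpers written for illustration, not part of pycaret, and the only behavior assumed is the one shown in the traceback (an unknown name raises ValueError).

```python
from typing import Any, Callable, Sequence, Tuple


def first_available(
    get_config: Callable[[str], Any], names: Sequence[str]
) -> Tuple[str, Any]:
    """Return (name, value) for the first name that get_config accepts."""
    for name in names:
        try:
            return name, get_config(name)
        except ValueError:
            # Unknown names raise ValueError, as in the traceback above.
            continue
    raise ValueError(f"None of {list(names)} are available")


# Stand-in for a real experiment's get_config (hypothetical, demo only).
_fake_variables = {"X_train": [[1.0, 2.0, 3.0]], "y_train": [0]}


def fake_get_config(variable: str) -> Any:
    if variable not in _fake_variables:
        raise ValueError(
            f"Variable {variable} not found. "
            f"Possible variables are: {list(_fake_variables)}"
        )
    return _fake_variables[variable]


# Prefer the transformed data, fall back to the untransformed data.
name, value = first_available(
    fake_get_config, ["X_train_transformed", "X_train"]
)
print(name)
```

Note that the fallback here returns the untransformed data, so the caller should check which name was actually returned before relying on the feature count.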

ngupta23 commented 1 year ago

I think X_train_transformed is not exported right now in regression and classification (it works in time series). It needs to be added to the list of properties available for export in regression and classification.
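
As a toy illustration of what "adding it to the list of exported properties" could mean, consider the sketch below. `ExperimentSketch`, its `exported` set, and the column-dropping "transformation" are all assumptions made up for this example; pycaret's real experiment classes are organized differently.

```python
class ExperimentSketch:
    """Hypothetical stand-in; not pycaret's actual experiment class."""

    def __init__(self, X_train, keep_columns):
        self.X_train = X_train              # rows as dicts, before transformation
        self._keep_columns = keep_columns   # toy "feature selection" result
        self.exported = {"X_train"}         # names get_config may return

    @property
    def X_train_transformed(self):
        # Apply the toy transformation: keep only the selected columns.
        return [
            {col: row[col] for col in self._keep_columns}
            for row in self.X_train
        ]

    def get_config(self, variable):
        if variable not in self.exported:
            raise ValueError(
                f"Variable {variable} not found. "
                f"Possible variables are: {sorted(self.exported)}"
            )
        return getattr(self, variable)


exp = ExperimentSketch(
    X_train=[{"a": 1, "b": 2, "c": 3, "d": 4}],
    keep_columns=["a", "c"],
)

# Before the name is exported, the lookup fails as in the report above.
try:
    exp.get_config("X_train_transformed")
except ValueError as err:
    missing_message = str(err)

# After exporting the name, the transformed data becomes reachable.
exp.exported.add("X_train_transformed")
transformed = exp.get_config("X_train_transformed")
```

In this sketch the property already exists on the object; the only thing missing is its name in the set that get_config consults, which mirrors the situation described in the comment above.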

FYI, this is related to https://github.com/pycaret/pycaret/issues/3132. I will close this issue and keep the other one open, since it is more comprehensive and not specific to one attribute (i.e. X_train_transformed).