pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

[BUG]: #3943

Open am-vaibhav opened 6 months ago

am-vaibhav commented 6 months ago

Issue Description

finalize_model raises an error. The calls are setup(oppr, target='stage_name', ignore_features=ignore_columns, fix_imbalance=True, normalize=True, normalize_method='robust', transformation=True, fold_strategy='stratifiedkfold', fold=5, fold_shuffle=True), then best = compare_models(include=['rf'], sort='F1'), then final_best = finalize_model(best). The error is: "*** IndexError: Length of values (7530) does not match the length of index (6857). This usually happens when transformations that drop rows aren't applied on all the columns." I believe this happens because SMOTE is used to fix the imbalanced target (fix_imbalance=True). How can I fix it?

Reproducible Example

# Assumed import: the classification module, since fix_imbalance and sort='F1' are used
from pycaret.classification import setup, compare_models, finalize_model

# oppr (the input DataFrame) and ignore_columns (a list of column names) are defined elsewhere
setup(oppr, target='stage_name', ignore_features=ignore_columns, fix_imbalance=True,
      normalize=True, normalize_method='robust', transformation=True,
      fold_strategy='stratifiedkfold', fold=5, fold_shuffle=True)
best = compare_models(include=['rf'], sort='F1')
final_best = finalize_model(best)  # raises the IndexError shown under Actual Results

Expected Behavior

finalize_model(best) should finalize the model without raising an IndexError, even when fix_imbalance=True.

Actual Results

*** IndexError: Length of values (7530) does not match the length of index (6857). This usually happens when transformations that drop rows aren't applied on all the columns.

Installed Versions

'3.3.0'

ohamza-dgs commented 4 months ago

I get the same error when using finalize_model(). My pipeline is just a simple setup() followed by create_model() -> tune_model(). After tuning, I call finalize_model() on the tuned_model object, which throws the error:

"IndexError: Length of values (17512) does not match length of index (18434). This usually happens when transformations that drop rows aren't applied on all the columns."

Although some suggested solutions recommend it, setting index=True/False in setup() does not fix the issue. Disabling the n_features_to_select and polynomial_features parameters in setup() usually fixes it, but not always!

pycaret.version = 3.2.0
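
The two error messages reported so far differ in direction: in the first, the values count (7530) is larger than the index count (6857); in the second, it is smaller (17512 vs. 18434). The sketch below extracts that difference from a message. The helper name row_delta is hypothetical, and reading a positive difference as "rows were added" and a negative one as "rows were dropped" is an assumption about what the two counts refer to, not something confirmed in this thread.

```python
import re

# Matches both wordings seen in this thread ("the length" and "length").
PATTERN = re.compile(
    r"Length of values \((\d+)\) does not match (?:the )?length of index \((\d+)\)"
)


def row_delta(message):
    """Hypothetical helper: return values count minus index count, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    values_len, index_len = int(match.group(1)), int(match.group(2))
    return values_len - index_len


first = row_delta(
    "IndexError: Length of values (7530) does not match the length of index (6857)."
)
second = row_delta(
    "IndexError: Length of values (17512) does not match length of index (18434)."
)
print(first, second)
```

If that reading holds, the two reports may come from different preprocessing steps, which would fit the observation that disabling individual setup() parameters helps only some of the time.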

CJC-ds commented 4 months ago

I am also encountering this issue in pycaret 3.2.0. Has this been fixed yet?

I also tried setting index=False in setup(), but still get the same error. I need feature selection, so disabling n_features_to_select, as suggested above, is not really an option for me.

I found a workaround for this. After checking the source code, the error comes from the step that aligns original_df with the transformed df so that the two can be merged. Because SMOTE oversamples the minority class, the transformed df has more rows than original_df, and the two indices do not align.

The main purpose of this class method is to return a df with the columns in the correct order. Order does not really matter in my case, and I have not checked any downstream implications of this fix...

If you care about ordering, you can add your own column order with the monkey-patch fix below at ... .

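from pycaret.internal.preprocess.transformers import TransformerWrapper

# Replacement for TransformerWrapper._reorder_cols that skips the merge with original_df
def _reorder_cols(self, df, original_df):
  ...  # optional: apply your own column order here
  return df

# Monkey patch: must run before setup() so the pipeline uses the patched method
TransformerWrapper._reorder_cols = _reorder_cols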
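
The mismatch described above can be reproduced in isolation with plain pandas. This is a minimal sketch, not pycaret's actual code: the toy frames stand in for original_df and an oversampled df, and reorder_by_name is a hypothetical helper showing one way to restore a column order by name only, without relying on the row index.

```python
import pandas as pd

# Toy stand-ins: 3 original rows, 5 rows after a hypothetical oversampling step.
original_df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]})
resampled_df = pd.DataFrame({"a": [4, 5, 6, 6, 6], "b": [1, 2, 3, 3, 3]})

# Assigning a 5-element index to a 3-row frame fails with a length mismatch.
mismatch_error = None
probe = original_df.copy()
try:
    probe.index = resampled_df.index
except ValueError as exc:
    mismatch_error = exc


def reorder_by_name(df, preferred_order):
    """Hypothetical helper: order columns by name; rows are left untouched."""
    first = [col for col in preferred_order if col in df.columns]
    rest = [col for col in df.columns if col not in first]
    return df[first + rest]


reordered = reorder_by_name(resampled_df, list(original_df.columns))
print(type(mismatch_error).__name__)
print(list(reordered.columns))
```

Because the helper only selects columns by name, it works regardless of how many rows the resampling step added or removed.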

Lawlantosin commented 2 months ago

It is not working for me. Is the issue tied to a specific pycaret version, or does it affect all versions?

celestinoxp commented 2 months ago

@CJC-ds Can you open a pull request to fix the problem?

msaad1311 commented 4 weeks ago

I am encountering the same error. I tried the following suggested solutions, but with no luck:

Can someone please let me know how to solve it?