microsoft / responsible-ai-toolbox-mitigations

Python library for implementing Responsible AI mitigations.
https://responsible-ai-toolbox-mitigations.readthedocs.io/en/latest/
MIT License

Case3_stat.ipynb: ValueError. mismatched shapes #32

Closed: morrissharp closed this issue 2 years ago

morrissharp commented 2 years ago

In the Case3_stat.ipynb case study notebook, in the cell `Artificial Instances - CTGAN`, I receive the following error:

ValueError                                Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3_stat.ipynb Cell 20 in <cell line: 8>()
      5 result_tr = test_corr_transf(df, label_col, N_EXEC, dp.DataStandardScaler, MODEL_NAME, num_col)
      6 result_df = add_results_df(result_df, result_tr, "Std.")
----> 8 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=True, scaler_ref=dp.DataStandardScaler, feat_sel_type=None, art_str=0.2, savefile="3_1.pkl")
      9 result_df = add_results_df(result_df, restult_fs, "CTGAN 0.2 Std.")
     11 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=True, scaler_ref=dp.DataStandardScaler, feat_sel_type=None, art_str=0.6, savefile="3_2.pkl")

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3_stat.ipynb Cell 20 in test_ctgan_first(df, label_col, n_exec, model_name, rcorr, scaler_ref, num_col, feat_sel_type, art_str, savefile)
    260 if art_str is not None:
    261    train_x, train_y = artificial_ctgan(train_x, train_y, art_str, savefile)
--> 262 train_x, test_x = encode_case3_train_test(train_x, test_x)
    263 train_x, test_x = impute_case3_train_test(train_x, test_x)
    264 if feat_sel_type is not None:

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case3_stat.ipynb Cell 20 in encode_case3_train_test(train_x, test_x)
     33 enc_ohe = dp.EncoderOHE(drop=False, unknown_err=False, verbose=False)
     34 enc_ohe.fit(train_x)
---> 35 train_x_enc = enc_ohe.transform(train_x)
     36 test_x_enc = enc_ohe.transform(test_x)
     37 return train_x_enc, test_x_enc

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\encoder\encoder.py:108, in DataEncoding.transform(self, df)
    106 self._check_if_fitted()
...
    391 passed = values.shape
    392 implied = (len(index), len(columns))
--> 393 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (2582, 30), indices imply (2582, 29)

I am not sure whether this is related to issue #31, but the cause may be that `label_col` is not included in `df` where it is needed.
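For readers unfamiliar with this message: it is the error pandas raises when a 2-D array is wrapped in a `DataFrame` with a different number of column names than the array has columns. The sketch below reproduces that class of error on toy data and shows a guard that reports the mismatch earlier. It is not the code path inside `raimitigations`, and it does not identify the cause of this issue. The helper `build_frame` and the toy `values` and `names` are hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd


def build_frame(values, columns, index=None):
    """Hypothetical helper: wrap an encoded array in a DataFrame,
    failing early with a descriptive message if the widths disagree."""
    n_cols = values.shape[1]
    if n_cols != len(columns):
        raise ValueError(
            f"encoded array has {n_cols} columns but "
            f"{len(columns)} column names were supplied"
        )
    return pd.DataFrame(values, columns=columns, index=index)


# Toy data: 3 columns of values, but only 2 column names.
values = np.zeros((4, 3))
names = ["a", "b"]

# pandas itself raises a ValueError about mismatched shapes.
try:
    pd.DataFrame(values, columns=names)
except ValueError as err:
    pandas_message = str(err)

# The guard raises before pandas does, naming both counts.
try:
    build_frame(values, names)
except ValueError as err:
    guard_message = str(err)

print(pandas_message)
print(guard_message)
```

In the traceback above, the array has one more column (30) than there are column names (29), so comparing the encoder's output width against its list of generated column names right after `enc_ohe.fit(train_x)` may help narrow down which column is unaccounted for.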