microsoft / responsible-ai-toolbox-mitigations

Python library for implementing Responsible AI mitigations.
https://responsible-ai-toolbox-mitigations.readthedocs.io/en/latest/
MIT License

ValueError: case1_stat.ipynb error in CTGAN Section #28

Closed morrissharp closed 2 years ago

morrissharp commented 2 years ago

I am receiving the following error when running the CTGAN section of case1_stat.ipynb:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in <cell line: 5>()
      2 result_base = test_base(df, label_col, N_EXEC, MODEL_NAME)
      3 result_df = add_results_df(None, result_base, "Baseline")
----> 5 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=False, feat_sel_type=None, art_str=0.6, savefile="1_1.pkl")
      6 result_df = add_results_df(result_df, restult_fs, "CTGAN 0.6")
      8 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=False, feat_sel_type=None, art_str=0.9, savefile="1_2.pkl")

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in test_ctgan_first(df, label_col, n_exec, model_name, rcorr, feat_sel_type, art_str, savefile)
    245 if art_str is not None:
    246    train_x, train_y = artificial_ctgan(train_x, train_y, art_str, savefile)
--> 247 train_x, test_x = encode_case1_train_test(train_x, test_x)
    248 train_x, test_x = impute_case1_train_test(train_x, test_x)
    249 if feat_sel_type is not None:

c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in encode_case1_train_test(train_x, test_x)
     55 def encode_case1_train_test(train_x, test_x):
     56     enc_ord, enc_ohe = get_encoders(df)
---> 57     enc_ord.fit(train_x)
     58     train_x_enc = enc_ord.transform(train_x)
     59     test_x_enc = enc_ord.transform(test_x)

File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\encoder\encoder.py:84, in DataEncoding.fit(self, df, y)
     82 self._set_column_to_encode()
...
    120         + "the order of the existing values of the column col_encode[i]. If a value is not given, "
    121         + "it will be assigned a None value."
    122     )

ValueError: ERROR: the value '24-26' provided to the the list of values for the key 'inv-nodes' in the 'categories' parameter does not match any of the unique values found in the column 'inv-nodes' of the dataset provided.

Not sure exactly what the cause is for this yet.
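
One possible explanation, going only by the traceback: `get_encoders(df)` builds the encoders from the full `df`, while `enc_ord.fit(train_x)` runs on the training split, so a rare value such as '24-26' could be listed in `categories` yet be absent from `train_x`. This is a guess, not a confirmed diagnosis. The pandas-only sketch below shows one way to check for that situation; the helper `missing_categories` and the toy data are hypothetical and are not part of the notebook or of raimitigations.

```python
import pandas as pd


def missing_categories(df, categories):
    """Return {column: [values listed in `categories` but absent from df[column]]}."""
    missing = {}
    for col, values in categories.items():
        present = set(df[col].dropna().unique())
        absent = [v for v in values if v not in present]
        if absent:
            missing[col] = absent
    return missing


# Toy data for illustration only (not taken from the notebook's dataset).
full_df = pd.DataFrame({"inv-nodes": ["0-2", "3-5", "24-26", "0-2"]})
# A split that happens to drop the only row containing '24-26'.
train_x = full_df.iloc[[0, 1, 3]]
# Category order as it might be passed to an ordinal encoder.
categories = {"inv-nodes": ["0-2", "3-5", "24-26"]}

print(missing_categories(train_x, categories))
```

If a check like this reports '24-26' for the real `train_x`, the category list and the data being fit are out of sync, which would match the message in the ValueError.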

While investigating this, I noticed a couple of other issues: