I am receiving the following error when running the CTGAN section of case1_stat.ipynb:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in <cell line: 5>()
      2 result_base = test_base(df, label_col, N_EXEC, MODEL_NAME)
      3 result_df = add_results_df(None, result_base, "Baseline")
----> 5 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=False, feat_sel_type=None, art_str=0.6, savefile="1_1.pkl")
      6 result_df = add_results_df(result_df, restult_fs, "CTGAN 0.6")
      8 restult_fs = test_ctgan_first(df, label_col, N_EXEC, MODEL_NAME, rcorr=False, feat_sel_type=None, art_str=0.9, savefile="1_2.pkl")
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in test_ctgan_first(df, label_col, n_exec, model_name, rcorr, feat_sel_type, art_str, savefile)
    245 if art_str is not None:
    246     train_x, train_y = artificial_ctgan(train_x, train_y, art_str, savefile)
--> 247 train_x, test_x = encode_case1_train_test(train_x, test_x)
    248 train_x, test_x = impute_case1_train_test(train_x, test_x)
    249 if feat_sel_type is not None:
c:\Users\morrissharp\Repos\responsible-ai-toolbox-mitigations\notebooks\dataprocessing\case_study\case1_stat.ipynb Cell 14 in encode_case1_train_test(train_x, test_x)
     55 def encode_case1_train_test(train_x, test_x):
     56     enc_ord, enc_ohe = get_encoders(df)
---> 57     enc_ord.fit(train_x)
     58     train_x_enc = enc_ord.transform(train_x)
     59     test_x_enc = enc_ord.transform(test_x)
File c:\users\morrissharp\repos\responsible-ai-toolbox-mitigations\raimitigations\dataprocessing\encoder\encoder.py:84, in DataEncoding.fit(self, df, y)
     82 self._set_column_to_encode()
...
    120 + "the order of the existing values of the column col_encode[i]. If a value is not given, "
    121 + "it will be assigned a None value."
    122 )
ValueError: ERROR: the value '24-26' provided to the the list of values for the key 'inv-nodes' in the 'categories' parameter does not match any of the unique values found in the column 'inv-nodes' of the dataset provided.
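The condition the error message describes can be sketched in plain Python. This is a minimal sketch, not the raimitigations implementation: it assumes the 'categories' parameter maps each column name to a list of values, and that fitting fails when a listed value never appears in that column of the data being fit. The rows and bins below are hypothetical, and deriving the category list from the full dataset is an assumption about what get_encoders(df) does. Under that assumption, one plausible (unconfirmed) explanation is that the categories come from the full df, while enc_ord.fit(train_x) only sees the CTGAN-augmented training split, which may lack a rare bin such as '24-26'.

```python
def find_missing_categories(categories, rows):
    """Return {column: [listed category values absent from that column of rows]}.

    categories: dict mapping column name -> list of expected values
    rows: list of dicts, one per record (a stand-in for a DataFrame)
    """
    missing = {}
    for col, values in categories.items():
        present = {row[col] for row in rows}
        absent = [v for v in values if v not in present]
        if absent:
            missing[col] = absent
    return missing


# Hypothetical data: the full dataset contains a rare '24-26' bin...
full_rows = [{"inv-nodes": v} for v in ["0-2", "3-5", "0-2", "24-26"]]
# ...but the training split passed to fit() does not.
train_rows = [{"inv-nodes": v} for v in ["0-2", "3-5", "0-2"]]

# Assumption: category lists are built from the full dataset.
categories = {"inv-nodes": sorted({r["inv-nodes"] for r in full_rows})}

print(find_missing_categories(categories, train_rows))  # {'inv-nodes': ['24-26']}
```

Checking the fitted data against the category lists this way before calling fit would show exactly which values trigger the ValueError.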
I'm not sure what the cause is yet.
While investigating this, I also noticed a couple of other issues:
Ordinal encoding is applied to age, tumor size, and inv-nodes. However, because these columns hold strings, their values are sorted lexicographically rather than numerically.
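The ordering problem is easy to see in plain Python. This is a minimal sketch, assuming the binned values follow a "low-high" pattern such as "10-14" (the bins below are illustrative, not taken from the notebook). Sorting by the integer lower bound gives the intended numeric order; a list sorted this way could then be supplied as the explicit category order for the ordinal encoder.

```python
def range_key(label):
    """Sort key for 'low-high' range labels: the integer lower bound."""
    low, _, _ = label.partition("-")
    return int(low)


tumor_sizes = ["10-14", "0-4", "5-9", "15-19", "50-54"]

# Default string sort compares character by character.
lexicographic = sorted(tumor_sizes)
# Numeric sort on the lower bound of each range.
numeric = sorted(tumor_sizes, key=range_key)

print(lexicographic)  # ['0-4', '10-14', '15-19', '5-9', '50-54']
print(numeric)        # ['0-4', '5-9', '10-14', '15-19', '50-54']
```

With the lexicographic order, "5-9" would receive a larger ordinal code than "10-14", which breaks the ordinal relationship the encoding is meant to preserve.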
Also, get_encoders is called on df, but encode_case1_train_test() does not take df as a parameter, so it silently relies on the notebook's global df.
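One way to make that dependency explicit is sketched below. The names encode_train_test, build_categories, and range_key are hypothetical stand-ins for the notebook helpers, and the code is plain Python over lists of dicts rather than the raimitigations DataEncoding API. The reference data used to build the category order is passed in as an argument instead of being read from the global df, and (as an assumed design choice) values absent from the reference data are encoded as None instead of raising.

```python
def range_key(label):
    """Sort key for 'low-high' range labels: the integer lower bound."""
    return int(label.partition("-")[0])


def build_categories(rows, columns, key=None):
    """Ordered category list per column, built only from the rows passed in."""
    return {col: sorted({row[col] for row in rows}, key=key) for col in columns}


def encode_train_test(train_rows, test_rows, reference_rows, columns, key=None):
    """Ordinal-encode train/test using categories from an explicit reference set.

    reference_rows is a parameter, so the caller decides which data defines
    the category order (e.g. the full dataset or only the training split).
    """
    categories = build_categories(reference_rows, columns, key=key)
    codes = {col: {v: i for i, v in enumerate(vals)} for col, vals in categories.items()}

    def encode(rows):
        # Encoded columns map to their ordinal code; unseen values become None.
        # Columns not listed in `columns` are passed through unchanged.
        return [
            {col: codes[col].get(row[col]) if col in codes else row[col] for col in row}
            for row in rows
        ]

    return encode(train_rows), encode(test_rows)


train = [{"inv-nodes": "0-2"}, {"inv-nodes": "3-5"}]
test = [{"inv-nodes": "24-26"}, {"inv-nodes": "0-2"}]

train_enc, test_enc = encode_train_test(
    train, test, reference_rows=train, columns=["inv-nodes"], key=range_key
)
print(train_enc)  # [{'inv-nodes': 0}, {'inv-nodes': 1}]
print(test_enc)   # [{'inv-nodes': None}, {'inv-nodes': 0}]
```

Whether the reference set should be the full dataset or the training split is a separate decision; the point of the sketch is only that the choice becomes visible at the call site.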