interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License
1.32k stars, 185 forks

CFs with onehot encoded categorical variables #346

Open nagainfosolutions opened 1 year ago

nagainfosolutions commented 1 year ago

Hello,

This isn't a bug, but more of a question about how DiCE can work with data that our program has already one-hot encoded.

Several fields in our dataset contain categorical data (e.g. Gender, Country). These fields are one-hot encoded, which changes both the number of columns and their names. For example, the field 'Gender' is replaced by 'Gender_Male' and 'Gender_Female'.
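
To make the column change concrete, here is a minimal sketch of the encoding step in plain Python. The helper `one_hot_encode` and the sample record are hypothetical and only illustrate the renaming described above; they are not part of DiCE or of our actual pipeline.

```python
def one_hot_encode(record, field, categories):
    """Replace record[field] with one 0.0/1.0 column per category.

    Hypothetical helper for illustration only (not a DiCE function).
    """
    # Copy every column except the categorical one being encoded.
    encoded = {key: value for key, value in record.items() if key != field}
    # Add one indicator column per known category, named "<field>_<category>".
    for category in categories:
        encoded[f"{field}_{category}"] = 1.0 if record[field] == category else 0.0
    return encoded


record = {"Age": 34, "Gender": "Female"}
encoded = one_hot_encode(record, "Gender", ["Male", "Female"])
print(encoded)
# {'Age': 34, 'Gender_Male': 0.0, 'Gender_Female': 1.0}
```

After this step the original 'Gender' key no longer exists, which is why constraints written against the raw field name have nothing to refer to.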

How can the 'permitted_range' and 'features_to_vary' arguments work with such data?

For instance, the permitted range for the 'Country' field would be ['USA', 'France', 'Germany']. But after one-hot encoding, the 'Country' field no longer exists. Instead we only have 'Country_USA', 'Country_France' and 'Country_Germany', each with values 0.0 or 1.0.
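
A rough sketch of what such a constraint would have to mean on the encoded columns is shown below, in plain Python. The helpers `expand_constraint`, `decode_one_hot` and `within_permitted_range` are hypothetical names invented for this illustration. They are not DiCE functions, and this is not a claim about how DiCE handles the case internally.

```python
def expand_constraint(field, allowed_values):
    """Map raw category values to the one-hot column names that represent them."""
    return [f"{field}_{value}" for value in allowed_values]


def decode_one_hot(encoded, field, categories):
    """Recover the raw category from an encoded record.

    Raises ValueError unless exactly one indicator column is set to 1.0,
    because any other combination does not correspond to a real category.
    """
    active = [c for c in categories if encoded.get(f"{field}_{c}") == 1.0]
    if len(active) != 1:
        raise ValueError(f"{field} is not validly one-hot encoded: {active}")
    return active[0]


def within_permitted_range(encoded, field, categories, permitted):
    """Check an encoded record against a permitted range given in raw values."""
    return decode_one_hot(encoded, field, categories) in permitted


countries = ["USA", "France", "Germany"]
candidate = {"Country_USA": 0.0, "Country_France": 1.0, "Country_Germany": 0.0}

print(expand_constraint("Country", ["USA", "France"]))
# ['Country_USA', 'Country_France']
print(within_permitted_range(candidate, "Country", countries, ["USA", "France"]))
# True
```

The sketch also shows the second half of the difficulty: the indicator columns cannot be varied independently, since a row with two of them set to 1.0 (or none) does not describe any country.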

Is there a workaround for this problem?

Thanks in advance.

leoncena commented 1 year ago

Hi,

Just out of curiosity (I have the same problem):

Have you managed to find a solution?