interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
MIT License

High rate of incorrect predictions #12

Closed: sina-salek closed this issue 4 years ago

sina-salek commented 4 years ago


I'm not sure whether I'm doing something wrong or whether you've encountered this before. Here is what I did:

In the adult dataset I set aside a validation set that neither my model nor DiCE has seen. From it, I select a number of samples with age = 31 and use them to generate counterfactuals, varying every feature except age. Other than this, I am not using any weights. I then use my model to get predictions on the generated counterfactuals. In many cases, my model's prediction is not what DiCE thinks it would give.

Is there any way for me to increase DiCE's fidelity to my model?
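One way to make "a lot of the cases" measurable is to compute the fraction of counterfactuals that the model does not assign to the desired class. The following is only a minimal sketch: `mismatch_rate`, `toy_predict_proba`, and the sample rows are hypothetical stand-ins, not part of DiCE or of the model discussed in this thread.

```python
def mismatch_rate(predict_proba, counterfactuals, desired_class, threshold=0.5):
    """Fraction of counterfactuals the model does NOT assign to desired_class."""
    if not counterfactuals:
        return 0.0
    misses = 0
    for cf in counterfactuals:
        predicted = 1 if predict_proba(cf) >= threshold else 0
        if predicted != desired_class:
            misses += 1
    return misses / len(counterfactuals)


# Stand-in model: a made-up rule, used only to make the sketch runnable.
def toy_predict_proba(row):
    return 0.8 if row["education"] in {"Masters", "Doctorate"} else 0.2


# Hypothetical counterfactuals that are all supposed to flip the outcome to 1.
cfs = [
    {"education": "Masters"},
    {"education": "Doctorate"},
    {"education": "HS-grad"},
    {"education": "Masters"},
]
rate = mismatch_rate(toy_predict_proba, cfs, desired_class=1)
print(rate)
```

A non-zero rate on counterfactuals that were generated against the same model usually points to a difference in how the inputs were prepared rather than to the explainer itself.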


raam93 commented 4 years ago

That's not possible: DiCE explanations are always truthful to the ML model by definition. In other words, we simply tweak the input instance until we get a different prediction from the same ML model. It is difficult to say more without looking at your code. Perhaps you are missing a step while creating the validation set. For instance, are you normalizing the continuous features and one-hot encoding the categorical features in the validation data? You can use DiCE's data interface to create a validation set as follows:

dataset = helpers.load_adult_income_dataset()
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
train, test = d.split_data(d.normalize_data(d.one_hot_encoded_data))
X_test = test.loc[:, test.columns != 'income']
y_test = test.loc[:, test.columns == 'income']
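To illustrate why the preprocessing question matters, here is a small, hedged sketch in plain Python. The feature ranges, category levels, weights, and the `encode_row` and `predict` helpers are all made up for illustration; they do not come from DiCE or from the adult income model. The sketch only shows that the same record can receive a very different score when the continuous features are left on their raw scale.

```python
import math

# Hypothetical training-set statistics; in practice they come from the training split only.
RANGES = {"age": (17.0, 90.0), "hours_per_week": (1.0, 99.0)}
CATEGORIES = {"marital_status": ["Married", "Single"]}


def encode_row(row):
    """Min-max scale continuous features and one-hot encode categorical ones."""
    encoded = {}
    for name, (lo, hi) in RANGES.items():
        encoded[name] = (row[name] - lo) / (hi - lo)
    for name, levels in CATEGORIES.items():
        for level in levels:
            encoded[f"{name}_{level}"] = 1.0 if row[name] == level else 0.0
    return encoded


# Toy logistic model with made-up weights, standing in for the real classifier.
WEIGHTS = {
    "age": 1.0,
    "hours_per_week": 2.0,
    "marital_status_Married": 2.5,
    "marital_status_Single": -1.0,
}
BIAS = -2.0


def predict(encoded):
    score = BIAS + sum(WEIGHTS[name] * value for name, value in encoded.items())
    return 1.0 / (1.0 + math.exp(-score))


raw = {"age": 31.0, "hours_per_week": 40.0, "marital_status": "Single"}
encoded = encode_row(raw)

# Consistent preprocessing: the model sees inputs in the form it was trained on.
p_consistent = predict(encoded)

# Inconsistent preprocessing: continuous features left on their raw scale.
unscaled = dict(encoded, age=raw["age"], hours_per_week=raw["hours_per_week"])
p_inconsistent = predict(unscaled)

print(round(p_consistent, 3), round(p_inconsistent, 3))
```

In this toy setup the two probabilities fall on opposite sides of 0.5, which is the kind of disagreement that can look like low fidelity when the validation data and the counterfactuals are encoded differently.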

For your reference, here is sample code implementing your logic; it gave me valid results.

import dice_ml
from dice_ml.utils import helpers

import tensorflow as tf
from tensorflow import keras

print(tf.__version__) # 2.1.0

# creating a testing dataset - the inbuilt ML model in DiCE for adult data is trained only on the 'train' data below
dataset = helpers.load_adult_income_dataset()
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
train, test = d.split_data(d.normalize_data(d.one_hot_encoded_data))
X_test = test.loc[:, test.columns != 'income']
y_test = test.loc[:, test.columns == 'income']

# get normalized age=31
normalized_age = (31 - d.train_df['age'].min()) / (d.train_df['age'].max() - d.train_df['age'].min())  # should equal 0.1917808219178082

# we can verify if the above number is correct using the following
# (normalized_age*(d.train_df['age'].max() - d.train_df['age'].min())) + d.train_df['age'].min() # should give you 31

my_test = X_test[X_test['age']==normalized_age]
print(my_test.shape) # (187,29)
# there are 187 instances with age = 31 in our data; I'm choosing the first one below as an example.

# create a test instance dictionary
my_test_instance = {}
for feature in d.feature_names:
    if feature in d.continuous_feature_names:
        # undo the min-max normalization to recover the original value
        my_test_instance[feature] = (my_test[feature].iloc[0]*(d.train_df[feature].max() - d.train_df[feature].min())) + d.train_df[feature].min()
    else:
        # recover the category from whichever one-hot column is set to 1
        encoded_features = [feat for feat in d.encoded_feature_names if feat.startswith(feature)]
        for encoded_feat in encoded_features:
            if my_test.iloc[0][encoded_feat] == 1.0:
                my_test_instance[feature] = encoded_feat.split(feature+'_')[1]

# {'age': 31.0,
#  'workclass': 'Private',
#  'education': 'HS-grad',
#  'marital_status': 'Single',
#  'occupation': 'Blue-Collar',
#  'race': 'White',
#  'gender': 'Female',
#  'hours_per_week': 40.0}

d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')

backend = 'TF'+tf.__version__[0] # TF2
ML_modelpath = helpers.get_adult_income_modelpath(backend=backend)
m = dice_ml.Model(model_path= ML_modelpath, backend=backend)

exp = dice_ml.Dice(d, m)

# changing every feature except age
dice_exp = exp.generate_counterfactuals(my_test_instance, total_CFs=4, desired_class="opposite", features_to_vary=['workclass', 'education', 'marital_status', 'occupation', 'race', 'gender', 'hours_per_week'])

# visualize the results
dice_exp.visualize_as_list(show_only_changes=True) # prints the following
# Query instance (original outcome : 0)
# [31.0, 'Private', 'HS-grad', 'Single', 'Blue-Collar', 'White', 'Female', 40.0, 0.019464194774627686]
# Diverse Counterfactual set (new outcome : 1)
# ['-', 'Self-Employed', '-', 'Married', 'White-Collar', '-', '-', 48.0, 0.75]
# ['-', '-', 'Doctorate', 'Married', '-', '-', '-', 26.0, 0.697]
# ['-', '-', 'Masters', 'Married', '-', '-', '-', '-', 0.749]
# ['-', '-', 'Prof-school', 'Married', '-', '-', '-', 58.0, 0.858]

# To check that the predictions are indeed equal
for ix, cf in enumerate(exp.final_cfs):
    model_pred = exp.predict_fn(cf)
    cf_pred = exp.cfs_preds[ix]
    print(model_pred, cf_pred)
# prints the following
# [[0.75035185]] [[0.75035185]]
# [[0.69670826]] [[0.69670826]]
# [[0.73952144]] [[0.73952144]]
# [[0.85771877]] [[0.85771877]]
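
The age filter in the code above depends on the normalized value matching the stored column exactly. As a hedged sketch of that arithmetic, the snippet below assumes a training-set age range of 17 to 90; that range is not stated in this thread, but it is consistent with the quoted value 0.1917808219178082. The `normalize` and `denormalize` helpers are hypothetical names, and `math.isclose` is used as a more tolerant alternative to `==` when comparing floats.

```python
import math

AGE_MIN, AGE_MAX = 17.0, 90.0  # assumed range; not stated in the thread


def normalize(value, lo, hi):
    """Min-max scale a raw value into [0, 1]."""
    return (value - lo) / (hi - lo)


def denormalize(scaled, lo, hi):
    """Invert min-max scaling to recover the raw value."""
    return scaled * (hi - lo) + lo


normalized_age = normalize(31.0, AGE_MIN, AGE_MAX)
recovered_age = denormalize(normalized_age, AGE_MIN, AGE_MAX)

# Select rows by normalized age with a tolerance instead of exact equality.
ages = [25.0, 31.0, 31.0, 47.0]
scaled = [normalize(a, AGE_MIN, AGE_MAX) for a in ages]
matches = [i for i, s in enumerate(scaled) if math.isclose(s, normalized_age)]

print(normalized_age, recovered_age, matches)
```

Exact equality works when both sides are computed with the same formula on the same data, as in the sample code; a tolerance is a safer default when the validation set was normalized by a different code path.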

sina-salek commented 4 years ago

Thanks! It's very kind of you to provide the code. It helped me find my bug.