interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License

Unable to generate counterfactuals for certain instances #226

Open grtwrrn opened 3 years ago

grtwrrn commented 3 years ago

Hi,

I've trained DICE with a k-nn classifier, and want to generate counterfactuals for a test set from the same dataset. It works fine for most instances in the test set, but for others, it remains stuck indefinitely (see screenshot below). I'm wondering why this is, and how it can be fixed. Thanks!

[screenshot: counterfactual generation hanging indefinitely]

gaugup commented 3 years ago

@grtwrrn, sorry you ran into this issue. Could you provide the sample code and the dataset so that I can reproduce the issue at my end? Which version of dice-ml are you running? The latest is 0.7.1; could you try upgrading to 0.7.1 and see whether the issue persists?

Regards,

grtwrrn commented 3 years ago

Hi @gaugup, thanks for your response! I've updated to 0.7.1 but I'm still running into the same issue. I'm not sure I have permission to share the dataset, so I won't post it here, but this is my code (k_value = 1): [screenshots of the code]

As you can see, DiCE works fine for the first two cases but loops infinitely on the third. This happens throughout the test set: some instances work, but the majority of the ones I've tried seem to loop. I'm wondering if the issue is related to the one in #46? The continuous features in my dataset are already z-scored, so the required changes may be quite small. I'm using the model-agnostic method rather than the gradient-based one, however.

EDIT: I've just run the same code, this time with the raw data instead of the scaled data, and it seems to work fine, so I'm guessing that was the cause. However, for the sake of standardising across different methods, I want to use my preprocessed, z-scored data. Is there a way of doing this? Thanks again!
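One pattern that may sidestep this (a sketch, not something I've verified against DiCE internals): keep the raw, unscaled features as DiCE's view of the data, and push the z-scoring inside an sklearn Pipeline so the classifier still sees standardised inputs while DiCE searches in raw feature space. The toy data below stands in for the real dataset.

```python
# Sketch: wrap the scaler and classifier in one sklearn Pipeline so the
# model consumes z-scored features internally while DiCE (or any caller)
# is handed raw, unscaled data. Toy data stands in for the real dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(200, 3))  # raw units
y = (X_raw[:, 0] > 50.0).astype(int)

clf = Pipeline([
    ("scaler", StandardScaler()),           # z-scoring happens inside
    ("knn", KNeighborsClassifier(n_neighbors=1)),
])
clf.fit(X_raw, y)  # fit on raw data; the scaler learns mean/std itself

# clf could then be passed as the `model` argument to dice_ml.Model with
# backend="sklearn", and dice_ml.Data built from the raw dataframe.
preds = clf.predict(X_raw)
```

Since DiCE's sklearn backend accepts any fitted estimator with a `predict` method, a Pipeline should slot in where a bare classifier did, keeping the "standardise across methods" goal intact.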

amit-sharma commented 3 years ago

It's unclear why CF generation succeeds for the raw data but not the z-scored data. Is it possible to share a minimal working example so we can debug? Perhaps simulated data close to yours?

cwayad commented 1 year ago

Hi, I get the same problem too. Here is the synthetic dataset I am generating:

Synthetic data

  import numpy as np
  import pandas as pd

  SIZE = 100
  loc = 0.0
  scale = 0.5
  n, p = 10, .5
  a = [0, 1, 2]
  np.random.seed(seed=0)

  x1 = np.random.randint(2, size=SIZE)
  x2 = np.random.randint(2, size=SIZE)
  x3 = np.random.normal(loc=loc, scale=scale, size=SIZE)
  x4 = np.random.choice(a=a, size=SIZE)
  x5 = np.logical_xor(x1, x2).astype(int)
  x6 = np.logical_not(x2).astype(int)
  x7 = np.random.binomial(n, p, size=SIZE)
  x8 = np.sin(x7 / 2.)

  y = x1 - x2 + x3 - x4 + x5 - x6 + x7 - x8

  df = pd.DataFrame()
  df['x1'] = x1
  df['x2'] = x2
  df['x3'] = x3
  df['x4'] = x4
  df['x5'] = x5
  df['x6'] = x6
  df['x7'] = x7
  df['x8'] = x8
  cut = np.mean(y)
  df['y'] = np.where(y > cut, 1, 0)

DiCE

  import dice_ml
  from dice_ml import Dice

  # `model` is the trained classifier, `output` the outcome column name,
  # and x_test the held-out feature frame
  d = dice_ml.Data(dataframe=df, continuous_features=list(x_test.columns), outcome_name=output)
  m = dice_ml.Model(model=model, backend="sklearn")

  exp = Dice(d, m, method="random")
  query_instance = x_test
  e1 = exp.generate_counterfactuals(query_instance, total_CFs=10, desired_range=None,
                                    desired_class="opposite",
                                    permitted_range=None, features_to_vary="all")

  imp = exp.local_feature_importance(query_instance, posthoc_sparsity_param=None)
  dicecontrib = pd.DataFrame.from_dict(imp.local_importance)

I am running this code for different values of loc, scale and p.

  loc = np.arange(0.0, 5.0, 0.1)
  scale = np.arange(0.0, 1.0, 0.1)
  p = np.arange(0.0, 1.0, 0.1)
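The sweep over loc, scale and p can be sketched as follows; `run_experiment` is a hypothetical stand-in for regenerating the synthetic dataframe and calling `generate_counterfactuals` for each setting.

```python
# Sketch of the parameter sweep: iterate over every (loc, scale, p)
# combination. run_experiment is a placeholder for rebuilding the data
# and rerunning DiCE with those parameters.
import itertools
import numpy as np

loc_values = np.arange(0.0, 5.0, 0.1)
scale_values = np.arange(0.0, 1.0, 0.1)
p_values = np.arange(0.0, 1.0, 0.1)

def run_experiment(loc, scale, p):
    # placeholder: real code would regenerate df and call DiCE here
    return (loc, scale, p)

results = [run_experiment(*combo)
           for combo in itertools.product(loc_values, scale_values, p_values)]
```

Recording which combinations hang versus succeed would narrow down whether the failures correlate with particular loc/scale/p regions.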

amit-sharma commented 1 year ago

@cwayad are you using the latest version of dice (v0.9)?

also, to reproduce your example, I need the model training code. can you provide that so that this can be debugged?

cwayad commented 1 year ago

@amit-sharma Yes, I am using the latest version (v0.9). I use a random forest model:

  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import train_test_split

  x_train, x_test, y_train, y_test = train_test_split(
      df, feature_to_predict, train_size=0.8, random_state=25)
  model = RandomForestClassifier(n_estimators=10)
  model.fit(x_train, y_train)

Please note that I run this code in three for loops over loc, scale and p, so it may work for some combinations of loc, scale and p and not for others.

I have already tested it with some datasets like "cervical cancer" and it worked, but it doesn't work for this specific synthetic dataset.

PMK1991 commented 1 year ago

Hello @amit-sharma. This issue still persists. DiCE gets stuck indefinitely for some instances with the genetic method, and no counterfactuals are found with other methods like random or KD-tree. Is there some way to determine in advance which instances will get stuck? Thank you.
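One pragmatic workaround (a sketch, not a DiCE feature): run each `generate_counterfactuals` call under a timeout so stuck instances are skipped and logged rather than hanging the whole loop. Note that a Python thread cannot be forcibly killed, so a genuinely stuck call keeps running in the background; the caller simply moves on.

```python
# Sketch: run a potentially-hanging call in a worker thread and give up
# after a deadline. The worker cannot be forcibly killed, so a truly
# stuck call keeps running in the background, but the caller moves on.
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def run_with_timeout(fn, args=(), timeout_s=5.0):
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn, *args)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            return None  # caller treats None as "this instance got stuck"
    finally:
        pool.shutdown(wait=False)  # do not block on a hung worker
```

Usage would look like `cf = run_with_timeout(exp.generate_counterfactuals, args=(query,), timeout_s=30)` (keyword arguments such as `total_CFs` would need a `functools.partial` wrapper); instances that come back as `None` can then be collected and inspected afterwards.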

baji-loreal commented 1 year ago

Hello Amit,

Even I am facing a similar issue in the latest version, 0.10. Is there any solution for this?