interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License

Robustness of generated CFs #206

Open Saladino93 opened 2 years ago

Saladino93 commented 2 years ago

I noticed that running the CF generation several times (it is basically an optimization) gives me different counterfactuals on different runs. Should I tune the optimization parameters, or is this expected?

The same also happens if I generate a different number of CFs (assuming this is not just the problem above).

Finally, suppose I generate 20 CFs. How would I use this information in the real world? It is still not clear to me how this could benefit someone taking action, apart from giving the machine learning expert a better understanding of the model.

Thanks!

Saladino93 commented 2 years ago

Ok, it seems to happen only with the random method, and setting random_seed solves it.
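For context, the random method samples feature perturbations, so without a fixed seed each run draws different candidates. A toy sketch of why seeding restores reproducibility (plain NumPy, not DiCE's actual implementation; the function and values here are illustrative):

```python
import numpy as np

def random_candidates(seed, n=5):
    # Stand-in for random CF search: draw candidate perturbations
    # from a generator initialized with a fixed seed.
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=n)

a = random_candidates(seed=42)
b = random_candidates(seed=42)
c = random_candidates(seed=7)

assert np.allclose(a, b)        # same seed -> identical candidates
assert not np.allclose(a, c)    # different seed -> different candidates
```

The same principle applies inside DiCE: fixing the seed of the sampler makes repeated calls return the same counterfactuals.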

Btw, I see that the genetic algorithm sometimes does not produce new CFs in its output.

amit-sharma commented 2 years ago

That makes sense. I didn't understand your comment on the genetic algorithm, though: it fails to provide any output?

On your other question, the CFs are useful for algorithmic recourse in the real world. Consider a model that issued a decision to a user, and the user wants to know how to change their features to obtain the desired outcome. They can then explore these counterfactual examples to see which viable changes would flip their predicted outcome.
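As a concrete (entirely hypothetical) example of recourse: given a set of generated CFs for a loan-approval model, a user would discard those that change features they cannot act on, and keep the rest as candidate actions. The feature names and values below are made up for illustration:

```python
# Original (rejected) applicant and three hypothetical counterfactuals
# that the model would classify as approved.
original = {"income": 40000, "age": 35, "credit_history_years": 5}
counterfactuals = [
    {"income": 55000, "age": 35, "credit_history_years": 5},  # raise income
    {"income": 40000, "age": 28, "credit_history_years": 5},  # be younger (infeasible)
    {"income": 48000, "age": 35, "credit_history_years": 7},  # income + history
]

# Features the user can actually change.
actionable = {"income", "credit_history_years"}

def is_feasible(cf, original, actionable):
    # A CF is actionable recourse only if every changed feature is actionable.
    changed = {k for k in original if cf[k] != original[k]}
    return changed <= actionable

feasible = [cf for cf in counterfactuals if is_feasible(cf, original, actionable)]
print(len(feasible))  # 2: the age-changing CF is filtered out
```

Diversity across the generated CFs matters precisely because some of them will be infeasible for a given user.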

Saladino93 commented 2 years ago

@amit-sharma , basically what happens is that I have something like this

[Screenshot 2021-08-11 at 12 10 17]

so it seems not to give me new counterfactuals in the output, even though it is perturbing the inputs.

Regarding your comment, thanks a lot. Yes, given that the user knows the model is fixed, this is important for their decisions. I was just thinking about all the possible ways one might act in the real world, and got confused between the real cause of the outcome and what is happening here, which is just exploring a fixed model. Thanks!

p.s. Sorry for another off-topic question, but sometimes I get WARNING - MAD for feature X is 0, so replacing it with 1.0 to avoid error. I know this is used to normalize continuous variables, so it should not be a problem, right? I am asking because I then get many zeros in the counterfactuals. (I understand the question is a bit vague, so I can try to give more details...)
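For anyone hitting the same warning: MAD here is the median absolute deviation, which DiCE uses to scale continuous features in its proximity term. If a feature is constant (or constant for more than half the rows), its MAD is 0, and dividing by it would fail, hence the fallback to 1.0. A small sketch with made-up data (the fallback line paraphrases the behavior, not the library's exact code):

```python
import numpy as np

def mad(x):
    # Median absolute deviation: median(|x - median(x)|).
    return float(np.median(np.abs(x - np.median(x))))

varied = np.array([1.0, 2.0, 3.0, 10.0])
constant = np.array([0.0, 0.0, 0.0, 0.0])

print(mad(varied))    # 1.0
print(mad(constant))  # 0.0 -> would trigger the warning

# Fallback: replace a zero MAD with 1.0 to avoid division by zero.
scale = mad(constant) or 1.0
print(scale)          # 1.0
```

So the warning itself is harmless, but it does mean that feature's distances are effectively unscaled, which may interact with the zeros you are seeing.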

Also, do you know how CF generation scales with the number of features and the number of training rows?

Saladino93 commented 2 years ago

p.p.s

Would you be interested in some filtering code, and a simple reason generator? I do not know which checks should be done, but I am still testing these.