original outcome is not same as query instance.

interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.

https://interpretml.github.io/DiCE/

MIT License


original outcome is not same as query instance. #56

Open dayongfu opened 3 years ago

dayongfu commented 3 years ago

I have the following query instance, but in the output the "original outcome" row is not the same as the query instance I passed in. I assume they should be the same; otherwise the output is useless.

dice_exp = exp.generate_counterfactuals({'five_star_rate': 3.5, 'nights_booked': 1.0}, total_CFs=4, desired_class="opposite")
dice_exp.visualize_as_dataframe()

Query instance (original outcome : 1)
#	five_star_rate	nights_booked	label
1	0.0	127.7	0.564298

Diverse Counterfactual set (new outcome : 0)
#	nights_booked	label
1	66.2	0.215
2	66.2	0.215
3	77.0	0.280
4	62.5	0.196

gaugup commented 3 years ago

Hi @dayongfu,

You are generating the counterfactuals with desired_class="opposite":

dice_exp = exp.generate_counterfactuals({ 'five_star_rate': 3.5, 'nights_booked': 1.0 }, total_CFs = 4, desired_class="opposite")

Hence the counterfactuals are generated for the scenario that flips the predicted class of the query instance.

Does that answer your question?

Regards, Gaurav

dayongfu commented 3 years ago

Thanks @gaugup!

Yes, I want to generate the opposite scenarios, and I believe these are the ones listed in the table under "Diverse Counterfactual set (new outcome : 0)". My question is about the table under "Query instance (original outcome : 1)". I assume it is the query instance that I passed to the generate_counterfactuals method, but its values look very different from my inputs. Why?

raam93 commented 3 years ago

May I know how you initialized the data object, d? Make sure to pass five_star_rate and nights_booked as continuous features to the data object:

d = dice_ml.Data(dataframe=dataset, continuous_features=['five_star_rate', 'nights_booked'], outcome_name='label')

If the above does not work, could you share how your model was trained? Internally, by default, DiCE min-max normalizes the continuous features before feeding them to the ML model. So if your model expects these features in a different format, there might be issues.
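To illustrate the min-max normalization mentioned above, here is a minimal plain-Python sketch. It is not DiCE's internal code: the helper names and the feature ranges in TRAIN_RANGES and OTHER_RANGES are hypothetical and chosen only for illustration. It shows that a value survives the round trip only when the same ranges are used to scale and to unscale it.

```python
def min_max_normalize(value, lo, hi):
    """Scale a raw value into [0, 1] using the feature's min and max."""
    return (value - lo) / (hi - lo)


def min_max_denormalize(scaled, lo, hi):
    """Map a scaled value back to the original data space."""
    return scaled * (hi - lo) + lo


# Hypothetical feature ranges, for illustration only (not from the thread).
TRAIN_RANGES = {"five_star_rate": (0.0, 5.0), "nights_booked": (1.0, 200.0)}
OTHER_RANGES = {"five_star_rate": (1.0, 5.0), "nights_booked": (0.0, 365.0)}

query = {"five_star_rate": 3.5, "nights_booked": 1.0}

# Normalize with the training ranges, as the model would expect.
scaled = {k: min_max_normalize(v, *TRAIN_RANGES[k]) for k, v in query.items()}

# Round trip with the same ranges recovers the query.
restored = {k: min_max_denormalize(v, *TRAIN_RANGES[k]) for k, v in scaled.items()}

# Round trip with different ranges does not, which is one way a displayed
# query instance can end up looking unlike the values that were passed in.
mismatched = {k: min_max_denormalize(v, *OTHER_RANGES[k]) for k, v in scaled.items()}

print(scaled)
print(restored)
print(mismatched)
```

If the model was trained on raw or differently scaled features, the same kind of mismatch applies on the model side.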

We are working on generalizing this data-transformation function so that users can specify their own methods, and will update the code shortly.

amit-sharma commented 3 years ago

@dayongfu As Ram mentioned, that bug was likely due to an encoding mismatch. We have released a new version (0.5) on PyPI that does not have this encoding issue: generate_counterfactuals can now take input in the original data space, without any encoding. Can you try your example with the new version? Hope that solves the issue.

If not, we would appreciate it if you could provide a small working example that we can debug.

dayongfu commented 3 years ago

Thank you all. I would definitely like to try it out and will keep you posted. I also want to know whether DiCE can be used on an LSTM-based classification model.

amit-sharma commented 3 years ago

Sorry, I missed your message, @dayongfu. DiCE can be used for any PyTorch/TensorFlow model. Although we haven't tested it on an LSTM-based model, it should work as long as the model is differentiable.
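As a rough illustration of why differentiability matters, here is a sketch of a generic gradient-based counterfactual search on a tiny logistic model. It is not DiCE's actual loss or optimizer. The weights, the query point, the proximity weight lam, and the function names are all assumptions made for the example. The search only works because the gradient of the prediction with respect to the input is available.

```python
import math


def predict(x, w, b):
    """Probability of class 1 under a simple logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def find_counterfactual(x, w, b, target, lam=0.05, lr=0.1, steps=1000):
    """Gradient descent on (p - target)^2 + lam * ||cf - x||^2."""
    cf = list(x)
    for _ in range(steps):
        p = predict(cf, w, b)
        # Chain rule: the sigmoid derivative is p * (1 - p).
        scale = 2.0 * (p - target) * p * (1.0 - p)
        grad = [
            scale * wi + 2.0 * lam * (ci - xi)
            for wi, ci, xi in zip(w, cf, x)
        ]
        cf = [ci - lr * gi for ci, gi in zip(cf, grad)]
    return cf


# Hypothetical model parameters and query, for illustration only.
W, B = [1.5, -2.0], 0.2
query = [1.0, 0.2]

counterfactual = find_counterfactual(query, W, B, target=0.0)
print("original prediction:", predict(query, W, B))
print("counterfactual prediction:", predict(counterfactual, W, B))
```

The proximity term keeps the counterfactual close to the query, while the prediction term pushes it across the decision boundary. For a model without usable gradients, a search like this would not apply as written.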