interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License

Why use the MAD in normalized features? #54

Closed wangyongjie-ntu closed 3 years ago

wangyongjie-ntu commented 3 years ago

The MAD is mainly for heterogeneous features (different features have different scales and ranges). If you normalize the features with a min-max scaler, all features are mapped into [0, 1].

In the adult example, the data interface normalizes the features, so why is the default setting "inverse_mad"? From my understanding, the l2 distance should be good. In the paper "Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR" (p. 18, Equation 5), the authors also suggest the l2 distance.

Have you found any differences between these two kinds of distance?
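
For concreteness, here is a minimal sketch of the two proximity terms being compared, in plain numpy (not DiCE's actual implementation; `mad` is assumed to be a vector of per-feature MADs computed on the training data):

```python
import numpy as np

def l2_distance(x, cf):
    # Plain Euclidean distance between a query point and a counterfactual,
    # assuming all features are already min-max scaled to [0, 1]
    # (the variant suggested by Wachter et al., Eq. 5).
    return np.linalg.norm(x - cf)

def inverse_mad_l1_distance(x, cf, mad):
    # L1 distance with each feature j weighted by 1 / MAD_j
    # (the idea behind the "inverse_mad" default).
    return np.sum(np.abs(x - cf) / mad)
```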

raam93 commented 3 years ago

You are correct in the interpretation of MAD; however, you missed the fact that the distribution of each feature after min-max scaling stays the same. Any scaling method does not change the shape of the data distribution, only its range. So dividing the distance by the MAD (the "inverse_mad" option) captures the relative prevalence of observing the feature at a particular value. Please refer to Sec. 3.3, "Choice of distance function", in our paper, or to the paragraph below Equation 4 in the Wachter et al. paper. That said, I agree that you could use l2 distance scaled by standard deviation or any other variant, but we found the l1-MAD option to work well in most scenarios in our experiments. In any case, we will include more options for the distance loss soon.
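
A quick numerical check of this point, using synthetic features (the names and distributions here are illustrative, not taken from the adult dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
features = {
    "symmetric": rng.normal(40, 12, size=10_000),  # roughly Gaussian
    "skewed": rng.exponential(10, size=10_000),    # long right tail
}

def min_max(x):
    return (x - x.min()) / (x.max() - x.min())

def mad(x):
    return np.median(np.abs(x - np.median(x)))

for name, x in features.items():
    print(f"{name}: MAD after min-max scaling = {mad(min_max(x)):.3f}")

# Both features now live in [0, 1], but their MADs still differ:
# min-max scaling changes the range, not the shape of the distribution,
# so the 1/MAD weights remain informative after normalization.
```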

Meanwhile, in the Wachter et al. paper the authors do not suggest using the l2 distance in all cases; rather, they experiment with different variants of the distance function and even show that the l1-MAD option generates sparser results (last paragraph of the LSAT data section, p. 20).