interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License

('Feature', ... , 'has a value outside the dataset.') caused by type mismatch #390

Open dylan-kelahealth opened 1 year ago

dylan-kelahealth commented 1 year ago

[problem]

A ValueError for out-of-range values is raised when the feature ranges and the query values have different types.

[illustrative example]

...
        # Note: this first assignment is immediately overwritten by the dict below.
        data_ranges = df.describe().loc[["min", "max"]].to_dict()
        data_ranges = {
            x: [y, z]  # integer values
            for x, y, z in zip(
                df.columns, df.min(), df.max()
            )
        }
...
        d = dice_ml.Data(
            features=data_ranges,
            continuous_features=continuous,
            outcome_name=outcome,
        )
        exp = dice_ml.Dice(d, model, method="random")
        counterfactuals = exp.generate_counterfactuals(query_df, total_CFs=4, desired_class="opposite")

The following error is raised when the interval bounds are int and the query values are floats of any precision:

('Feature', ... , 'has a value outside the dataset.')

However, the error is raised even when the feature value lies within the interval.

After casting the feature ranges to the same type as their query values, this error goes away.
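The workaround described above can be sketched roughly as follows. This is a minimal illustration, not part of dice_ml: `align_ranges_to_query` is a hypothetical helper, and it assumes the ranges are a plain dict of `[min, max]` pairs and the query is a plain dict of scalar values.

```python
def align_ranges_to_query(ranges, query):
    """Cast each [min, max] pair to the type of the matching query value.

    Hypothetical helper for illustration only; not a dice_ml function.
    """
    aligned = {}
    for name, (low, high) in ranges.items():
        target = type(query[name])  # e.g. float, or a NumPy scalar type
        aligned[name] = [target(low), target(high)]
    return aligned


# Integer bounds, float query value (the situation reported in this issue).
example_ranges = {"age": [1, 10]}
example_query = {"age": 3.5}
aligned_ranges = align_ranges_to_query(example_ranges, example_query)
```

The aligned dict could then be passed as `features=` in place of the integer ranges. Casting in the other direction (float bounds to an integer query type) would truncate the bounds, so this sketch only suits the int-range, float-query case.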

[proposed fix] Intervals should not be required to have the same type as the query values.

An interval such as

interval: list[int] = [1, 10]

should accept the value

value: np.float16 = 3.5

as within its range.
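A minimal sketch of the behaviour being requested, assuming a hypothetical helper `in_interval` (this is not how dice_ml performs its check, which is not shown in this issue). Both bounds and the value are converted to Python floats before comparing, so the numeric types do not have to match; `float()` also accepts NumPy scalars such as `np.float16`.

```python
def in_interval(value, interval):
    """Return True if value lies in the closed interval, ignoring numeric type.

    Hypothetical helper for illustration only.
    """
    low, high = interval
    return float(low) <= float(value) <= float(high)


interval = [1, 10]                         # integer bounds
inside = in_interval(3.5, interval)        # float value inside the range
outside = in_interval(10.5, interval)      # float value above the range
```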

gaugup commented 1 year ago

@dylan-kelahealth, the error looks correctly raised to me. Your model would have been trained on integer features, so I am not sure how it would interpret non-integer values if dice-ml decided to generate floating-point values. Correct me if you think that is inaccurate.

dylan-kelahealth commented 1 year ago

@gaugup If this is a correctly raised error, then the error message might benefit from revision (e.g. type checking). Several other issues report the same error message. This is also not clarified in the documentation.

This was confusing because "value outside the dataset" implies outside the defined range. The range contains the query value unless the range is specified as integers only, which it is not.

gaugup commented 1 year ago

Thanks for clarifying. I think we should raise a better error message in this case that asks the user to align types.

Could you provide the exact notebook code that reproduces this error?

Thanks!
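For reference, the clearer error message discussed in this thread could look roughly like the sketch below. `describe_range_problem` and its message wording are hypothetical and not taken from dice_ml; the point is only to separate "value really is out of range" from "value is in range but the types differ".

```python
def describe_range_problem(name, value, interval):
    """Return a message describing the problem, or None if there is none.

    Hypothetical helper for illustration only; not a dice_ml function.
    """
    low, high = interval
    # Case 1: the value is genuinely outside the declared range.
    if not (low <= value <= high):
        return (
            f"Feature {name!r} has value {value!r} outside the range "
            f"[{low!r}, {high!r}]."
        )
    # Case 2: the value is in range, but its type differs from the bounds.
    bound_types = {type(low), type(high)}
    if bound_types != {type(value)}:
        names = ", ".join(sorted(t.__name__ for t in bound_types))
        return (
            f"Feature {name!r} has value {value!r} of type "
            f"{type(value).__name__}, but its range is given as {names}; "
            "cast the range and the query to the same type."
        )
    return None


type_message = describe_range_problem("age", 3.5, [1, 10])
range_message = describe_range_problem("age", 11, [1, 10])
no_message = describe_range_problem("age", 3, [1, 10])
```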