interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License

How to early stop? The regression task runs without results when the expected output is too high #298

Open xueyagaga opened 2 years ago

xueyagaga commented 2 years ago

The output range of my regression prediction is [1, 30], while most targets are lower than 2. When I generate counterfactuals for a particular instance with an expected output of 10 or more, the generate_counterfactuals function runs for a long time without producing a result.
It normally takes only a few seconds to generate a result when the expected counterfactual output is in [1, 5].
How can I make the generate_counterfactuals function stop automatically when counterfactuals are hard to find (similar to early stopping in neural network training)?
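The early stopping the question asks for usually combines a patience rule (stop after N iterations without meaningful improvement) with a hard cap on iterations and wall-clock time. The sketch below shows that pattern for any iterative search. `search_with_early_stopping` and its `step` callback are hypothetical names, not part of dice_ml, and wiring this into DiCE would require access to its internal search loop.

```python
import time
from typing import Callable, Optional, Tuple


def search_with_early_stopping(
    step: Callable[[int], float],
    max_iters: int = 500,
    patience: int = 5,
    tol: float = 1e-2,
    time_budget_s: Optional[float] = None,
) -> Tuple[float, int, str]:
    """Call `step(i)` until the loss plateaus, `max_iters` is hit, or time runs out.

    Hypothetical helper (not a dice_ml API). `step(i)` runs one search
    iteration and returns the current best loss.
    Returns (best_loss_seen, iterations_run, stop_reason).
    """
    start = time.monotonic()
    best = float("inf")       # lowest loss seen so far
    reference = float("inf")  # last loss that counted as a real improvement
    stale = 0                 # consecutive iterations without real improvement
    for i in range(max_iters):
        loss = step(i)
        best = min(best, loss)
        if reference - loss > tol:
            reference, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            return best, i + 1, "no_improvement"
        if time_budget_s is not None and time.monotonic() - start > time_budget_s:
            return best, i + 1, "time_budget"
    return best, max_iters, "max_iters"


# Usage: a toy loss that stops improving after the third iteration.
print(search_with_early_stopping(lambda i: max(10 - 3.5 * i, 3.0), patience=3))
# → (3.0, 6, 'no_improvement')
```

This only checks the clock between iterations, so it cannot interrupt a single library call that is already stuck; for that, running the call in a separate process and terminating it after a deadline is one option.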

xueyagaga commented 2 years ago

The regression target is severely skewed: most values sit near the lower bound, with a long tail toward high values (n=244217, mean=1.11, std=0.546). The histogram of the distribution was shown here (image not preserved).
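To put the requested ranges in perspective, the reported mean and standard deviation give a rough sense of how far each `desired_range` lies from typical targets. This is a minimal sketch using only the summary statistics quoted in the issue; because the target is heavily skewed, these z-scores are not tail probabilities under a normal curve and only indicate scale.

```python
def z_score(value: float, mean: float, std: float) -> float:
    """How many standard deviations `value` lies above `mean`."""
    return (value - mean) / std


# Summary statistics of the target as reported in this issue.
MEAN, STD = 1.11, 0.546

for lo, hi in [(1.0, 5.0), (10.0, 15.0)]:
    print(f"desired_range [{lo}, {hi}]: z = {z_score(lo, MEAN, STD):.1f} to {z_score(hi, MEAN, STD):.1f}")
# desired_range [1.0, 5.0]: z = -0.2 to 7.1
# desired_range [10.0, 15.0]: z = 16.3 to 25.4
```

The [1, 5] range already extends about 7 standard deviations above the mean and still works, so distance from the mean alone does not decide feasibility; what matters is which outputs the model can actually produce.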

Here is my code. When the desired output is in the range [1, 5], generate_counterfactuals takes only a few seconds per query instance:

CF_genetic = CF_DICE.generate_counterfactuals(query_instances,
                                              total_CFs=15,
                                              desired_range=[1.0, 5.0],
                                              features_to_vary=[IV_vary,
                                                                MV_vary])
-------> 32%|███▏      | 639/2000 [1:20:01<2:29:34,  6.59s/it]

But when desired_range is set higher, such as [10, 15], generate_counterfactuals keeps running for hours without returning any results:

CF_genetic = CF_DICE.generate_counterfactuals(query_instances,
                                              total_CFs=15,
                                              desired_range=[10.0, 15.0],
                                              features_to_vary=[IV_vary,
                                                                MV_vary])
-------> Keeps running, no results
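Before launching a multi-hour run with `desired_range=[10.0, 15.0]`, one cheap check is whether the model ever predicts inside that range when the varied features are perturbed. The sketch below does this by uniform random sampling; `probe_desired_range`, `feature_ranges`, and `toy_predict` are hypothetical names, and zero hits does not prove a counterfactual is impossible, but it strongly suggests the target range is out of reach.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Instance = Dict[str, float]


def probe_desired_range(
    predict: Callable[[Instance], float],
    query: Instance,
    feature_ranges: Dict[str, Tuple[float, float]],
    desired_range: Sequence[float],
    n_samples: int = 1000,
    seed: int = 0,
) -> Tuple[int, float, float]:
    """Randomly perturb the varied features of `query` and count how many
    perturbed points the model maps into `desired_range`.

    Hypothetical helper. Features not in `feature_ranges` keep their values.
    Returns (hits, min_prediction, max_prediction).
    """
    rng = random.Random(seed)
    lo_d, hi_d = desired_range
    hits, y_min, y_max = 0, float("inf"), float("-inf")
    for _ in range(n_samples):
        candidate = dict(query)
        for name, (a, b) in feature_ranges.items():
            candidate[name] = rng.uniform(a, b)
        y = predict(candidate)
        y_min, y_max = min(y_min, y), max(y_max, y)
        if lo_d <= y <= hi_d:
            hits += 1
    return hits, y_min, y_max


# Toy stand-in for a fitted regressor whose predictions stay within [1, 2].
def toy_predict(x: Instance) -> float:
    return 1.0 + 0.1 * x["a"]


query = {"a": 0.5, "b": 3.0}
print(probe_desired_range(toy_predict, query, {"a": (0.0, 10.0)}, [10.0, 15.0]))
```

With a scikit-learn model, `predict` could wrap `model.predict` on a one-row pandas DataFrame whose columns are ordered as in the training data; scoring all samples in one batch would be much faster than the row-by-row loop used here for clarity.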
gaugup commented 2 years ago

@xueyagaga Maybe your model cannot produce predictions in the range [10, 15]. That may be why the DiCE explainer keeps generating many more points trying to arrive at a counterfactual. Nevertheless, the explainer should stop trying to find counterfactuals after a reasonable number of attempts. How did you set up the dice-ml explainer?
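For the RandomForestRegressor the reporter uses, this hypothesis can be checked without any search: the forest predicts the mean of its trees' outputs, and each tree outputs one of its leaf values, so no input can push a prediction above the mean of the per-tree maximum leaf values. The helper name `forest_prediction_bounds` is hypothetical; the bound is a necessary condition only, since the extreme leaves of different trees may not be reachable by the same input.

```python
from typing import Sequence, Tuple


def forest_prediction_bounds(per_tree_leaf_values: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Bounds on any prediction of an averaging tree ensemble (hypothetical helper).

    Every prediction lies between the mean of the per-tree minimum leaf values
    and the mean of the per-tree maximum leaf values.
    """
    n_trees = len(per_tree_leaf_values)
    lower = sum(min(leaves) for leaves in per_tree_leaf_values) / n_trees
    upper = sum(max(leaves) for leaves in per_tree_leaf_values) / n_trees
    return lower, upper


# Two toy trees: no input can push the ensemble above (3.0 + 4.0) / 2 = 3.5.
print(forest_prediction_bounds([[1.0, 2.0, 3.0], [1.5, 4.0]]))  # → (1.25, 3.5)
```

For a fitted scikit-learn single-output forest, the leaf values can be collected with `[est.tree_.value[est.tree_.children_left == -1, 0, 0] for est in model.estimators_]`. If the resulting upper bound is below 10, a `desired_range` of [10, 15] cannot be reached by any input.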

xueyagaga commented 2 years ago

Thank you for the response! DiCE is really great! Here is the code I use to set up the dice-ml explainer:

import dice_ml

# The trained ML prediction model is a RandomForestRegressor
d = dice_ml.Data(dataframe=dataset,
                 continuous_features=continuous_features_housing,
                 outcome_name=outcome)
m = dice_ml.Model(model=model, backend="sklearn", model_type="regressor")
CF_DICE = dice_ml.Dice(d, m, method="genetic")

I'm wondering whether I've missed some parameter setting, which would explain why the explainer never stops.