interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License
1.3k stars, 183 forks

Permitted range #426

Open GoMinh opened 5 months ago

GoMinh commented 5 months ago

I am using DiCE on my YouTube data. The outcome of interest is view count, which is continuous, so I used RandomForestRegressor() rather than RandomForestClassifier(), which would be the choice for a binary or multiclass classification problem.

I share some key lines of code here.

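import dice_ml

# Describe the dataset: three continuous features and the continuous outcome
d = dice_ml.Data(dataframe=df,
                 continuous_features=['video_duration','word_count','word_unique'],
                 outcome_name='viewCount')

# Wrap the trained regressor so DiCE treats this as a regression problem
m = dice_ml.Model(model=model, backend='sklearn', model_type='regressor')

exp = dice_ml.Dice(d, m, method='random')

# Request 5 counterfactuals with predicted viewCount in [100000, 200000],
# restricting two features to the given ranges
e1 = exp.generate_counterfactuals(X_test[0:1],
                                  total_CFs=5,
                                  desired_range=[100000,200000],
                                  permitted_range={'word_unique': [30,40], 'video_duration': [200,300]})
e1.visualize_as_dataframe(show_only_changes=False)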

The issue is that most of the generated counterfactuals fall outside permitted_range: most (sometimes all) values of 'word_unique' and 'video_duration' lie outside [30,40] and [200,300], respectively. I tried many times with many other ranges and ran into the same issue.
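To make the report easier to verify, here is a minimal sketch of a check that lists which generated values fall outside their permitted ranges. It does not use DiCE itself: out_of_range_rows is a hypothetical helper, and the rows are made-up values standing in for counterfactuals, assumed to be available as a list of dicts (for example via to_dict('records') on the counterfactual DataFrame).

```python
def out_of_range_rows(rows, permitted_range):
    """Return (row_index, feature, value) for every value outside its permitted range.

    `rows` is a list of dicts, one per counterfactual (hypothetical input format).
    `permitted_range` maps a feature name to an inclusive [low, high] pair.
    """
    violations = []
    for i, row in enumerate(rows):
        for feature, (low, high) in permitted_range.items():
            value = row[feature]
            if not (low <= value <= high):
                violations.append((i, feature, value))
    return violations


# Made-up rows standing in for generated counterfactuals (not real DiCE output)
example_rows = [
    {"video_duration": 250, "word_count": 120, "word_unique": 35},
    {"video_duration": 512, "word_count": 90, "word_unique": 35},
    {"video_duration": 210, "word_count": 80, "word_unique": 61},
]
permitted = {"word_unique": [30, 40], "video_duration": [200, 300]}

violations = out_of_range_rows(example_rows, permitted)
for row_index, feature, value in violations:
    print(f"row {row_index}: {feature}={value} is outside {permitted[feature]}")
```

An empty result would mean every counterfactual respects permitted_range; any entries pinpoint the rows and features that violate it.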

I would also like to add 'word_count' to permitted_range alongside 'word_unique' so that 'word_unique' is always less than or equal to 'word_count'. Can I specify a condition like 'word_unique' <= 'word_count' within permitted_range? (Alternatively, if permitted_range worked correctly, I could choose the two ranges so that every permitted value of 'word_unique' is <= every permitted value of 'word_count'.)
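As a possible workaround while the question is open, the cross-feature condition could be enforced by filtering the counterfactuals after generation. The sketch below assumes permitted_range only accepts per-feature ranges (an assumption, not confirmed here); keep_consistent is a hypothetical helper, and the rows are made-up values rather than real DiCE output.

```python
def keep_consistent(rows, smaller="word_unique", larger="word_count"):
    """Keep only rows where row[smaller] <= row[larger].

    `rows` is a list of dicts, one per counterfactual (hypothetical input format).
    """
    return [row for row in rows if row[smaller] <= row[larger]]


# Made-up rows standing in for generated counterfactuals (not real DiCE output)
candidate_rows = [
    {"word_count": 120, "word_unique": 35},  # consistent
    {"word_count": 30, "word_unique": 38},   # more unique words than words: dropped
    {"word_count": 40, "word_unique": 40},   # equal counts are allowed
]

consistent_rows = keep_consistent(candidate_rows)
print(consistent_rows)
```

Because filtering can discard rows, requesting more counterfactuals than needed (a larger total_CFs) would leave room for the ones that get dropped.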

I see that a similar issue was raised before (https://github.com/interpretml/DiCE/issues/284), but it does not seem to have been resolved yet.

Is this a bug? Could someone help look into it? Thanks!