I am using DiCE on my YouTube data. The outcome of interest is view count, which is continuous, so I used RandomForestRegressor() rather than RandomForestClassifier(), which would be the choice for a binary or multiclass classification problem.
I share some key lines of code here.
d = dice_ml.Data(dataframe=df,
                 continuous_features=['video_duration', 'word_count', 'word_unique'],
                 outcome_name='viewCount')
m = dice_ml.Model(model=model, backend='sklearn', model_type='regressor')
exp = dice_ml.Dice(d, m, method='random')
e1 = exp.generate_counterfactuals(X_test[0:1],
                                  total_CFs=5,
                                  desired_range=[100000, 200000],
                                  permitted_range={'word_unique': [30, 40],
                                                   'video_duration': [200, 300]})
e1.visualize_as_dataframe(show_only_changes=False)
The issue is that most of the generated counterfactuals fall outside the permitted_range. Most (sometimes all) values of 'word_unique' and 'video_duration' are not within [30,40] and [200,300], respectively. I tried many times with many other ranges and got the same result.
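To make the problem concrete, the violation can be checked programmatically. The following is only a minimal sketch: the helper name `out_of_range_mask` and the sample values are hypothetical, and it assumes the counterfactuals are available as a pandas DataFrame (in DiCE I believe this is `e1.cf_examples_list[0].final_cfs_df`, but treat that attribute as an assumption).

```python
import pandas as pd

def out_of_range_mask(cfs: pd.DataFrame, permitted_range: dict) -> pd.DataFrame:
    """Return a boolean frame that is True where a value lies outside [low, high]."""
    # Series.between is inclusive on both ends by default.
    return pd.DataFrame({
        feature: ~cfs[feature].between(low, high)
        for feature, (low, high) in permitted_range.items()
    })

# Hypothetical counterfactual rows standing in for the real DiCE output.
# Assumption: the real frame could come from e1.cf_examples_list[0].final_cfs_df.
cfs = pd.DataFrame({
    'video_duration': [250, 812, 199],
    'word_count': [120, 95, 60],
    'word_unique': [35, 70, 40],
})
permitted_range = {'word_unique': [30, 40], 'video_duration': [200, 300]}

violations = out_of_range_mask(cfs, permitted_range)
n_bad_rows = int(violations.any(axis=1).sum())
print(violations)
print(f"{n_bad_rows} of {len(cfs)} rows violate permitted_range")
```

A check like this reports which feature is out of range in which row, which is easier to share than a screenshot of the output table.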
I would also like to add 'word_count' to permitted_range together with 'word_unique', so that 'word_unique' is always less than or equal to 'word_count'. Can I set a condition like 'word_unique' <= 'word_count' within permitted_range? (An alternative, if permitted_range worked correctly, would be to choose the two ranges so that every permitted value of 'word_unique' is <= every permitted value of 'word_count'.)
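Since permitted_range, as used above, takes one [low, high] pair per feature, one possible workaround for the cross-feature condition is to request more counterfactuals than needed and filter them afterwards. The sketch below is only an illustration under that assumption: the helper `keep_consistent` and the sample rows are hypothetical, and it operates on a plain pandas DataFrame rather than on any DiCE object.

```python
import pandas as pd

def keep_consistent(cfs: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows where word_unique <= word_count."""
    consistent = cfs['word_unique'] <= cfs['word_count']
    return cfs[consistent].reset_index(drop=True)

# Hypothetical counterfactual rows; the second one is logically impossible
# because it has more unique words than words.
cfs = pd.DataFrame({
    'video_duration': [250, 260, 280],
    'word_count': [120, 30, 40],
    'word_unique': [35, 38, 40],
})

valid_cfs = keep_consistent(cfs)
print(valid_cfs)
```

Filtering after generation does not guarantee that enough valid counterfactuals remain, so total_CFs may need to be set higher than the number actually wanted.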
I see that a similar issue was raised before, https://github.com/interpretml/DiCE/issues/284, but it seems it has not been resolved yet.
Is this a bug? Can someone help to check this issue? Thanks!