linkedin / TE2Rules

Python library to explain Tree Ensemble models (TE) like XGBoost, using a rule list.

Imbalanced dataset #3

Closed · demdecuong closed this issue 5 months ago

demdecuong commented 7 months ago

Hi, I am really interested in this algorithm. I have experimented with this repo on our real-world problem and found some interesting insights.

However, there is a case where the output rules are not optimal, and I hope to get your advice.

rules = model_explainer.explain(
    X=model['preprocessor'].transform(X_train), y=y_train_pred,
    num_stages=10,
    min_precision=0.95
)

print(f"{len(rules)} rules found")
for i, rule in enumerate(rules):
    print(f"Rule {i}: {rule}")

2 rules found
Rule 0: AIDMM2 <= 0.5
Rule 1: AIDMM2 > 0.5

AIDMM2 is a categorical feature and has been transformed into a numerical value (only 0 and 1). Our dataset is extremely imbalanced, so the output rules might look like this =))

groshanlal commented 7 months ago

TE2Rules mines rules for the positive class (label = 1) as learnt by the tree ensemble model. Here are some things to keep in mind for effectively using TE2Rules to explain the tree ensemble model:

  1. In the above case, I'm assuming that the positive class (label = 1) is the minority class. If this is not the case, can you flip the labels so that the minority class becomes the positive class (label = 1)? This makes the mined rules more selective (see the sketch after these points).

  2. It seems the rules show that the tree ensemble model predicts label = 1 for both AIDMM2 <= 0.5 and AIDMM2 > 0.5. Are you sure that your tree ensemble model is able to learn the data? The rules mined by TE2Rules are intended to explain the trained tree ensemble model and are only as good as the underlying model. What AUC do you observe for the underlying tree ensemble model?
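
For concreteness, here is a minimal sketch of both suggestions. It assumes model is the sklearn pipeline from your snippet, y_train holds the binary 0/1 training labels, and the final pipeline step supports predict_proba; any variable names beyond those in your snippet are illustrative.

import numpy as np
from sklearn.metrics import roc_auc_score

# Point 1: retrain with flipped labels so that the minority class becomes
# the positive class (label = 1) that TE2Rules mines rules for.
# Assumes y_train is a binary 0/1 array.
y_train_flipped = 1 - np.asarray(y_train)
model.fit(X_train, y_train_flipped)
y_train_pred = model.predict(X_train)

# Point 2: sanity-check the underlying tree ensemble. The mined rules can
# only be as good as the model, so a low AUC here would explain poor rules.
y_score = model.predict_proba(X_train)[:, 1]
print("Train AUC:", roc_auc_score(y_train_flipped, y_score))

After refitting, rebuild your ModelExplainer on the refit model and call explain again as in your snippet.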

Let us know if these pointers help make the rules look better, or if you suspect something else is the issue.