parrt / dtreeviz

A python library for decision tree visualization and model interpretation.
MIT License

Decision Tree visualize wrong path #294

Open wim50594 opened 1 year ago

wim50594 commented 1 year ago

Description

When the prediction path of a DecisionTree is visualized for a single instance and the feature value is exactly equal to the split value, dtreeviz highlights the right branch. Sklearn, however, sends a sample to the left child when its feature value is less than or equal to the split value. Because the wrong path is followed, the visualization ends in the wrong leaf and displays an incorrect classification.

To Reproduce

import dtreeviz
import numpy as np
import sklearn.tree
from sklearn.tree import DecisionTreeClassifier

X = np.expand_dims(np.arange(10), axis=1)
y = np.asarray(5*[False] + 5*[True])

clf = DecisionTreeClassifier().fit(X, y)
sklearn.tree.plot_tree(clf, class_names=["False", "True"])

[Image: sktree]

viz_model = dtreeviz.model(clf, X_train=X, y_train=y, class_names=["False", "True"], feature_names=["x1"])
viz_model.view()

[Image: dtree]

x = np.asarray([4.5])
clf.predict([x])
# outputs array([False])
x = np.asarray([4.5])
viz_model.view(x=x)

[Image: path]

Expected behavior

The left path should be chosen, and the classification result "False" should therefore be displayed. In other words: if the feature value is less than or equal to the split value, the left path should be selected.
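The expected behavior can be checked against scikit-learn directly, independent of dtreeviz. The following is a minimal sketch, assuming the same toy data as in the reproduction steps; the variable names (`root`, `went_left`, `path_nodes`) are illustrative only. It reads the root threshold from `clf.tree_` and asks scikit-learn which nodes a sample sitting exactly on that threshold passes through.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same toy data as in the reproduction steps.
X = np.expand_dims(np.arange(10), axis=1)
y = np.asarray(5 * [False] + 5 * [True])
clf = DecisionTreeClassifier().fit(X, y)

tree = clf.tree_
root = 0  # scikit-learn always stores the root at node index 0
threshold = tree.threshold[root]

# A single sample whose feature value is exactly the split value.
x = np.asarray([[threshold]])

# apply() returns the id of the leaf the sample ends up in.
leaf_id = clf.apply(x)[0]
went_left = leaf_id == tree.children_left[root]

# decision_path() returns a sparse indicator matrix of visited nodes.
path_nodes = clf.decision_path(x).indices

print("threshold:", threshold)
print("went left:", went_left)
print("visited nodes:", list(path_nodes))
print("prediction:", clf.predict(x)[0])
```

If the highlighted path in `viz_model.view(x=...)` does not pass through the nodes reported by `decision_path`, the visualization and the model disagree for that sample.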

Environment

Used scikit-learn version 1.2.2 (latest)

tlapusan commented 1 year ago

Thanks @wim50594, it helps that you provided the steps to reproduce the issue. I will take a look soon; right now I'm on vacation.