0ptimista closed this issue 1 year ago
Can you send data + small program? I can debug.
Is it OK if I send those to your email address on GitHub? Or is there a better way?
You can probably attach them here if they're not too big, but my email is OK as well.
I tried this in a Jupyter Notebook.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from dtreeviz import decision_boundaries
import dtreeviz
data = pd.read_csv('sample.csv')
X=data.drop('stat',axis=1)
y=data['stat']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
dt_wx = DecisionTreeClassifier(max_depth=6)
dt_wx.fit(X.values, y.values)
viz = dtreeviz.model(
    dt_wx,
    X_train,
    y_train,
    feature_names=list(X_train.columns),
    target_name='stat',
    class_names=["OK", "Problem"],
)
viz.view(scale=1)
decision_boundaries(
    dt_wx, X_train, y_train,
    ntiles=40,
    tile_fraction=1,
    feature_names=list(X_train.columns),
    target_name='stat',
    class_names=["OK", "Problem"],
)
Thanks for helping, Professor!
@parrt just a hint: I did a little debugging of the code, and the error is generated because:
I think we make an assumption that class values all start from zero, right?
@parrt I guess yes, I am not very familiar with that part of the implementation.
OK @0ptimista, the issue is that class labels have to start from zero, but the labels in this case are [1,2]. Keeping everything indexed from zero must be very common, so for now I'm simply going to add code to indicate this is an error.
You can probably do something like y=data['stat']-1
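Subtracting 1 works here because the labels happen to be [1,2]. For labels that are not consecutive, a more general remapping might look like the sketch below. The helper name remap_labels and the sample stat values are made up for illustration; they are not part of dtreeviz or of the attached data.

```python
def remap_labels(labels):
    """Map arbitrary class labels onto 0..n-1, keeping their sorted order."""
    classes = sorted(set(labels))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


# Hypothetical label column resembling the issue's data: classes are 1 and 2.
stat = [1, 2, 2, 1, 2]
y_zero_based, mapping = remap_labels(stat)
print(y_zero_based)  # [0, 1, 1, 0, 1]
print(mapping)       # {1: 0, 2: 1}
```

With a pandas Series, the same mapping can be applied as data['stat'].map(mapping). The entries of class_names then line up with the original labels in sorted order.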
I am adding functionality to emit an error:
Traceback (most recent call last):
  File "/Users/parrt/github/dtreeviz/t2.py", line 24, in <module>
    viz.view(scale=1)
  File "/Users/parrt/github/dtreeviz/dtreeviz/trees.py", line 478, in view
    raise ValueError("Target label values (for now) must be 0..n-1 for n labels")
ValueError: Target label values (for now) must be 0..n-1 for n labels
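To catch this before calling the library, a check along the following lines could be run on the target column. This is only a sketch of the idea, not the code in trees.py. The function name check_zero_based_labels is an assumption, and only the error message is taken from the traceback above.

```python
def check_zero_based_labels(labels):
    """Raise ValueError unless the distinct labels are exactly 0..n-1."""
    unique = sorted(set(labels))
    expected = list(range(len(unique)))
    if unique != expected:
        # Message copied from the traceback above; the check itself is illustrative.
        raise ValueError("Target label values (for now) must be 0..n-1 for n labels")


check_zero_based_labels([0, 1, 1, 0])  # passes silently

try:
    check_zero_based_labels([1, 2, 2, 1])  # labels start from 1, as in this issue
except ValueError as err:
    print(err)
```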
@parrt @tlapusan I set my class labels to start from 0 as suggested, and now I can see those points.
The new ValueError above is a really good hint, and again, thanks for helping!
I have a trained DecisionTreeClassifier model with 2 features, and dtreeviz.model() works well for inspecting it.
But when I try decision_boundaries(), it throws a KeyError and draws only the decision boundaries, without the data points. I want those points: