parrt / dtreeviz

A python library for decision tree visualization and model interpretation.
MIT License

TypeError when trying to visualize a tree #271

Closed: mshqn closed this issue 1 year ago

mshqn commented 1 year ago

Hi everyone! dtreeviz works perfectly well on my computer, for example on the iris dataset, but with my own dataset I get the error: TypeError: '<=' not supported between instances of 'numpy.ndarray' and 'str'

from sklearn import tree
import dtreeviz

clf = tree.DecisionTreeClassifier(max_depth=2)
clf.fit(encoded_feature_set_1, target_1)
# pass the fitted classifier (the snippet fits clf but originally passed clf_tree)
viz_model = dtreeviz.model(clf,
   X_train=encoded_feature_set_1, y_train=target_1,
   feature_names=feature_namess,
   target_name='is progressor',
   class_names=['no', 'yes'])
v = viz_model.view(fancy=False)

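[image: https://user-images.githubusercontent.com/58361053/221239466-43a30435-63f2-49a5-9102-cb49ab561f4b.png]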

I also created a shadow tree separately:

import dtreeviz

# same fitted classifier as above
tree_obj = dtreeviz.models.shadow_decision_tree.ShadowDecTree.get_shadow_tree(clf,
   X_train=encoded_feature_set_1, y_train=target_1,
   feature_names=feature_namess,
   target_name='is progressor',
   class_names=[0, 1])

Then I checked the nclasses() and classes() methods:

[image: https://user-images.githubusercontent.com/58361053/221241098-fd1a9d42-97e3-4bd8-bab4-ec3f1ed3fdd1.png]

So at worst the comparison should be between str and int, and I can't understand why the error says I have 'numpy.ndarray' and 'str'. I tried dtreeviz a day earlier and got the same error, but passing fancy=False fixed it then. Now it doesn't help. I would be grateful for any ideas on what is wrong with my data that keeps it from working with dtreeviz. I don't have any problems with it during training.

parrt commented 1 year ago

Hi. Which RF library are you using?

parrt commented 1 year ago

Can you verify that every cell in your DataFrame holds a single value and not multiple values (such as a list or array)? Thanks, Ter
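One way to act on this suggestion is to scan the feature DataFrame for cells that hold a list, tuple, or NumPy array instead of a scalar. This is a minimal sketch assuming pandas input; `find_multivalued_cells` is a hypothetical helper (not part of dtreeviz), and the example table is made up.

```python
import numpy as np
import pandas as pd


def find_multivalued_cells(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mark cells holding a list, tuple, or ndarray instead of a scalar."""
    def is_multi(value):
        return isinstance(value, (list, tuple, np.ndarray))

    # Apply the test to every cell, column by column; the result is a same-shaped DataFrame of bools
    return df.apply(lambda col: col.map(is_multi))


# Made-up feature table in which one cell accidentally holds a list
df = pd.DataFrame({
    "age": [63, 71, 58],
    "score": pd.Series([0.4, [0.1, 0.2], 0.9], dtype=object),
})

mask = find_multivalued_cells(df)
# Columns that contain at least one multi-valued cell
bad_columns = [c for c in mask.columns if mask[c].any()]
```

On real data, a non-empty `bad_columns` would point at the features to clean up before fitting or visualizing.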

(Replying by email to mshqn's original post of Fri, Feb 24, 2023, 9:10 AM.)

tlapusan commented 1 year ago

@mshqn please check that your target column is numeric (int) :).

dtreeviz assumes that the target variable is already encoded as integers.

Hope this solves the issue :)
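A minimal sketch of encoding a string target before handing it to dtreeviz, assuming scikit-learn's `LabelEncoder` is used for the encoding; `target_raw` is a made-up stand-in for the issue's `target_1`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up string target standing in for target_1 from the issue
target_raw = np.array(["no", "yes", "no", "no", "yes"])

encoder = LabelEncoder()
# fit_transform maps each label to an int; classes are sorted, so "no" -> 0 and "yes" -> 1
target_encoded = encoder.fit_transform(target_raw)

# Human-readable names in integer-code order, suitable for class_names
class_names = [str(c) for c in encoder.classes_]
```

The encoded array would then be passed as `y_train=target_encoded` (and used to fit the classifier), with `class_names=class_names` so the names line up with the integer codes.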

mshqn commented 1 year ago

@tlapusan Thanks a lot, this helped.

tlapusan commented 1 year ago

@parrt I will create a new issue for this. dtreeviz should raise an exception or show a message to the user when the target variable is a string.
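The kind of check proposed here could be sketched as follows. This is a hypothetical helper, not dtreeviz's actual code or API; it assumes a classification target and conservatively treats string or generic-object dtypes (which is how pandas usually stores strings) as a sign the target has not been encoded.

```python
import numpy as np


def check_classifier_target(y_train):
    """Hypothetical pre-flight check: reject string-valued classification targets.

    Raises ValueError if y_train holds strings (NumPy dtype kinds 'U' or 'S')
    or generic Python objects (kind 'O'); otherwise returns it as an ndarray.
    """
    y = np.asarray(y_train)
    if y.dtype.kind in "OUS":
        raise ValueError(
            f"y_train has dtype {y.dtype}; encode the target as integers "
            "(e.g. with sklearn.preprocessing.LabelEncoder) before calling dtreeviz."
        )
    return y
```

Rejecting object dtype also catches pandas string columns, at the cost of flagging object columns that happen to contain integers.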

@mshqn thanks for creating this issue.