parrt / dtreeviz

A python library for decision tree visualization and model interpretation.
MIT License

Tree is not drawn when the dataset passed in differs from the training data #298

Open StepanWorkV opened 1 year ago

StepanWorkV commented 1 year ago

https://github.com/parrt/dtreeviz/blob/a0e85a0d3f64ec0616dcfafb5fcf72f0cbf434b8/dtreeviz/trees.py#L1255

https://github.com/parrt/dtreeviz/blob/a0e85a0d3f64ec0616dcfafb5fcf72f0cbf434b8/dtreeviz/trees.py#L1257

Example:

This works fine:

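from dtreeviz.models.shadow_decision_tree import ShadowDecTree
from dtreeviz.trees import DTreeVizAPI

viz = DTreeVizAPI(
    ShadowDecTree.get_shadow_tree(
        decision_tree,
        X_train,  # training data
        y_train,
        feature_names=X_train.columns.tolist(),
        target_name="target",
        class_names=labels,
    )
)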

vw = viz.view(
    x=X_train.iloc[index],
    fancy=False,
    show_node_labels=False,
    show_just_path=False,
)

This fails because of the issue described in the comment. Plots for some nodes are never generated, yet their file paths are still written into the SVG file. When the result is later passed to graphviz, it errors with "file not found" for those missing plots.

viz = DTreeVizAPI(
    ShadowDecTree.get_shadow_tree(
        decision_tree,
        X_sample,  # sample data that might not reach all the leaves
        y_sample,
        feature_names=X_sample.columns.tolist(),
        target_name="target",
        class_names=labels,
    )
)

vw = viz.view(
    x=X_sample.iloc[index],
    fancy=False,
    show_node_labels=False,
    show_just_path=False,
)
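
To illustrate the condition behind the failure (a dataset that never reaches some nodes of an already-trained tree), here is a minimal, library-free sketch. The `Node` class, the helper functions, and the toy data are all hypothetical and are not part of dtreeviz or scikit-learn; the sketch only shows how a smaller sample can leave a node with no observations, which is the situation in which a node's plot would have nothing to draw.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class Node:
    """Hypothetical minimal tree node (not a dtreeviz or scikit-learn class)."""
    node_id: int
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken otherwise


def all_node_ids(root: Node) -> Set[int]:
    """Collect the ids of every node in the tree."""
    ids: Set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        ids.add(node.node_id)
        if node.feature is not None:
            stack.extend([node.left, node.right])
    return ids


def visited_node_ids(root: Node, rows: Sequence[Sequence[float]]) -> Set[int]:
    """Collect the ids of nodes that at least one row passes through."""
    seen: Set[int] = set()
    for x in rows:
        node = root
        while True:
            seen.add(node.node_id)
            if node.feature is None:
                break
            node = node.left if x[node.feature] <= node.threshold else node.right
    return seen


def unreached_node_ids(root: Node, rows: Sequence[Sequence[float]]) -> List[int]:
    """Ids of nodes that receive no rows from the given dataset."""
    return sorted(all_node_ids(root) - visited_node_ids(root, rows))


# Toy tree: node 0 splits on feature 0, node 2 splits on feature 1.
tree = Node(
    0, feature=0, threshold=5.0,
    left=Node(1),
    right=Node(2, feature=1, threshold=1.0, left=Node(3), right=Node(4)),
)

X_train = [[1.0, 0.0], [7.0, 0.5], [8.0, 2.0]]  # reaches every leaf
X_sample = [[1.0, 0.0], [7.0, 0.5]]             # never reaches leaf 4

print(unreached_node_ids(tree, X_train))   # []
print(unreached_node_ids(tree, X_sample))  # [4]
```

With the training rows every node is visited; with the smaller sample, leaf 4 gets no observations. A check along these lines on the real model could show which nodes are affected before calling `view`.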

tlapusan commented 1 year ago

Hi @StepanWorkV, I remember this being solved. I'll have to look into the code/history.