google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0

Plots not showing sparse split attribute #128

Closed: marquisthunder closed this issue 2 months ago

marquisthunder commented 2 months ago

Dear ydf developers, I hope this finds you well. I have a question about the plot_tree method: it does not show the split condition when the model uses oblique splits (SPARSE_OBLIQUE).

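import pandas as pd
import ydf

# Train a gradient boosted trees model with sparse oblique splits.
# x_train holds the features and y_train the 'label' column.
model = ydf.GradientBoostedTreesLearner(
    label='label',
    split_axis="SPARSE_OBLIQUE",
    sparse_oblique_normalization="MIN_MAX",
    sparse_oblique_num_projections_exponent=1.0,
).train(pd.concat([x_train, y_train], axis=1))
# Inspect the first tree of the trained model.
model.get_tree(0)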

Out[4]: Tree(root=NonLeaf(value=None, condition=NumericalHigherThanCondition(missing=False, score=0.0, attribute=3, threshold=95.69999694824219), pos_child=Leaf(value=RegressionValue(num_examples=0.0, value=-0.4674617052078247, standard_deviation=None)), neg_child=Leaf(value=RegressionValue(num_examples=0.0, value=0.3032335340976715, standard_deviation=None))))

In the plot, the split condition is always shown as a sparse split instead of the real attribute names. (image)

rstz commented 2 months ago

Hi, thank you for reporting this. The names of the attributes and their weights are shown as a tooltip when you hover the mouse over the three dots. Since a condition (often) involves multiple attributes, it can be quite large, and we did not want to clutter the plot too much. Does this work for you?
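
To illustrate why such a condition can get long, here is a minimal sketch of a sparse oblique split: a weighted sum of several attributes compared against a threshold. The `ObliqueCondition` class, the `describe` and `evaluate` helpers, and the column names are hypothetical stand-ins written for this example; they are not part of the ydf API, and the `>=` comparison is an assumption about the split direction.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ObliqueCondition:
    """Hypothetical stand-in for a sparse oblique split (not a ydf class)."""
    attributes: Sequence[int]  # column indices of the attributes involved
    weights: Sequence[float]   # one weight per attribute
    threshold: float           # value the weighted sum is compared against


def describe(cond: ObliqueCondition, column_names: Sequence[str]) -> str:
    """Render the condition as 'w1 * name1 + w2 * name2 >= threshold'."""
    terms = [
        f"{w:g} * {column_names[a]}"
        for a, w in zip(cond.attributes, cond.weights)
    ]
    return " + ".join(terms) + f" >= {cond.threshold:g}"


def evaluate(cond: ObliqueCondition, row: Sequence[float]) -> bool:
    """Return True if the weighted sum of the row reaches the threshold."""
    projection = sum(w * row[a] for a, w in zip(cond.attributes, cond.weights))
    return projection >= cond.threshold


# Illustrative values only; they do not come from a trained model.
cond = ObliqueCondition(attributes=[0, 3], weights=[0.5, -1.0], threshold=2.0)
names = ["f0", "f1", "f2", "f3"]
label = describe(cond, names)
goes_positive = evaluate(cond, [10.0, 0.0, 0.0, 1.0])
```

With only two attributes the label is already a full expression; with many more attributes it would dominate the node, which is the clutter the tooltip avoids.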

marquisthunder commented 2 months ago

thanks, it works well.