parrt / dtreeviz

A python library for decision tree visualization and model interpretation.
MIT License

Show leaf values, i.e. leaf weights, for classification trees #239

Open · mepland opened this issue 1 year ago

mepland commented 1 year ago

Instead of printing the argmax predicted class name at each leaf for classification trees, allow the user to show the numeric value, i.e. weight, of the leaf as is done for regression trees. We may want to retain the current argmax class name behavior as an option for the user.

Somewhat related to https://github.com/parrt/dtreeviz/issues/178

Current relevant code: trees.py

    # the leaf label is the sample count plus the argmax class name
    prediction = node.prediction_name()

    if leaftype == 'pie':
        _draw_piechart(counts, size=size, colors=colors, filename=filename, label=f"n={nsamples}\n{prediction}",
                       graph_colors=graph_colors, fontname=fontname)
    elif leaftype == 'barh':
        _draw_barh_chart(counts, size=size, colors=colors, filename=filename, label=f"n={nsamples}\n{prediction}",
                         graph_colors=graph_colors, fontname=fontname)
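
A minimal sketch of the requested option, assuming a hypothetical `show_leaf_value` flag and a numeric `leaf_value` supplied by the caller (neither exists in dtreeviz today). The label keeps the `n={nsamples}` line and swaps the class name for the formatted leaf value only when asked, so the current behavior stays the default.

```python
def make_leaf_label(nsamples, prediction, leaf_value=None, show_leaf_value=False, precision=3):
    """Build the text drawn under a classification leaf chart.

    Hypothetical helper: by default it reproduces the current label,
    f"n={nsamples}\\n{prediction}"; with show_leaf_value=True it shows the
    numeric leaf value instead of the argmax class name.
    """
    if show_leaf_value:
        if leaf_value is None:
            raise ValueError("show_leaf_value=True requires a leaf_value")
        return f"n={nsamples}\n{leaf_value:.{precision}f}"
    return f"n={nsamples}\n{prediction}"


# Default: same label as today
print(make_leaf_label(42, "setosa"))
# Opt-in: numeric leaf value, rounded to 3 decimals
print(make_leaf_label(42, "setosa", leaf_value=-0.1234, show_leaf_value=True))
```

Passing the value in, rather than computing it inside the drawing code, leaves each tree library free to decide what its "leaf value" means.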

For a get_prediction() example, see the sklearn_decision_trees.py implementation:

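    import numpy as np

    def get_prediction(self, id):
        if self.is_classifier():
            # per-class values stored at this node; report the class with the largest one
            counts = self.tree_model.tree_.value[id][0]
            return np.argmax(counts)
        else:
            # regression: the node's single predicted value
            return self.tree_model.tree_.value[id][0][0]
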
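
Because the classifier branch above collapses the stored values with `argmax`, here is a sketch of a hypothetical `get_leaf_value` that returns them instead. It assumes sklearn's layout, where `tree_.value` has shape `(node_count, n_outputs, max_n_classes)` and only the first output is used; the `SimpleNamespace` stubs merely imitate a fitted model so the snippet runs on its own.

```python
from types import SimpleNamespace

import numpy as np


def get_leaf_value(tree_model, node_id, is_classifier):
    """Hypothetical counterpart to get_prediction() that keeps the numbers.

    Classifiers: return the per-class values stored at the node (first output).
    Regressors: return the node's single predicted value.
    """
    value = tree_model.tree_.value[node_id][0]
    if is_classifier:
        return np.asarray(value, dtype=float)
    return float(value[0])


# Stub mimicking a fitted sklearn classifier tree: 2 nodes, 1 output, 3 classes
clf = SimpleNamespace(tree_=SimpleNamespace(value=np.array([[[5.0, 3.0, 2.0]], [[0.0, 1.0, 9.0]]])))
print(np.argmax(clf.tree_.value[1][0]))            # what get_prediction() reports: 2
print(get_leaf_value(clf, 1, is_classifier=True))  # the values behind that choice
```
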
mepland commented 1 year ago

Also discussed here.

parrt commented 1 year ago

Yeah, let's see what @tlapusan thinks about creating a special function for classifiers, depending on the decision tree library, that returns a value to display.

tlapusan commented 1 year ago

The most important information to display for a leaf is the predicted class, followed by the probability of that prediction, which shows how confident the tree is in the predicted class. So IMO, we can add an option to display the probability, but not making it the default one. Indirectly... the user can deduce the probability of the predicted class by looking at the leaf pie chart...
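
The probability mentioned here can be read off the leaf's class counts: it is the largest count divided by the total. A minimal sketch, assuming the counts are the plain per-class sample counts drawn in the pie chart; `predicted_class_probability` is a hypothetical helper name.

```python
def predicted_class_probability(counts):
    """Return (index of the predicted class, its fraction of the leaf's samples).

    Hypothetical helper; assumes `counts` holds non-negative per-class counts.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("leaf has no samples")
    # max() returns the first maximal index, so ties go to the lowest class index
    best = max(range(len(counts)), key=lambda i: counts[i])
    return best, counts[best] / total


cls, prob = predicted_class_probability([3, 12, 5])
print(f"class {cls}, p={prob:.2f}")  # class 1, p=0.60
```

Ties resolve to the first class, which matches how `np.argmax` picks the predicted class.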

All the dtreeviz visualisations were designed to interpret independent (not interconnected) trees, like a single tree from a random forest. Granted, xgboost is a little different, since each of its boosted trees builds on the ones before it, and we can make some adjustments for it.

I'm on vacation this week, but I will think about it while skiing ⛷️.

mepland commented 1 year ago

> So IMO, we can add an option to display the probability, but not making it the default one.

Totally happy to have the class name remain the default behavior. I would just like to add an option so the user can show the leaf values instead.

> Indirectly... the user can deduce the probability of the predicted class by looking at the leaf pie chart...

For most tree models, yes, but the FIGS model in csinva/imodels does not use the leaf's positive-class fraction as its leaf value; instead, the value is fit to the residuals of the other trees in the ensemble for the points in that leaf. Plus, it is always good to have a quantitative display option rather than trying to read the % positive off the leaf graph by eye.
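
To see why the pie chart cannot recover such a value, here is a simplified sketch of a residual-fitted leaf, in the spirit of FIGS but not the imodels code: the leaf's value is taken as the mean of `y` minus the summed predictions of the other trees, over the points that land in that leaf. The helper name and the toy numbers are made up for illustration.

```python
def residual_leaf_value(y, other_trees_pred):
    """Mean residual of the samples in one leaf (simplified FIGS-style fit).

    y: 0/1 labels of the samples in the leaf.
    other_trees_pred: summed predictions of the *other* trees for the same samples.
    """
    if not y or len(y) != len(other_trees_pred):
        raise ValueError("need equal-length, non-empty inputs")
    residuals = [yi - pi for yi, pi in zip(y, other_trees_pred)]
    return sum(residuals) / len(residuals)


y = [1, 1, 0, 1]                       # 75% positive in this leaf
other = [0.5, 0.75, 0.25, 0.5]         # made-up contributions from the other trees
print(sum(y) / len(y))                 # positive-class fraction: 0.75
print(residual_leaf_value(y, other))   # leaf value under this simplified fit: 0.25
```

The same leaf reads as 75% positive in the chart but carries a value of 0.25, so only an explicit numeric label shows what the model actually adds there.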

parrt commented 1 year ago

Rather than having the user specify a dictionary, I think it's better to come up with a function, generic across libraries, that returns a value that makes sense for each library. Then there would be an option to flip the display to show that value.
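
One way to read this suggestion is a per-library hook behind a single display switch. The sketch below is hypothetical throughout: the `LeafValueAdapter` classes, the `leaf_value` method, and the value each adapter picks are illustrative assumptions, with node data passed in directly so it runs without any tree library installed.

```python
from abc import ABC, abstractmethod


class LeafValueAdapter(ABC):
    """Hypothetical per-library hook: each subclass decides what a leaf's value is."""

    @abstractmethod
    def leaf_value(self, node_id):
        """Return the number worth displaying for this leaf."""


class CountsAdapter(LeafValueAdapter):
    """For leaves that store class counts: show the predicted-class fraction."""

    def __init__(self, counts_by_node):
        self.counts_by_node = counts_by_node

    def leaf_value(self, node_id):
        counts = self.counts_by_node[node_id]
        return max(counts) / sum(counts)


class StoredValueAdapter(LeafValueAdapter):
    """For models that already store one fitted number per leaf (e.g. boosted or FIGS-style trees)."""

    def __init__(self, values_by_node):
        self.values_by_node = values_by_node

    def leaf_value(self, node_id):
        return self.values_by_node[node_id]


def leaf_text(tree, node_id, prediction, show_leaf_value=False):
    """One flag switches any library from the class name to its own leaf value."""
    if show_leaf_value:
        return f"{tree.leaf_value(node_id):.3f}"
    return prediction


print(leaf_text(CountsAdapter({3: [2, 8]}), 3, "yes", show_leaf_value=True))      # 0.800
print(leaf_text(StoredValueAdapter({5: -0.25}), 5, "yes", show_leaf_value=True))  # -0.250
print(leaf_text(StoredValueAdapter({5: -0.25}), 5, "yes"))                        # yes
```

The drawing code only ever calls `leaf_value()`, so adding a library means writing one small adapter rather than touching the visualisation.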

Or, we allow a lambda or function as an argument that gets applied to each leaf node to produce the value to display.
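
The callable variant might look like the following sketch. `leaf_value_fn` is a hypothetical parameter name, and the leaf records are simplified dicts standing in for dtreeviz's node objects; when no function is given, the class name is shown as it is today.

```python
def leaf_labels(leaves, leaf_value_fn=None, precision=3):
    """Return one label per leaf.

    leaves: iterable of dicts with 'nsamples', 'prediction' and any other fields
            a user function might need (a stand-in for real node objects).
    leaf_value_fn: optional callable mapping a leaf to a number to display.
    """
    labels = []
    for leaf in leaves:
        if leaf_value_fn is None:
            shown = leaf["prediction"]  # current behavior: argmax class name
        else:
            shown = f"{leaf_value_fn(leaf):.{precision}f}"
        labels.append(f"n={leaf['nsamples']}\n{shown}")
    return labels


leaves = [
    {"nsamples": 10, "prediction": "yes", "counts": [2, 8]},
    {"nsamples": 4, "prediction": "no", "counts": [3, 1]},
]
print(leaf_labels(leaves))
# A lambda that shows the fraction of the second (positive) class
print(leaf_labels(leaves, lambda leaf: leaf["counts"][1] / sum(leaf["counts"])))
```

This pushes the "what number means what" decision to the user, which suits models like FIGS whose leaf values a generic rule might not anticipate.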

mepland commented 1 year ago

Yeah, makes sense - that is the elegant solution. I will work on writing up an implementation for sklearn.