parrt / dtreeviz

A python library for decision tree visualization and model interpretation.
MIT License

Capitalization Fixes #245

Closed mepland closed 1 year ago

mepland commented 1 year ago

Also refactored the `_format_axes()` calls that were missed in interpretation.py and classifiers.py.

mepland commented 1 year ago

@parrt I don't know why these lines are included to change the x-axis ticks of the importance plots, but I think they should be removed.

Before:

    # TODO this just screws up the x-axis ticks, should just leave numeric!
    ax.set_xticks(range(0, len(shadow_tree.feature_names)))
    ax.set_xticklabels(shadow_tree.feature_names)

image

After (https://github.com/parrt/dtreeviz/pull/245/commits/73462af699cfa0f8dc6542af5e7a847d72488287): image

mepland commented 1 year ago

@parrt In https://github.com/parrt/dtreeviz/pull/245/commits/d3f5401349bb0fd812868bd05503d6fe5fd52675 the new criterion_remapping dict in utils.py is used to map the raw criterion values to nicer strings for axis labels:

    criterion_remapping = {
        'gini': 'Gini',
        'entropy': 'Entropy',
        'log_loss': 'Log Loss',
        'friedman_mse': 'Friedman MSE',
        'squared_error': 'Squared Error',
        'absolute_error': 'Absolute Error',
        'poisson': 'Poisson',
        'variance': 'Variance',
    }

I looked through the documentation for sklearn and spark trees and pulled all of the criterion values I could find. The code will fall back to the string it has if it can't find a match in the dictionary:

    def criterion(self):
        return criterion_remapping.get(self.tree_model.criterion, self.tree_model.criterion)
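
To illustrate the fallback behaviour, here is a minimal, self-contained sketch. `FakeTreeModel` and `FakeShadowTree` are hypothetical stand-ins invented for this example, not dtreeviz classes, and `my_custom_loss` is a made-up criterion name. Only the `dict.get(key, default)` pattern is taken from the change itself.

```python
# Display names for known split criteria (same mapping as in utils.py above).
criterion_remapping = {
    'gini': 'Gini',
    'entropy': 'Entropy',
    'log_loss': 'Log Loss',
    'friedman_mse': 'Friedman MSE',
    'squared_error': 'Squared Error',
    'absolute_error': 'Absolute Error',
    'poisson': 'Poisson',
    'variance': 'Variance',
}


class FakeTreeModel:
    """Hypothetical stand-in for a fitted model; only `criterion` is modeled."""

    def __init__(self, criterion):
        self.criterion = criterion


class FakeShadowTree:
    """Hypothetical wrapper used only for this illustration."""

    def __init__(self, tree_model):
        self.tree_model = tree_model

    def criterion(self):
        raw = self.tree_model.criterion
        # dict.get returns the second argument when the key is missing,
        # so unrecognized criteria pass through unchanged.
        return criterion_remapping.get(raw, raw)


known = FakeShadowTree(FakeTreeModel('gini')).criterion()
unknown = FakeShadowTree(FakeTreeModel('my_custom_loss')).criterion()
print(known)
print(unknown)
```

A known value such as `'gini'` comes back as its display name, while an unmapped value comes back exactly as the model reported it, so new or library-specific criteria still produce a usable axis label.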

image

image

parrt commented 1 year ago

> I don't know why these lines are included to change the x-axis ticks of the importance plots, but I think they should be removed.

Agreed. I believe I recently changed this plot to be horizontal and probably left some stale code behind, which is strange, but yes, all of the feature names should be on the vertical axis now.

parrt commented 1 year ago

> I looked through the documentation for sklearn and spark trees and pulled all of the criterion values I could find.

Excellent