ploomber / sklearn-evaluation

Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
https://sklearn-evaluation.ploomber.io
Apache License 2.0
455 stars · 54 forks

Interactive confusion matrix #275

Closed neelasha23 closed 1 year ago

neelasha23 commented 1 year ago

Describe your changes

Interactive confusion matrix plots.

Issue ticket number and link

Closes #185

Checklist before requesting a review


:books: Documentation preview :books:: https://sklearn-evaluation--275.org.readthedocs.build/en/275/

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 4263006367

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report. To ensure accuracy in future PRs, please see these guidelines. A quick fix for this PR: rebase it; your next report should be accurate.


Changes Missing Coverage:

File                                                          Covered Lines  Changed/Added Lines       %
src/sklearn_evaluation/plot/confusion_matrix_interactive.py             173                  186  93.01%
Total                                                                   174                  187  93.05%
Totals Coverage Status
Change from base Build 4236644410: -1.1%
Covered Lines: 3222
Relevant Lines: 3429

💛 - Coveralls
neelasha23 commented 1 year ago

A few clarifications before proceeding with the code:

Currently I have added the code as a user guide (I will move it into the ConfusionMatrix class once there's more clarity). Right now, hovering over a quadrant displays a Min metric (dummy values).

  1. "we could show some statistics (range, min, max, percentiles) for the data points in such quadrant": how do we calculate these values? The plots are of actual labels vs. predicted labels. Do we need to calculate statistics on the test dataset features?
  2. Can we add a parameter interactive=True/False to from_raw_data? It would call the interactive version of the plot.

@edublancas

edublancas commented 1 year ago

Great initial work!

we could show some statistics (range, min, max, percentiles) for the data points in such quadrant : How do we calculate these values? The plots are of actual labels vs predicted labels. Do we need to calculate statistics on the test dataset features?

good point. in this case, we'd require the user to also pass the features, thinking of something like this:

ConfusionMatrix.interactive_from_raw_data(y_actual, y_pred, X)

Can we add a parameter interactive=True/False to the from_raw_data . This will call the interactive version of the plot.

Unsure about this, let's keep working on the POC and decide the best API.


some ideas for the next iteration:

Does Vega allow displaying a second plot/table next to the confusion matrix and updating it when clicking on a quadrant?

two use cases come to mind:

  1. display a sample of rows that appear in the clicked quadrant
  2. display a histogram/table summarizing the features for all rows that belong to the clicked quadrant

take a look at Vega/Altair and let me know if this sounds feasible!

neelasha23 commented 1 year ago

Have added a table next to the plot. The values change depending on which quadrant is clicked. It has some initial values which I'm looking at removing. @edublancas

edublancas commented 1 year ago

great! Let's implement this as InteractiveConfusionMatrix with the same structure as AbstractPlot (there will be some minor differences since we no longer have the ax and figure attributes, but most of the structure will be the same).

The signature can be something like:

def from_raw_data(cls, y_true, y_pred, target_names=None, normalize=False, X_test=None):
    ....

The key difference here is X_test. If None, we just display the matrix; if not None, we display two tables below the matrix:

  1. one that randomly samples 3-5 observations corresponding to the clicked quadrant
  2. one that computes some summary statistics from the columns

The function we currently use for the other confusion matrix returns the matrix directly, so we'll need to implement some logic to determine, for each observation, which quadrant it belongs to.

We can assume that X_test will be a data frame, we can later decide what we do with numpy arrays (since they don't have column labels, it doesn't make much sense to display them in a table)
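The quadrant-assignment logic can be sketched with a boolean mask over the label arrays (the `rows_in_quadrant` helper and the toy data below are illustrative, not the PR's actual implementation):

```python
import numpy as np
import pandas as pd

def rows_in_quadrant(X_test, y_true, y_pred, actual, predicted):
    """Return the X_test rows whose observation falls in the
    (actual, predicted) quadrant of the confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    mask = (y_true == actual) & (y_pred == predicted)
    return X_test[mask]

# toy data: two features, binary labels
X_test = pd.DataFrame({"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1, 2, 3, 4]})
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 0]

# rows the model predicted as class 1 that are actually class 0
false_positives = rows_in_quadrant(X_test, y_true, y_pred, actual=0, predicted=1)
```

From the resulting subset, the two tables (random sample and summary statistics) can be built per quadrant.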

neelasha23 commented 1 year ago

Have added the first version of metrics here. Still figuring out the histogram part and exactly what data to show there. Working on the formatting issues as well. Currently the first table shows the first 5 samples; I'll replace that with random sampling.

We can assume that X_test will be a data frame, we can later decide what we do with numpy arrays (since they don't have column labels, it doesn't make much sense to display them in a table)

In this case we can require the user to pass column names as well, and then internally convert the numpy array to a dataframe.

@edublancas

edublancas commented 1 year ago

for the sample rows: can we display them as a pandas-like table? the numbers with no column labels are a bit hard to read

Still figuring out the histogram part and what data to exactly show there.

I was thinking of simple summary statistics, like the ones pandas.describe shows
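For reference, a minimal sketch of the kind of per-column summary `pandas.describe` produces (the column names and values here are made up):

```python
import pandas as pd

# hypothetical rows belonging to one quadrant
quadrant_rows = pd.DataFrame(
    {"age": [22, 35, 41], "income": [30_000, 52_000, 61_000]}
)

# describe() yields count, mean, std, min, the quartiles, and max per column
stats = quadrant_rows.describe()

# a compact subset, similar to what the PR ended up showing
print(stats.loc[["mean", "min", "max"]])
```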

In this case we can mandate the user to pass column names as well, and then internally convert numpy to dataframe.

Ok, so it can be:

from_raw_data(cls, y_true, y_pred, target_names=None, normalize=False, X_test=None, feature_names=None):

if X_test is a data frame, we grab the feature names from there; if it's a numpy array, we require feature_names (or maybe if feature_names is None, we display Feature 0, Feature 1, ...)
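That fallback could look something like this (the `as_dataframe` helper is a hypothetical sketch, not code from this PR):

```python
import numpy as np
import pandas as pd

def as_dataframe(X_test, feature_names=None):
    """Wrap X_test in a DataFrame, generating column names when none are given."""
    if isinstance(X_test, pd.DataFrame):
        return X_test
    X_test = np.asarray(X_test)
    if feature_names is None:
        # numpy arrays carry no labels, so fall back to generated names
        feature_names = [f"Feature {i}" for i in range(X_test.shape[1])]
    return pd.DataFrame(X_test, columns=feature_names)

df = as_dataframe(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(list(df.columns))  # ['Feature 0', 'Feature 1']
```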

neelasha23 commented 1 year ago

for the sample rows: can we display them as a pandas-likes table? the numbers with no column labels is a bit hard to read

column labels are present (currently it shows Feature N_sampled; I'll change it to just Feature N). I have removed the spacing between columns. This is the tutorial I followed. There is no support for tables in Altair, so this is a workaround; refer to the linked GitHub issue.

I was thinking of simple summary statistics, like the ones pandas.describe shows

Added mean, min, max for now, will add more if this looks fine.

@edublancas

edublancas commented 1 year ago

column labels are present (currently it shows Feature N_sampled

I see! My bad. I thought it was one observation per box, but I see each observation is one row in those boxes. looks good!

Added mean, min, max for now, will add more if this looks fine.

looks good! let's just format it so we only show 6-7 significant digits; these look useful:

https://numpy.org/doc/stable/reference/generated/numpy.format_float_positional.html https://numpy.org/doc/stable/reference/generated/numpy.format_float_scientific.html#numpy.format_float_scientific

although there doesn't seem to be a way to automatically choose scientific notation for larger numbers, something that pandas does
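A quick sketch of how those two NumPy helpers behave (the input values are illustrative):

```python
import numpy as np

# with fractional=False, precision counts significant digits rather than
# digits after the decimal point
s = np.format_float_positional(0.123456789, precision=6, fractional=False)

# scientific notation has to be requested explicitly; NumPy does not switch
# to it automatically for large numbers the way pandas display does
big = np.format_float_scientific(1234567.89, precision=3)

print(s, big)
```

So to mimic the pandas behavior, the caller would need its own threshold check to decide which of the two functions to apply.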

neelasha23 commented 1 year ago

Have addressed the review suggestions. A few doubts:

  1. Altair chart rendering is well integrated with Jupyter notebooks, but not when we run a Python script. I tried the altair_viewer method mentioned here, but it opens the chart in a new browser window during the docs build. So in the docstring of the from_raw_data method the chart is displayed using the .. raw:: html directive, which means that every time the code changes, the new chart produced by running the example needs to be copied and saved as metric_chart.html under the src/sklearn-evaluation/assets/cm/ path. I'm not sure this is the correct approach since it's a manual process, but I haven't found another way to handle it.
  2. If X_test contains too many columns the view becomes very wide; in this case, should users pass only the desired columns as a dataframe/numpy array, or should this be handled on our end?

@edublancas

neelasha23 commented 1 year ago

Did you get a chance to review this? @edublancas

edublancas commented 1 year ago

sorry for the delay

Altair chart rendering is well integrated with Jupyter notebooks, but not when we run a Python script. I tried the altair_viewer method mentioned here, but it opens the chart in a new browser window during the docs build. So in the docstring of the from_raw_data method the chart is displayed using the .. raw:: html directive, which means that every time the code changes, the new chart produced by running the example needs to be copied and saved as metric_chart.html under the src/sklearn-evaluation/assets/cm/ path. I'm not sure this is the correct approach since it's a manual process, but I haven't found another way to handle it.

let's ignore this for now and just include a link from the docstring to a notebook tutorial so it correctly displays the interactive plot. we can create a new one under classification/interactive-plots

edublancas commented 1 year ago

If X_test contains too many columns the view would become very wide, in this case should users pass only the desired columns as a dataframe/ numpy array. Or should this be handled from our end?

if there are more than N columns, I think let's just grab the first few (not sure how many make sense, maybe 4-5?) and display a warning saying that only those columns are displayed, and that users can subselect fewer columns if they want to control which ones appear
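A sketch of that truncate-and-warn behavior (the `limit_columns` helper and the cutoff of 5 are placeholders, not the values the PR settled on):

```python
import warnings
import pandas as pd

MAX_DISPLAY_COLUMNS = 5  # hypothetical cutoff

def limit_columns(X_test, max_cols=MAX_DISPLAY_COLUMNS):
    """Keep only the first max_cols columns, warning the user when truncating."""
    if X_test.shape[1] > max_cols:
        warnings.warn(
            f"Only the first {max_cols} columns are displayed; pass a "
            "subset of columns to control which ones appear"
        )
        return X_test.iloc[:, :max_cols]
    return X_test

wide = pd.DataFrame({f"c{i}": [0] for i in range(8)})
narrow = limit_columns(wide)
```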

neelasha23 commented 1 year ago

Fixed the above points @edublancas

tonykploomber commented 1 year ago

In the preview doc https://sklearn-evaluation--275.org.readthedocs.build/en/275/classification/cm_interactive.html

It seems the Observation and Statistics numbers are overlapping:

[screenshot]
neelasha23 commented 1 year ago

In the preview doc https://sklearn-evaluation--275.org.readthedocs.build/en/275/classification/cm_interactive.html

It seems the Observation and Statistics numbers are overlapping:

[screenshot]

Yes, this happens if you click on the table area; in that case all the quadrants get selected at once.

tonykploomber commented 1 year ago

I also checked the Altair docs. There doesn't seem to be an easy way to fix the issue in multiple-selection mode.

We can check if there is a better way to present the table with multiple values.

edublancas commented 1 year ago

is there a way to get back the original colors? and to highlight the selected one, maybe we can add a border or something else so it's clear which one is selected. I think with that we'd be ready to release!

looks like this:

[screenshot]

although when double-clicking a quadrant, it gets the right colors (but triggers the overlap problem):

[screenshot]

one thing that came to mind (let's see if it's not too much of a hassle): when testing the confusion matrix, I realized it'd be useful to compare quadrants. For example, I might want to know how false positives for class 1 compare to false positives for class 2 by looking at the sample statistics. Is there a way to allow a user to select two quadrants? For example, the first one with a click and a second one with a double-click or some other modifier like shift + click

then if two quadrants are selected, we could display the ratio of features: feature_1_from_quadrant_1 / feature_1_from_quadrant_2
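The proposed ratio could be computed directly from per-quadrant means, along these lines (the data below is made up, and the quadrant split is assumed to have happened already):

```python
import pandas as pd

# hypothetical rows already split by quadrant
# (e.g. false positives for class 1 vs. false positives for class 2)
quadrant_a = pd.DataFrame({"f1": [2.0, 4.0], "f2": [10.0, 30.0]})
quadrant_b = pd.DataFrame({"f1": [1.0, 3.0], "f2": [5.0, 15.0]})

# per-feature ratio of mean values between the two selected quadrants
ratios = quadrant_a.mean() / quadrant_b.mean()
```

A ratio far from 1 for some feature would suggest that feature distinguishes the two kinds of errors.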

neelasha23 commented 1 year ago

is there a way to get back the original colors? and to highlight the selected one, maybe we can add a border or something else so it's clear which one is selected. I think with that we'd be ready to release!

I don't see a border attribute in the encodings. There is opacity; I tried it along with conditional colors, but it messes up the coloring. The colors are currently selected based on a quadrant click, like:

color = alt.condition(
    selection,
    "confusion_matrix:N",
    alt.value("lightgray"),
    scale=alt.Scale(scheme="oranges"),
)

This condition has the format (event_selection, color_if_true, color_if_false).

The only thing I can think of is changing the lightgray to some other color, but that doesn't seem to serve the purpose.

when testing the confusion matrix, I realized that it'd be useful to compare quadrants.

There is a way to select multiple quadrants using selection_multi (with the Shift key), but it displays an overlapping chart (similar to when we select all quadrants). I think we would somehow need to capture the single-quadrant-clicked and two-quadrants-clicked events in our script:

if single_quadrant_clicked:
    display tables
elif double_quadrant_clicked:
    display feature ratios

According to this thread and this one, there doesn't seem to be support for Vega event callbacks from Altair in Python.

I'm not sure how to go about this; I'd need some more time to experiment.

@edublancas

idomic commented 1 year ago

@neelasha23 anything else pending here?

neelasha23 commented 1 year ago

This is done, I have resolved all the comments. @idomic

edublancas commented 1 year ago

Woohoo! 🎉

idomic commented 1 year ago

Nice job @neelasha23 !