Plot confusion matrix not from predictions but from actual confusion matrix

reiinakano / scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.

MIT License

2.43k stars, 285 forks

Hi,

I would like to ask whether there is a way to provide a precomputed confusion matrix and still use scikit-plot functions for visualization. I have a task where I want to plot two types of confusion matrix: one for the number of transactions and one for the amount of each transaction ($). The first case is pretty straightforward: I have ground truth and predictions, so a quick call to plot_confusion_matrix and voilà. The second case is not that easy, as some transactions could be on the order of $1000. If the dataset covers millions of dollars, I would need to create a huge array where each element is a single dollar together with its prediction and its ground truth. It is less cumbersome to compute the confusion matrix myself and plot it with seaborn.heatmap, but then the appearance will not be consistent with the other plots.

Is this something that can be done, or is it perhaps an enhancement suggestion?

Thanks
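For reference, the amount-weighted matrix described above can be built without expanding every dollar into its own row: each transaction's amount is added to the cell given by its true and predicted label. The following is only a minimal sketch under assumptions. The helper names `weighted_confusion_matrix` and `plot_matrix`, the labels, and the sample data are hypothetical and are not part of scikit-plot. The plotting helper uses plain matplotlib and merely tries to look similar to a typical confusion matrix plot.

```python
def weighted_confusion_matrix(y_true, y_pred, weights, labels):
    """Sum each sample's weight into the cell (true label, predicted label).

    Rows are true labels, columns are predicted labels, in `labels` order.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0.0] * len(labels) for _ in labels]
    for true_label, pred_label, weight in zip(y_true, y_pred, weights):
        matrix[index[true_label]][index[pred_label]] += weight
    return matrix


def plot_matrix(matrix, labels, title="Confusion matrix ($)"):
    """Draw a precomputed matrix as an annotated heatmap (hypothetical helper)."""
    # Imported lazily so the aggregation above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="Blues")
    fig.colorbar(image, ax=ax)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    ax.set_title(title)
    # Write the summed amount into every cell.
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ax.text(j, i, f"{value:,.0f}", ha="center", va="center")
    return ax


# Made-up example data: one row per transaction, not one row per dollar.
y_true = ["fraud", "ok", "ok", "fraud"]
y_pred = ["fraud", "ok", "fraud", "ok"]
amounts = [1000.0, 20.0, 5.0, 300.0]
labels = ["ok", "fraud"]

cm = weighted_confusion_matrix(y_true, y_pred, amounts, labels)
```

Calling `plot_matrix(cm, labels)` would then draw the dollar-weighted matrix. If scikit-learn is available, `sklearn.metrics.confusion_matrix` accepts a `sample_weight` argument that performs the same aggregation.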

What prompts you to represent a continuous outcome/prediction ($ amount) in terms of a confusion matrix (meant for binary or categorical modeling tasks)? It seems to me the output of a confusion matrix with even tens of different categories represented would be difficult to understand, let alone potentially thousands of categories.

I assume you're trying to understand your model's performance across the entire dollar range, to see where there may be gaps. Have you tried a residual plot (i.e. plotting the predicted $ amount on the x-axis and the error on the y-axis)?

I suppose you could try binning your $ amounts to reduce the cardinality in the predictions/actual outcomes but that seems arbitrary and roundabout.
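If binning were the route taken, a rough sketch could look like the following. The bin edges and the sample amounts are made up for illustration, and `bin_amounts` is a hypothetical helper, not a scikit-plot function.

```python
from bisect import bisect_right


def bin_amounts(amounts, edges):
    """Map each amount to the index of the bin it falls into.

    With edges [100, 1000]: bin 0 is below 100, bin 1 is 100 up to
    (but not including) 1000, and bin 2 is 1000 and above.
    """
    return [bisect_right(edges, amount) for amount in amounts]


edges = [100, 1000]            # assumed cut points, chosen arbitrarily
actual = [20, 150, 5000]       # made-up true amounts
predicted = [90, 1200, 800]    # made-up predicted amounts

actual_bins = bin_amounts(actual, edges)
predicted_bins = bin_amounts(predicted, edges)
```

The two lists of bin indices are ordinary categorical labels, so they could be passed to plot_confusion_matrix as ground truth and predictions. The choice of edges remains arbitrary, as noted above.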

Plot confusion matrix not from predictions but from actual confusion matrix #105