reiinakano / scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.
MIT License

Plot confusion matrix not from predictions but from actual confusion matrix #105

Open ricgu8086 opened 4 years ago

ricgu8086 commented 4 years ago

Hi,

I would like to ask if there is a way to provide a precomputed confusion matrix and still use scikit-plot functions for visualization. I have a task where I want to plot 2 types of confusion matrix: one for the number of transactions and one for the amount of each transaction ($). The first case is pretty straightforward: I have ground truth, I have predictions, so just a quick call to plot_confusion_matrix and voilà. However, the second case is not that easy, as some transactions could be on the order of $1000. If the dataset totals millions of dollars, I would need to create a huge array where each element is a single $, with its prediction and its ground truth. It would be less cumbersome to compute the confusion matrix myself and plot it with seaborn.heatmap, but then the appearance would not be consistent with the other plots.
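For reference, the dollar-weighted matrix described above does not require one row per dollar: each transaction's amount can be summed straight into its (ground truth, prediction) cell. This is a minimal sketch in plain Python. The function name `weighted_confusion_matrix` and the sample data are hypothetical, and this is not a scikit-plot API.

```python
def weighted_confusion_matrix(y_true, y_pred, weights, labels):
    """Sum each sample's weight into the cell (true label, predicted label)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0.0] * len(labels) for _ in labels]
    for t, p, w in zip(y_true, y_pred, weights):
        matrix[index[t]][index[p]] += w
    return matrix


# Hypothetical data: one entry per transaction, not per dollar.
y_true = [0, 0, 1, 1, 1]                    # ground-truth class
y_pred = [0, 1, 1, 0, 1]                    # predicted class
amounts = [20.0, 1000.0, 250.0, 40.0, 5.0]  # transaction amount in $

# Rows are true labels, columns are predicted labels.
cm_amount = weighted_confusion_matrix(y_true, y_pred, amounts, labels=[0, 1])
```

scikit-learn's `confusion_matrix` accepts a `sample_weight` argument that yields the same kind of matrix, so passing the amounts there is an alternative to the hand-written loop.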

Is this something that can be done? Or should it be an enhancement suggestion?

Thanks

jake-mason commented 4 years ago

What prompts you to represent a continuous outcome/prediction ($ amount) in terms of a confusion matrix (meant for binary or categorical modeling tasks)? It seems to me the output of a confusion matrix with even tens of different categories represented would be difficult to understand, let alone potentially thousands of categories.

I assume you're trying to understand your model's performance across the entire dollar range, to see where there may be gaps. Have you tried a residual plot (i.e. plotting the predicted $ amount on the x-axis and the error on the y-axis)?
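A rough sketch of the points such a residual plot would show, assuming the error is defined as actual minus predicted. The helper name `residual_points` and the numbers are made up for illustration.

```python
def residual_points(actual, predicted):
    """Return (predicted, error) pairs, with error = actual - predicted."""
    return [(p, a - p) for a, p in zip(actual, predicted)]


# Hypothetical dollar amounts.
actual = [120.0, 80.0, 1000.0]
predicted = [100.0, 95.0, 900.0]

points = residual_points(actual, predicted)
xs = [x for x, _ in points]  # predicted $ amount (x-axis)
ys = [y for _, y in points]  # error (y-axis)
```

The `xs` and `ys` lists can then go to any scatter function, for example matplotlib's `plt.scatter(xs, ys)`.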

I suppose you could try binning your $ amounts to reduce the cardinality in the predictions/actual outcomes but that seems arbitrary and roundabout.
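To illustrate the binning idea, here is a small sketch that maps each amount to one of a few ranges, so the binned actual and predicted values can be treated as categories. The bin edges and labels are arbitrary assumptions, which is exactly the drawback mentioned above.

```python
from bisect import bisect_right

# Hypothetical bin edges in $: [0, 100), [100, 1000), [1000, inf)
EDGES = [100.0, 1000.0]
BIN_LABELS = ["<100", "100-999", ">=1000"]


def bin_amount(amount):
    """Map a dollar amount to the label of the bin it falls into."""
    return BIN_LABELS[bisect_right(EDGES, amount)]


actual = [50.0, 100.0, 999.0, 1000.0]
predicted = [120.0, 90.0, 1500.0, 2000.0]

actual_bins = [bin_amount(a) for a in actual]
predicted_bins = [bin_amount(p) for p in predicted]
```

The two label lists can then be passed to a confusion-matrix plot like any other categorical ground truth and prediction.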