reiinakano / scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.
MIT License

Plot precision-recall curve for support vector machine classifier #87

Closed foo123 closed 6 years ago

foo123 commented 6 years ago

Hello, I want to plot a precision-recall curve for SVC (support vector machine classifier), but the scikit-learn SVM classifier does not implement a predict_proba method. How can I do that in scikit-plot? As far as I can see in the documentation, it accepts prediction probabilities to plot the curve.

Note that the scikit-learn documentation page has an example of a precision-recall curve for SVC.

Thank you, Nikos

reiinakano commented 6 years ago

Have you tried decision_function?

foo123 commented 6 years ago

@reiinakano what do you mean, can you give an example?

lugq1990 commented 6 years ago

@foo123 For SVC in sklearn, if you want the probability of each class, you have to set SVC's parameter "probability" to True before you fit your model (by default it's False). After that, train your model and call predict_proba() to get the class probabilities for each test sample. You can then pass them to plot_precision_recall. Try it.

foo123 commented 6 years ago

@lugq1990 I use LinearSVC as well, and it does not have a probability parameter, so I need a more generic method. If using decision_function is an option, I would like to learn how to use it, but as far as I can tell from the documentation it returns confidence scores based on the signed distance from the hyperplane, not probabilities. Furthermore, with the probability parameter set in SVC, how am I supposed to access the probabilities from the predict method? It does not have a predict_proba method. I also need the predict method to return actual classes, since I plot the confusion matrix between predicted and actual classes as well. I am a little lost here; any help or working example is highly appreciated. Thanks
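
For reference, a minimal sketch of the decision_function route in the binary case. It relies on sklearn.metrics.precision_recall_curve accepting any non-thresholded score that ranks the samples, so the signed distances can be used directly without converting them to probabilities. The synthetic dataset, the split, and all variable names below are assumptions for illustration and do not come from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, purely for illustration (an assumption, not thread data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

clf = LinearSVC(max_iter=10000, random_state=0)
clf.fit(X_train, y_train)

# Signed distances to the hyperplane: not probabilities, but a usable ranking score.
scores = clf.decision_function(X_test)

# predict() still returns class labels (it thresholds the scores at 0),
# so it can be used for a confusion matrix alongside the curve.
pred = clf.predict(X_test)

# The curve is computed by sweeping a threshold over the scores.
precision, recall, thresholds = precision_recall_curve(y_test, scores)
ap = average_precision_score(y_test, scores)
```

The resulting arrays can be drawn with plain matplotlib, for example `plt.step(recall, precision, where="post")`.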

lugq1990 commented 6 years ago

@foo123 SVC has the probability parameter, so you can use it to get both probabilities and predictions. LinearSVC in sklearn does not support predict_proba, but there are two ways around this: one is to use the basic SVC with a linear kernel, the other is to use probability calibration, which is a more general solution for estimators that do not support predict_proba. Here is the link for it: http://scikit-learn.org/stable/modules/calibration.html
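
As a rough sketch of the calibration route, assuming a held-out test split: with an integer cv, CalibratedClassifierCV fits each copy of the estimator on the training folds and calibrates it on the remaining fold, and the fitted object exposes both predict and predict_proba. The split, cv=5, and max_iter values are illustrative choices, not something stated in this thread.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

x, y = load_iris(return_X_y=True)

# Hold out 30% of the data so evaluation does not reuse the training samples.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y
)

# cv=5: each LinearSVC copy is fitted on 4/5 of the training data and
# calibrated on the remaining 1/5.
clf_cal = CalibratedClassifierCV(LinearSVC(max_iter=10000), cv=5)
clf_cal.fit(x_train, y_train)

pred = clf_cal.predict(x_test)        # class labels, e.g. for a confusion matrix
prob = clf_cal.predict_proba(x_test)  # shape (n_samples, n_classes), for the curve
```

pred and prob can then be passed to skplt.metrics.plot_confusion_matrix(y_test, pred) and skplt.metrics.plot_precision_recall_curve(y_test, prob), the same calls used elsewhere in this thread.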

I wrote some basic code for you:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC
from sklearn.calibration import CalibratedClassifierCV
import matplotlib.pyplot as plt
import scikitplot as skplt

iris = load_iris()
x, y = iris.data, iris.target

# To get class probabilities, set probability=True before fitting.
clf = SVC(probability=True)
clf.fit(x, y)
pred = clf.predict(x)
prob = clf.predict_proba(x)

# Plot the precision-recall curve and the confusion matrix.
skplt.metrics.plot_precision_recall_curve(y, prob)
skplt.metrics.plot_confusion_matrix(y, pred)
plt.show()

# For LinearSVC, the first way is to replace it with SVC using a linear kernel.
clf = SVC(probability=True, kernel='linear')
clf.fit(x, y)
# The rest is the same as above.

# The second way is CalibratedClassifierCV, which is more generally useful
# for estimators that do not support predict_proba.
svm = LinearSVC()
clf_cal = CalibratedClassifierCV(svm)
clf_cal.fit(x, y)
pred = clf_cal.predict(x)
prob = clf_cal.predict_proba(x)
```

foo123 commented 6 years ago

@lugq1990 Thanks. I use both SVC (linear kernel) and LinearSVC(), as they provide different solutions with subtle differences. I don't understand why the documentation for SVC does not mention that the predict_proba method even exists, but I tested it and it works. As for calibrated probabilities, I will keep them in mind, but in some cases I use the training data as test data as well, and it is mentioned that the data used for calibration should not be the same as the training data. So I do not use this method for LinearSVC(). Instead, I combined the precision-recall example from the sklearn documentation with code from scikit-plot to make a precision-recall curve based on the decision_function method. Closing this.
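
The exact code used to close this issue was not posted. One way the decision_function approach could look for a multiclass problem is sketched below: the labels are binarized one-vs-rest and one curve is computed per class from the matching column of scores. Fitting and scoring on the same iris data mirrors the setup described in the thread. The variable names and the `curves` dictionary are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

x, y = load_iris(return_X_y=True)
clf = LinearSVC(max_iter=10000, random_state=0).fit(x, y)

# For multiclass problems, decision_function returns one score column per class.
scores = clf.decision_function(x)                # shape (150, 3)
y_bin = label_binarize(y, classes=clf.classes_)  # one-vs-rest labels, shape (150, 3)

# One precision-recall curve per class, computed from that class's score column.
curves = {}
for i, label in enumerate(clf.classes_):
    precision, recall, _ = precision_recall_curve(y_bin[:, i], scores[:, i])
    ap = average_precision_score(y_bin[:, i], scores[:, i])
    curves[int(label)] = (precision, recall, ap)
```

Each entry of `curves` can be drawn with matplotlib, for example `plt.step(recall, precision, where="post")`, once per class.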