jchen111 closed this issue 9 years ago
Accuracy is a proportion, so a negative value does not make sense as a measure of classifier performance. The range should be [0, 1]. How are you computing the accuracy?
Are you using the sklearn.metrics package to compute the performance? (http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)
Maria E. Ramirez-Loaiza, Ph.D. Candidate, CS Department - Machine Learning Laboratory, Illinois Institute of Technology

On 11/27/2014 2:40:09 PM, JIAQI CHEN wrote: Does anybody know what a negative cross-validation accuracy means in a linear regression model? We are fitting our data to the sklearn linear regression model and get a negative accuracy, which really confuses me. [https://github.com/iit-cs579/main/issues/30]
I'm using http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html. Am I on the right track? I call it like this: cross_val_score(LinearRegression(), X, y, cv=cv)
I am guessing your cv is a cross-validation object from one of the available classes, such as KFold. I would try it like this:
classifier = LinearRegression()
scores = cross_val_score(classifier, X, y, cv=cv, scoring='accuracy')
scores should be an array with one value per fold of the cv. If the default scorer of your estimator is not accuracy, then the results you are getting are not accuracy values.
Try setting the scoring measure explicitly as 'accuracy' and see if that gives you values in the expected range.
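A note on where a negative value can come from, assuming scoring was left at its default as in the call quoted earlier: cross_val_score then falls back to the estimator's own score method, and for LinearRegression that is the coefficient of determination R^2, not accuracy. R^2 is at most 1 but has no lower bound, so it goes negative whenever the model predicts worse than always guessing the mean of y. The sketch below computes R^2 by hand in plain Python; the helper name r2_score_manual and the toy numbers are made up for illustration.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Sum of squared deviations from the mean (a "predict the mean" baseline)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy targets, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]

r2_perfect = r2_score_manual(y_true, [1.0, 2.0, 3.0, 4.0])   # exact predictions
r2_mean = r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5])      # always predict the mean
r2_reversed = r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean

print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```

So a negative cross-validation score from LinearRegression most likely indicates a poor fit on the held-out folds rather than a bug in the computation.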
We followed your instructions, and the code looks like this:
def do_cv_linear(X, y, nfolds=10):
    cv = KFold(len(y), nfolds)
    return np.mean(cross_val_score(LinearRegression(), X, y, cv=cv, scoring='accuracy'))
and we get an error: ValueError: Can't handle mix of multiclass and continuous
Yes, I see the problem: I misread Linear as Logistic. If you use LinearRegression, you need a scoring measure suited to regression tasks, for example mean squared error. Accuracy is for classification tasks. I am guessing your y is a vector of scalar (real-valued) targets.
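The ValueError can be reproduced in isolation. The sketch below assumes (this is a guess, not something stated in the thread) that the original y held integer-valued targets, which scikit-learn treats as class labels, while a regressor's predictions are real-valued. The toy arrays are invented; type_of_target and accuracy_score are existing scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical data: integer targets look like class labels,
# while a regressor outputs arbitrary real numbers.
y_true = np.array([1, 2, 3, 2])
y_pred = np.array([1.2, 1.9, 3.4, 2.1])

true_kind = type_of_target(y_true)  # how scikit-learn classifies the targets
pred_kind = type_of_target(y_pred)  # how it classifies the predictions

try:
    accuracy_score(y_true, y_pred)
    error_message = None
except ValueError as exc:
    # Accuracy is only defined when both inputs are label-like
    error_message = str(exc)

print(true_kind, pred_kind)
print(error_message)
```

The accuracy scorer refuses to compare label-like targets with continuous predictions, which is what the "mix of multiclass and continuous" message refers to.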
If you are doing regression, then use a regression measure such as "mean_absolute_error" or "mean_squared_error" (y is a scalar vector). If you are doing classification, then use "accuracy" or "f1" (y is a label vector), according to what you want to measure.
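For anyone reading this on a recent scikit-learn release, here is a minimal sketch of the same helper with a regression scorer. Several things differ from the 2014 code and are assumptions about your installed version: KFold now lives in sklearn.model_selection and takes n_splits, and the error-based scorers are named "neg_mean_squared_error" and "neg_mean_absolute_error" because scorers follow a higher-is-better convention. The synthetic data and the extra scoring parameter are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (hypothetical): linear signal plus small noise
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)


def do_cv_linear(X, y, nfolds=10, scoring="neg_mean_squared_error"):
    """Mean cross-validated score of LinearRegression for a regression scorer."""
    cv = KFold(n_splits=nfolds, shuffle=True, random_state=0)
    return np.mean(cross_val_score(LinearRegression(), X, y, cv=cv, scoring=scoring))


mean_neg_mse = do_cv_linear(X, y)           # negated MSE, so always <= 0
mean_mse = -mean_neg_mse                    # flip the sign to get the usual MSE
mean_r2 = do_cv_linear(X, y, scoring="r2")  # R^2, at most 1, can be negative

print(mean_mse, mean_r2)
```

Note that with the "neg_" scorers a negative number is expected by design, so it should not be read as a sign of a broken model.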