Closed burakbayramli closed 11 years ago
Thanks for reporting. Does any of the linear model folks have time to look into this?
I'll try (not now; I just landed back in Paris and will be happy to get home after travelling), but I believe that it's not a bug: it's the normal behavior of cross-validation. In addition, in real settings nothing guarantees that the better performance of 0.3 does not simply reflect overfitting.
Could you please use train_test_split from sklearn.cross_validation to create the splits or manually shuffle the data? I am not sure that splitting without shuffling is a good idea.
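With a recent scikit-learn, a shuffled split can be sketched as follows (note that `train_test_split` has since moved from `sklearn.cross_validation` to `sklearn.model_selection`; the `test_size` and `random_state` values here are arbitrary choices for illustration):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in 0.13

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# train_test_split shuffles the samples before splitting, unlike a
# plain head/tail slice of the arrays.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape)
```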
Indeed, also the fold size for LassoCV is 1/3 of the data, which makes the training / test sizes 295 / 147. Your manual split uses 422 / 20, a very different training size, hence the optimal regularization is not the same. Try increasing the value of cv, for instance to 10, to check whether you converge to a similar solution.
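The suggestion can be checked directly; here is a minimal sketch against a current scikit-learn (import paths have changed since 0.13, and the fold counts are just illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# More folds mean larger training sets inside the cross-validation,
# so the selected alpha should move toward the full-data optimum.
for n_folds in (3, 10):
    lasso = LassoCV(cv=n_folds).fit(X, y)
    print(n_folds, lasso.alpha_)
```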
I see! When I tried
```python
k_fold = cross_validation.KFold(n=400, k=10, indices=True)
lasso = linear_model.LassoCV(cv=k_fold)
X_diabetes = diabetes.data
y_diabetes = diabetes.target
print lasso.fit(X_diabetes, y_diabetes)
print lasso.alpha_
```
I see alpha reported as 0.3, which is the optimal value.
I am on scikit-learn version 0.13-git. Here is the problem: the lambda value 0.3, entered by hand for Lasso, performs much better than the 0.013 found by LassoCV, which uses cross-validation. I used the standard diabetes data.
https://gist.github.com/burakbayramli/4750196
I based the code on this page
http://scipy-lectures.github.com/advanced/scikit-learn/index.html#sparse-models
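For reference, the comparison described above can be reproduced roughly as follows (a sketch against a current scikit-learn, not the gist's exact code; it keeps the manual unshuffled 422 / 20 head/tail split from the report):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LassoCV

X, y = load_diabetes(return_X_y=True)
# Manual, unshuffled 422 / 20 split as in the report
X_train, y_train = X[:-20], y[:-20]
X_test, y_test = X[-20:], y[-20:]

# Hand-picked regularization strength
lasso_manual = Lasso(alpha=0.3).fit(X_train, y_train)
# Regularization strength chosen by cross-validation on the training set
lasso_cv = LassoCV().fit(X_train, y_train)

print("manual alpha=0.3, R^2:", lasso_manual.score(X_test, y_test))
print("CV alpha=%.3f, R^2:" % lasso_cv.alpha_, lasso_cv.score(X_test, y_test))
```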
Thanks,