Closed by Ali777927 3 years ago
In terms of code,
library(ncvreg)
data(Prostate)
cvfit1 <- cv.ncvreg(Prostate$X, Prostate$y, seed=1)
cvfit2 <- cv.ncvreg(Prostate$X, Prostate$y, seed=1)
cvfit3 <- cv.ncvreg(Prostate$X, Prostate$y, seed=1)
identical(cvfit1$fold, cvfit2$fold)
identical(cvfit1$fold, cvfit3$fold)
What version are you running?
In terms of statistical practice:
Dear Prof. Breheny,
Thank you very much for your detailed answers; I appreciate that a lot.
Regarding the code:
2. Regarding the point about "fold": I am using version 3.12.0. I get the message "R session aborted" and R stops. I believe that I am not writing the code correctly; I don't fully understand the sentence "Which fold each observation belongs to." or how to adjust the code accordingly. My apologies, I have around one year of experience in R and a medical background, and I have not persisted in getting this right yet (it is not necessary now, since the default option already works nicely). I get the aforementioned message when I write the code like this:
cvfit1 <- cv.ncvreg(Prostate$X, Prostate$y, seed=1, fold =3, nfolds = 12)
Regarding the points about nfolds, I really appreciate your tips and comments, thank you so much. I finally got exactly the same results by using LOOCV. Actually, I had read about LOOCV previously, but I was hesitant to use it because of various comments on the web about its effect on the variance (they said it increases the variance), although this is debatable (and might not per se apply to settings where we do CV for tuning parameters).
Best wishes (and Happy New Year!), Ali
The argument fold should be a vector of length n that describes which fold each observation is assigned to. For example, if n=9:
fold = c(1,1,1,2,2,2,3,3,3)
would assign the first three observations to the first fold, and so on. Also, you can look at fit$fold to see how the folds were assigned and what the argument should look like.
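To make the description above concrete, here is a minimal sketch of building such a vector for an arbitrary n and passing it in. The helper make_folds is hypothetical (it is not part of ncvreg), and the choice of 10 folds is only an assumption for illustration; the cv.ncvreg call is skipped if the package is not installed.

```r
# Hypothetical helper (not part of ncvreg): assign each of n observations
# to one of k folds, with fold sizes as equal as possible.
make_folds <- function(n, k, seed = 1) {
  set.seed(seed)                            # fix the RNG so the assignment is repeatable
  sample(rep(seq_len(k), length.out = n))   # balanced fold labels, then shuffled
}

# Toy case from the text: n = 9 observations, 3 folds
fold <- make_folds(n = 9, k = 3)
table(fold)   # three observations in each fold

# Passing the same fold vector twice should give the same cross-validation fit
if (requireNamespace("ncvreg", quietly = TRUE)) {
  library(ncvreg)
  data(Prostate)
  fold <- make_folds(nrow(Prostate$X), k = 10)
  cvfit_a <- cv.ncvreg(Prostate$X, Prostate$y, fold = fold)
  cvfit_b <- cv.ncvreg(Prostate$X, Prostate$y, fold = fold)
  identical(cvfit_a$fold, cvfit_b$fold)
  identical(cvfit_a$cve, cvfit_b$cve)
}
```

Note that fold has one entry per observation, so a single number such as fold = 3 is not a valid value for it.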
As for the supposed increase in variance when using LOOCV: yes, this is often claimed, but I have never seen convincing evidence that this is true in general. At best, the statement is an oversimplification.
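For reference, a minimal sketch of why LOOCV gives the same answer from run to run: with as many folds as observations, every observation is alone in its fold, so the random shuffle changes only the fold labels and not the partition. The toy n = 9 and the seeds are assumptions for illustration, and the cv.ncvreg part runs only if the package is installed.

```r
# With nfolds = n, any random assignment is just a relabelling of
# "each observation in its own fold".
n <- 9
set.seed(1); fold_a <- sample(seq_len(n))
set.seed(2); fold_b <- sample(seq_len(n))
all(table(fold_a) == 1) && all(table(fold_b) == 1)   # every fold has size 1

if (requireNamespace("ncvreg", quietly = TRUE)) {
  library(ncvreg)
  data(Prostate)
  n_obs <- nrow(Prostate$X)
  # Leave-one-out CV: one fold per observation, no seed supplied
  loo1 <- cv.ncvreg(Prostate$X, Prostate$y, nfolds = n_obs)
  loo2 <- cv.ncvreg(Prostate$X, Prostate$y, nfolds = n_obs)
  all.equal(loo1$cve, loo2$cve)   # compare the CV error curves of the two runs
}
```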
Dear Prof. Breheny,
First of all, I would like to express my gratitude for all your efforts and scientific contributions in the field of regularization and penalized regression; we are very grateful for them.
I am performing a linear MCP regression with around 2000 observations and around 400 potential predictors. Although I added the option “seed”, I still have trouble getting reproducible results (and I am not able to get my code to run when I fill in the option “fold”). In this regard I have some specific questions:
Hopefully, you can help me to get an appropriate solution for my problem. Thanks in advance.