Closed: petchpanu closed this issue 2 years ago
Hi there,
May I ask: are you trying to compute OOB errors for each potential mtry value? If so, maybe you could consider computing the K-fold cross-validation error instead (i.e., train on K-1 folds and test on the remaining fold)? Please let me know if this helps, or if I did not understand your question correctly!
Best, Weichi
On Mon, Jan 24, 2022 at 9:31 AM petchpanu @.***> wrote:
Hi Weichi Yao,
Thanks for providing this great package. I'm trying to use the predictProb function to compute OOB errors with the parameter OOB = TRUE. However, I got the error "cannot allocate vector of size 587 Gb". The dataset I'm using has almost 1 million observations with 20 features, so I understand this probably occurs due to the large number of rows. Would it be possible to loop through each subject to get its OOB error, or is there any other way to obtain the OOB errors?
Thank you in advance.
— Reply to this email directly, or view it on GitHub: https://github.com/weichiyao/TimeVaryingData_LTRCforests/issues/4
Thank you for your fast reply! Yes, I'm trying to tune the mtry and ntree values. Are you suggesting that I should do K-fold cross-validation and then calculate the Brier score (or another metric) on the held-out fold to find the best set of parameters?
Thank you very much. Panu
Hi there,
I think the point of using OOB errors is to estimate the true prediction error. Cross-validation errors can be seen as another good estimator of the true error, and they are also "out-of-bag" in spirit, since each fold is held out of training. In addition, because we divide the data into K folds and train on only K-1 of them (say K = 5), the training sample size for each fit is reduced by 20%, which may also help with the memory issue.
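The suggestion above can be sketched in R. The fold-splitting logic below is plain base R; the model-fitting and scoring lines are commented placeholders, since the exact fitting call (the formula, the forest-fitting function, and the error metric) depends on your data. Only predictProb is taken from this thread; everything else is an assumption to be adapted.

```r
# K-fold cross-validation sketch for tuning mtry (hypothetical workflow;
# adapt the commented fitting/prediction lines to your own data).
set.seed(1)
k <- 5
n <- 1000                                  # replace with your number of subjects
folds <- sample(rep(1:k, length.out = n))  # random, roughly equal fold labels

cv_error <- function(mtry_value) {
  errs <- numeric(k)
  for (i in 1:k) {
    train_idx <- which(folds != i)         # train on K-1 folds
    test_idx  <- which(folds == i)         # test on the held-out fold
    # fit  <- <forest-fitting function>(formula, data = data[train_idx, ],
    #                                   mtry = mtry_value, ntree = 100L)
    # pred <- predictProb(fit, newdata = data[test_idx, ])
    # errs[i] <- <Brier score, or another metric, on the held-out fold>
  }
  mean(errs)                               # average CV error for this mtry
}
```

You would then call cv_error once per candidate mtry value and pick the minimizer.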
I also wonder: if you hit a memory-allocation issue, I suspect it might not work even if you prespecify an mtry value without tuning. This looks more like a machine-level error, so a computer with more memory may be needed. I could be wrong, though; just a thought. Have you tried setting, say, mtry = 1?
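As a back-of-envelope check on why the requested allocation is so large: the calculation below assumes (an assumption, not a confirmed diagnosis) that the failing allocation is a dense n-subjects-by-m-time-points matrix of survival probabilities in double precision, with R reporting sizes in Gb = 2^30 bytes.

```r
# Hypothetical explanation of the "cannot allocate vector of size 587 Gb"
# error: a dense n x m matrix of doubles needs 8 * n * m bytes.
n <- 1e6                  # ~1 million observations, as in the question
gb <- 1024^3              # R's "Gb" is 2^30 bytes
m <- 587 * gb / (8 * n)   # implied number of columns (e.g. time points)
round(m)                  # on the order of 80,000 columns
```

If that is where the allocation comes from, reducing either the number of rows per fit (as in the K-fold suggestion above) or the number of evaluation time points would shrink the request proportionally.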
Best, Weichi
On Tue, Jan 25, 2022 at 8:46 AM petchpanu @.***> wrote:
Reopened #4 https://github.com/weichiyao/TimeVaryingData_LTRCforests/issues/4.
Hi Weichi Yao,
Thank you for your reply. I tried with mtry = 2 and had no problem during training. I think I will go with K-fold cross-validation as you suggested, to see if the problem still arises.
Thank you, Panu