weichiyao / TimeVaryingData_LTRCforests


LTRCCIF predictProb cannot allocate vector size of .. #4

Closed: petchpanu closed this issue 2 years ago

petchpanu commented 2 years ago (Jan 24, 2022)

Hi Weichi Yao,

Thanks for providing such great packages. I'm trying to use the predictProb function with OOB = TRUE to compute OOB errors. However, I get the error "cannot allocate vector of size 587 Gb". The dataset I'm using has almost 1 million observations and 20 features, so I suspect the problem is the large number of rows. Would it be possible to loop over the subjects and compute the OOB error for each one? Or is there another way I could get OOB errors?

Thank you in advance.

weichiyao commented 2 years ago

Hi there,

May I ask whether you are trying to find OOB errors for each candidate mtry value? If so, you might consider computing the K-fold cross-validation error instead (i.e., train on K-1 folds and test on the remaining one). Please let me know whether this helps, or whether I have misunderstood your question!

Best, Weichi

petchpanu commented 2 years ago

Thank you for your fast reply! Yes, I'm trying to tune the mtry and ntree values. Are you suggesting that I do k-fold cross-validation, compute the Brier score (or another metric) on each held-out fold, and then choose the parameter set with the best score?

Thank you very much. Panu

weichiyao commented 2 years ago

Hi there,

As I understand it, using OOB errors means that

  1. we train on the whole training set
  2. we then predict on the out-of-bag samples
  3. we compute the out-of-bag errors
  4. we choose the mtry with the lowest out-of-bag error

However, as you pointed out, the training dataset itself is too large to handle.
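For reference, here is a minimal sketch of steps 1 to 3, written in Python rather than R and not using LTRCforests. A deliberately trivial base learner (the in-bag mean of a numeric response) stands in for a survival tree, and the bootstrap is drawn over subjects, on the assumption that all rows of a subject should be in-bag or out-of-bag together. The function name `oob_error` and its parameters are hypothetical. Step 4 would call this once per candidate mtry and keep the value with the lowest error.

```python
import math
import random
from collections import defaultdict


def oob_error(subject_ids, y, n_trees=50, seed=0):
    """OOB mean squared error of a bagged toy predictor (hypothetical helper).

    subject_ids[i] is the subject of row i. Bootstrap draws are made over
    subjects, so all rows of a subject are in-bag or out-of-bag together.
    The 'tree' is a stand-in that just predicts the in-bag mean of y.
    """
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    rows_of = defaultdict(list)
    for i, s in enumerate(subject_ids):
        rows_of[s].append(i)

    pred_sum = [0.0] * len(y)  # running sum of OOB predictions per row
    pred_cnt = [0] * len(y)    # number of trees for which the row was OOB

    for _ in range(n_trees):
        # Step 1: bootstrap sample of subjects, drawn with replacement.
        in_bag = [rng.choice(subjects) for _ in subjects]
        in_bag_set = set(in_bag)
        # A subject drawn twice contributes its rows twice.
        in_bag_rows = [i for s in in_bag for i in rows_of[s]]
        fitted = sum(y[i] for i in in_bag_rows) / len(in_bag_rows)
        # Step 2: predict only on rows whose subject was not drawn.
        for s in subjects:
            if s not in in_bag_set:
                for i in rows_of[s]:
                    pred_sum[i] += fitted
                    pred_cnt[i] += 1

    # Step 3: average each row's OOB predictions and compute the error
    # over rows that were out-of-bag at least once.
    errs = [(y[i] - pred_sum[i] / pred_cnt[i]) ** 2
            for i in range(len(y)) if pred_cnt[i] > 0]
    return sum(errs) / len(errs) if errs else math.nan
```

Because OOB predictions are accumulated as running sums, memory in this sketch grows with the number of rows rather than with rows times trees; whether a similar loop would avoid the allocation error in predictProb is not something the sketch can establish.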

The cross-validation error is another good estimator of the true error, and it is also "out-of-bag" in the sense that each fold is scored by a model that never saw it. In addition, because we divide the data into K folds and train on only K-1 of them, with K = 5 each model is trained on 20% fewer observations, which may help.
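The subject-level K-fold alternative can be sketched as follows, again in Python rather than R. `subject_folds`, `cv_select`, and the `fit_and_score` callback are hypothetical names: the callback is where one would fit the forest with a given setting (e.g. an (mtry, ntree) pair) on the training rows and return an error such as a Brier score on the held-out rows. Keeping each subject's rows in a single fold is an assumption suited to data with several rows per subject.

```python
import random
from statistics import mean


def subject_folds(subject_ids, k=5, seed=0):
    """Split row indices into k (train, test) pairs, keeping subjects whole."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Assign subjects to folds round-robin after shuffling.
    fold_of = {s: j % k for j, s in enumerate(subjects)}
    folds = []
    for j in range(k):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == j]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != j]
        folds.append((train, test))
    return folds


def cv_select(candidates, subject_ids, fit_and_score, k=5, seed=0):
    """Return (best_candidate, scores); scores[c] is the mean held-out error.

    fit_and_score(c, train_rows, test_rows) is a user-supplied (hypothetical)
    callback that trains with setting c on train_rows and returns an error
    on test_rows; lower is better.
    """
    folds = subject_folds(subject_ids, k, seed)
    scores = {c: mean(fit_and_score(c, tr, te) for tr, te in folds)
              for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

With K = 5, each training split holds roughly four fifths of the subjects, which is the 20% reduction mentioned above; the candidates can be any hashable settings, such as (mtry, ntree) tuples.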

I also wonder whether, given the memory error when allocating the vector, the model would fail even with a prespecified mtry value and no tuning at all. This looks more like a hardware limitation than a bug, so perhaps a computer with more memory is needed? I could be wrong; it's just a thought. Have you tried fitting with, say, mtry = 1?

Best, Weichi

petchpanu reopened #4 on Tue, Jan 25, 2022.

petchpanu commented 2 years ago

Hi Weichi Yao,

Thank you for your reply. I tried mtry = 2 and training ran without any problems. I think I will try k-fold cross-validation as you suggested to see whether the problem still arises.

Thank you, Panu