hmsc-r / HMSC

GNU General Public License v3.0

setting mcmcStep for conditional cross-validation #133

Open cgoetsch opened 2 years ago

cgoetsch commented 2 years ago

What is an appropriate value of mcmcStep to use for estimating the site loadings when running conditional cross-validation on a fitted HMSC model? I have a model with the following specifications:

Hmsc object with 7225 sampling units, 10 species, 10 covariates, 1 traits and 2 random levels
Posterior MCMC sampling with 4 chains each with 3000 samples, thin 700 and transient 1500000

I have looked in the HMSC book and in the help files for predict.HMSC and computePredictedValues, and all I see for mcmcStep is that it "should be set high enough to obtain appropriate conditional predictions."

Is there a rule-of-thumb for determining how high I should set this based on the iterations from my fitted model? What would you recommend?

Thank you, Chandra Goetsch

ovaskain commented 2 years ago

Dear Chandra,

Unfortunately there is no rule of thumb, except that a bigger and more complex model typically needs more iterations. This is a similar question to how many MCMC steps are needed to fit the model: sometimes 1,000 is enough, sometimes 1,000,000 is not.

What I usually do with MCMC steps is to decide that I want 250 posterior samples from each of 4 chains, so 1000 samples in total, and that the initial 1/3 of each chain will be discarded as transient. Thus, without thinning (thin=1), I run 375 iterations for each chain and cut the first 125 to keep the last 250. Then I loop over thin=c(1,10,100,1000,…), so that e.g. with thin=1000 the chains have 375,000 iterations, of which 125,000 are dropped as transient and the remaining 250,000 are thinned to give 250 samples per chain. I keep fitting the models until I see that MCMC convergence has been achieved, or until the computational time for the next thin would be too large to run.

Note that this does not “waste” much computational time: thin=c(1,10,100) together take only about 11% of the time of thin=1000, so it is always the last run that takes most of the time. In this way I don’t need to decide beforehand how many iterations are needed; instead I explore how convergence progresses as thin increases.
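The arithmetic behind this schedule can be checked with a small sketch. It is written in Python purely as a calculator and does not call Hmsc; the function name `thinning_schedule` is hypothetical, and the iteration count is used as a rough proxy for run time, which is an assumption.

```python
def thinning_schedule(samples=250, thins=(1, 10, 100, 1000)):
    """Return (thin, transient, iterations per chain) for each run.

    The transient is half the number of retained iterations, which makes it
    one third of the whole chain, as described in the comment above.
    """
    schedule = []
    for thin in thins:
        kept = samples * thin      # iterations that get thinned down to `samples`
        transient = kept // 2      # initial third of the chain, discarded
        schedule.append((thin, transient, transient + kept))
    return schedule


schedule = thinning_schedule()
for thin, transient, total in schedule:
    print(f"thin={thin:>5}: transient={transient:>7}, iterations per chain={total:>7}")

# Cost of all earlier runs relative to the last one (iterations as a proxy for time).
totals = [total for _, _, total in schedule]
earlier_share = sum(totals[:-1]) / totals[-1]
print(f"earlier runs cost {earlier_share:.1%} of the last run")
```

With the defaults this reproduces the numbers quoted above: 375 iterations per chain at thin=1, 375,000 at thin=1000, and roughly 11% extra cost for the three shorter runs.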

In the same way, for conditional cross-validation I would try mcmcStep=c(1,10,100,…) until the results converge. Of course, this can become computationally very intensive, but I don’t think there is any way of deciding on the needed mcmcStep beforehand. However, you don’t need to do this exploration for the full cross-validation; a smaller set of conditional prediction tasks is enough.

Otso

From: cgoetsch Sent: Tuesday, 22 February 2022 21:58 To: hmsc-r/HMSC Cc: Subscribed Subject: [hmsc-r/HMSC] setting mcmcStep for conditional cross-validation (Issue #133)

cgoetsch commented 2 years ago

Otso,

Thank you for your reply. I do have a follow-up question. If I run a loop varying mcmcStep (1, 10, 100, ...), what would I be looking at in the results to be sure that I choose an appropriate number of steps? Since the output of computePredictedValues for conditional cross-validation is an array of conditional predictions, would I be comparing these in some way to the predictions from unconditional cross-validation? Or would I be comparing the conditional results from each loop iteration to each other to see if the predictions have stabilized in some way? The function computePredictedValues does not output MCMC chains, so I am not sure what you mean by checking the convergence. Do I need to manually change something in the code? I feel like I am missing something in your explanation.

Chandra

ovaskain commented 2 years ago

This depends on what you are using the conditional predictions for. Let’s assume that you wish to compute AUC, and in particular to see how much accounting for associations helps, in the sense of how much better the results from conditional cross-validation are compared to usual cross-validation. Let’s assume that the unconditional cross-validation yields an AUC value of 0.8, and that the conditional cross-validation gives AUC values of 0.81 (mcmcStep=1), 0.92 (mcmcStep=10), 0.93 (mcmcStep=100), 0.93 (mcmcStep=1000) and 0.93 (mcmcStep=10,000). Then I would conclude that mcmcStep=1 is not sufficient, whereas mcmcStep=10 is in practice already sufficient for reaching the main conclusions, and mcmcStep=100 is well sufficient.
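One way to make this comparison mechanical is sketched below, again in Python as a plain calculation rather than an Hmsc call. The function name `smallest_sufficient_step` and the tolerance values are assumptions introduced for illustration; the AUC values are the hypothetical ones from the comment above, not real results.

```python
def smallest_sufficient_step(auc_by_step, tol=0.005):
    """Return the smallest step whose AUC is within `tol` of every later one.

    `auc_by_step` maps each tried number of steps to the AUC obtained with it.
    Returns None if no such step can be confirmed from the values given.
    """
    steps = sorted(auc_by_step)
    for i, step in enumerate(steps[:-1]):
        later = [auc_by_step[s] for s in steps[i + 1:]]
        if all(abs(a - auc_by_step[step]) <= tol for a in later):
            return step
    return None


# Illustrative AUC values from the comment above (hypothetical, not measured).
auc = {1: 0.81, 10: 0.92, 100: 0.93, 1000: 0.93, 10000: 0.93}

strict = smallest_sufficient_step(auc, tol=0.005)
lenient = smallest_sufficient_step(auc, tol=0.02)
print(f"strict tolerance: {strict}, lenient tolerance: {lenient}")
```

A strict tolerance picks 100 and a lenient one picks 10, which mirrors the distinction between "well sufficient" and "in practice already sufficient". The appropriate tolerance depends on how small a difference matters for the conclusions being drawn.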

Otso

From: cgoetsch Sent: Wednesday, 23 February 2022 16:04 To: hmsc-r/HMSC Cc: Ovaskainen, Otso T; Comment Subject: Re: [hmsc-r/HMSC] setting mcmcStep for conditional cross-validation (Issue #133)

cgoetsch commented 2 years ago

Otso,

Understood. Thank you for clarifying.

Cheers, Chandra