stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
https://mc-stan.org/loo

Using kfold for model selection after splitting by groups #169

Open lkschwarz opened 3 years ago

lkschwarz commented 3 years ago

I am using the rstanarm and loo packages to run a logistic regression with four different intercepts and a univariate slope, hierarchical by individual, and then using k-fold leave-one-group-out cross-validation for model selection (kfold_split_grouped, then kfold). I get the same error when running the kfold command regardless of model complexity.

Error message:

Fitting K = 60 models distributed over 3 cores
Error in checkForRemoteErrors(val) :
  3 nodes produced errors; first error: object 'n_chains' not found

I think it has to do with the number of cores in the kfold command (the error above occurred with 3 cores). If I run it with one core, it works, but impossibly slowly. With more than one core, it fails.

I updated R and all packages yesterday. R session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

other attached packages:
loo_2.4.1 MCMCvis_0.15.1 rstanarm_2.21.1 Rcpp_1.0.6

I have included some inelegant sample code that should reproduce the error. The problem occurs at line 83. Attachment: rstanarm_logistic_hier_test.txt

Thanks, LKS

jgabry commented 3 years ago

Sorry you're getting an error. I changed iter to a small number just to make things go faster but otherwise I used your example and I don't get any error running the kfold part. But I'm on Mac so perhaps this is a Windows issue.

To help me figure this out, aside from convergence warnings, can you tell me whether running this simpler example is successful or if you run into a similar error?

n_chains <- 2
fit <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars, 
                 refresh = 0, iter = 50, cores = n_chains, chains = n_chains)
k <- kfold(fit, cores = 2, folds = loo::kfold_split_random(K = 3, N = nrow(mtcars)))

jgabry commented 3 years ago

And sorry for the slow reply!

lkschwarz commented 3 years ago

I'm still getting the same error using two cores:

Fitting K = 3 models distributed over 2 cores
Error in checkForRemoteErrors(val) :
  2 nodes produced errors; first error: object 'n_chains' not found

With one core, it works fine.

jgabry commented 3 years ago

Thanks for checking. I bet there's a problem with how we're doing parallelization on Windows. Will try to look into it soon. If possible, can you try one other thing? If you replace n_chains with a number, does it work? That is, if you just put 2 everywhere you currently have n_chains, does it run without error? The answer to that will be helpful when trying to figure out what the problem is here.

lkschwarz commented 3 years ago

Replacing n_chains in the simple example worked. I also started replacing all the named variables (n_chains, n_iter, n_warmup, n_thin) with numbers in the call to stan_glm for my more complicated example. When all of them were numbers, the call to kfold for the complicated example worked as well.

jgabry commented 3 years ago

Ok great, that's super helpful for narrowing down where the problem is. And I'm glad you can at least get it working this way until we fix it.
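
Note for readers: one reading that fits the symptoms in this thread (fine with one core, failing with several cores on Windows, fine again once n_chains is written as a number) is that the stored model call gets re-evaluated in fresh worker processes that cannot see the user's global variables. That is an assumption about the cause; nothing in the thread confirms it. The sketch below uses only base R's parallel package, not rstanarm or loo, and the rep() call is a hypothetical stand-in for a model-fitting call.

```r
library(parallel)

n_chains <- 2

# An unevaluated call that refers to n_chains by name (hypothetical
# stand-in for the call stored inside a fitted model object).
call_with_symbol <- quote(rep("chain", times = n_chains))

# The same call with the value typed in as a literal.
call_with_literal <- quote(rep("chain", times = 2))

# The same call with the current value of n_chains spliced in by bquote().
call_inlined <- bquote(rep("chain", times = .(n_chains)))

# A PSOCK worker is a fresh R process; it does not inherit our globals.
cl <- makePSOCKcluster(1)

# Evaluating the symbol version on the worker fails; keep the message.
res_symbol <- tryCatch(
  parLapply(cl, list(call_with_symbol), eval),
  error = function(e) conditionMessage(e)
)

# The literal and inlined versions carry their values with them.
res_literal <- parLapply(cl, list(call_with_literal), eval)
res_inlined <- parLapply(cl, list(call_inlined), eval)

stopCluster(cl)

print(res_symbol)
print(res_literal[[1]])
print(res_inlined[[1]])
```

If this is indeed the mechanism, it would also be consistent with the example running on a Mac, where forked workers inherit the parent session's variables. Until a fix lands, writing the values directly into the model call, as was done above in the thread, avoids the lookup entirely.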