merliseclyde / BAS

BAS R package for Bayesian Model Averaging and Variable Selection
https://merliseclyde.github.io/BAS/
GNU General Public License v3.0
41 stars 16 forks source link

Linux vs Windows: different results #80

Open marcingretl opened 7 months ago

marcingretl commented 7 months ago

Describe the bug While running the same experiment on Linux and Windows I'm getting different results.

To Reproduce Steps to reproduce the behavior:

library("BAS")

prefix = .Platform$OS.type sink(paste(prefix, "_logit_bas.txt", sep=""), append=FALSE, split=FALSE)

data("birthwt", package = "MASS") y <- birthwt$lo x <- data.frame(birthwt[,-1]) x <- x[,-9] x$race <- as.factor(x$race)

Nrep = 10000 set.seed(1000000) bas.time <- system.time(birthwt.bas.glm <- bas.glm(y ~ ., data=x, family=binomial(link = "logit"), method="MCMC", MCMC.iterations=Nrep, laplace=TRUE)) print("bas.glm results for birthwt:") print(bas.time) print(birthwt.bas.glm$modelprior) summary(birthwt.bas.glm) coef(birthwt.bas.glm)

Expected behavior Posterior results should be identical.

Additional context [Linux] [1] "bas.glm results for birthwt:" użytkownik system upłynęło 0.160 0.017 0.197 $family [1] "Beta-Binomial"

$hyper.parameters [1] 1 1

attr(,"class") [1] "prior" P(B != 0 | Y) model 1 model 2 model 3 model 4 Intercept 1.0000 1.000000 1.0000000 1.0000000 1.0000 age 0.1422 0.000000 0.0000000 0.0000000 0.0000 lwt 0.4241 0.000000 0.0000000 1.0000000 1.0000 race2 0.2564 0.000000 0.0000000 0.0000000 0.0000 race3 0.1971 0.000000 0.0000000 0.0000000 0.0000 smoke 0.2999 0.000000 0.0000000 0.0000000 0.0000 ptl 0.3185 0.000000 1.0000000 0.0000000 0.0000 ht 0.4026 0.000000 0.0000000 0.0000000 1.0000 ui 0.2511 0.000000 0.0000000 0.0000000 0.0000 ftv 0.0841 0.000000 0.0000000 0.0000000 0.0000 BF NA 0.482511 0.4451593 0.2986569 1.0000 PostProbs NA 0.342100 0.0417000 0.0334000 0.0322 R2 NA 0.000000 0.0289000 0.0255000 0.0577 dim NA 1.000000 2.0000000 2.0000000 3.0000 logmarg NA -118.268722 -118.3492936 -118.7484304 -117.5400 model 5 Intercept 1.0000000 age 0.0000000 lwt 0.0000000 race2 0.0000000 race3 0.0000000 smoke 0.0000000 ptl 0.0000000 ht 0.0000000 ui 1.0000000 ftv 0.0000000 BF 0.1898685 PostProbs 0.0235000 R2 0.0216000 dim 2.0000000 logmarg -119.2013938

Marginal Posterior Summaries of Coefficients:

Using BMA

Based on the top 346 models post mean post SD post p(B != 0) Intercept -0.1204457 1.1600878 1.0000000
age -0.0052729 0.0187699 0.1422000
lwt -0.0067728 0.0091024 0.4241000
race2 0.2642035 0.5250595 0.2564000
race3 0.1501464 0.3643195 0.1971000
smoke 0.2373405 0.4224629 0.2999000
ptl 0.2135715 0.3678763 0.3185000
ht 0.6881534 0.9508727 0.4026000
ui 0.2097175 0.4255995 0.2511000
ftv -0.0008163 0.0509175 0.0841000

[Windows] [1] "bas.glm results for birthwt:" użytkownik system upłynęło 0.42 0.00 0.42 $family [1] "Beta-Binomial"

$hyper.parameters [1] 1 1

attr(,"class") [1] "prior" P(B != 0 | Y) model 1 model 2 model 3 model 4 Intercept 1.0000 1.000000 1.0000000 1.0000 1.0000000 age 0.1164 0.000000 0.0000000 0.0000 0.0000000 lwt 0.3539 0.000000 0.0000000 1.0000 1.0000000 race2 0.2104 0.000000 0.0000000 0.0000 0.0000000 race3 0.1875 0.000000 0.0000000 0.0000 0.0000000 smoke 0.2750 0.000000 0.0000000 0.0000 0.0000000 ptl 0.2575 0.000000 1.0000000 0.0000 0.0000000 ht 0.3326 0.000000 0.0000000 1.0000 0.0000000 ui 0.2133 0.000000 0.0000000 0.0000 0.0000000 ftv 0.0741 0.000000 0.0000000 0.0000 0.0000000 BF NA 0.482511 0.4451593 1.0000 0.2986569 PostProbs NA 0.444100 0.0376000 0.0321 0.0260000 R2 NA 0.000000 0.0289000 0.0577 0.0255000 dim NA 1.000000 2.0000000 3.0000 2.0000000 logmarg NA -118.268722 -118.3492936 -117.5400 -118.7484304 model 5 Intercept 1.0000000 age 0.0000000 lwt 0.0000000 race2 0.0000000 race3 0.0000000 smoke 0.0000000 ptl 0.0000000 ht 0.0000000 ui 1.0000000 ftv 0.0000000 BF 0.1898685 PostProbs 0.0173000 R2 0.0216000 dim 2.0000000 logmarg -119.2013938

Marginal Posterior Summaries of Coefficients:

Using BMA

Based on the top 361 models post mean post SD post p(B != 0) Intercept -0.2581245 1.0803783 1.0000000
age -0.0042545 0.0169919 0.1164000
lwt -0.0056017 0.0086101 0.3539000
race2 0.2212985 0.4945867 0.2104000
race3 0.1454458 0.3610647 0.1875000
smoke 0.2216885 0.4157675 0.2750000
ptl 0.1707630 0.3389293 0.2575000
ht 0.5669318 0.9006299 0.3326000
ui 0.1768643 0.3972297 0.2133000
ftv -0.0002828 0.0477916 0.0741000

merliseclyde commented 7 months ago

Sounds like this could be different random number generators leading to different results as the number of models sampled differs in the two runs despite the same seed. See https://stackoverflow.com/questions/48626086/same-seed-different-os-different-random-numbers-in-r#:~:text=Linux%20gives%20a%20different%20random,results%20will%20agree%20or%20not) and one solution that suggests

set.seed(10, kind = "Mersenne-Twister", normal.kind = "Inversion"); rnorm(1)

to see if you get the same random numbers. Note the top model is the same in both (Null) and has the same marginal likelihood, so it may be a matter of which models are being sampled.

merliseclyde commented 7 months ago

Interestingly I get 312 unique models when running on MAC OSX R 4.3.2.

Do you get different results if you enumerate all models (use method="deterministic") for sampling.?

marcingretl commented 7 months ago

Using method="deterministic" produces exactly the same results, so it must be an issue related with pseudo-random sequence.

marcingretl commented 7 months ago

Well, this is what I got typing '> set.seed(10, kind = "Mersenne-Twister", normal.kind = "Inversion"); rnorm(10)' on Linux machine (Debian, R-4.3.2 from distro repos) and Windows machine (10 Pro, R-4.3.3): [Linux] [1] 0.01874617 -0.18425254 -1.37133055 -0.59916772 0.29454513 0.38979430 [7] -1.20807618 -0.36367602 -1.62667268 -0.25647839

[Windows] [1] 0.01874617 -0.18425254 -1.37133055 -0.59916772 0.29454513 0.38979430 [7] -1.20807618 -0.36367602 -1.62667268 -0.25647839

As you can see sequences are identical, so maybe an issue is BAS-related?