merliseclyde / BAS

BAS R package https://merliseclyde.github.io/BAS/
https://merliseclyde.github.io/BAS/
GNU General Public License v3.0
41 stars 16 forks source link

always.include works differently in 1.5.2 (github version) #26

Closed AlexanderLyNL closed 6 years ago

AlexanderLyNL commented 6 years ago

Hi Merlise,

I’ve been playing with the GitHub version of BAS (1.5.2) and compared this to the CRAN version (1.5.1). In both versions I added contcor1 to the null model:

“include.always=as.formula("contNormal ~ contcor1”)”

The CRAN version (1.5.1) gives me the following:

bas1 5 1

which I think is correct. When I apply the new image function (1.5.2) with drop.always.included=TRUE to the cranBasObject I get the following:

bas1 5 1drop

Only the intercept is removed, but not contcor1.

————————

On the other hand, the GitHub version (1.5.2) gives me:

bas1 5 2

which seems to be quite different, and note that contcor1 is not included in some of the models. The new feature drop.always.included=TRUE does remove contcor1, but not the intercept.

bas1 5 2drop

The estimates also differ. Do you know whether there’s something wrong with my installation? Or did I specify the include.always variables incorrectly? Here’s the R script:

https://www.dropbox.com/s/jjqlsm7u9qtyp94/basIncludeAlwaysGithubVersion.R?dl=0

https://www.dropbox.com/s/varuoy0pf2ud8a6/cranBasObj.RData?dl=0

Cheers, Alexander

merliseclyde commented 6 years ago

There is a conflict between the force.heredity=TRUE (the new defaults) and include.always, but I see that is not clearly documented in the help files. That can explain part of the difference in the bas objects created from v1.5.1 and v1.5.2

The include.aways option reorders variables that are always included to be at end of the model vector (i.e include.always sets initprobs to 1, and then the variables are sorted so that the highest probabilities are last. This order was causing a problem with checks for parents used in the force.heredity = TRUE, i.e. a child would not be added because the code assumed that the parents would be checked first. So for the time being the initprobs were being overwritten to keep the order of variables in the model. This will need to be fixed to allow both options.

Will look at whether the intercept is being added to the include.always list in the output in case that changed between versions.

merliseclyde commented 6 years ago

@AlexanderLyNL let me know if the new version on GitHub resolves your issue with the models/plotting (it seems to work with my limited testing; I've added a file JASP-tests.R in the tests directory for testing and the debug.csv data set in inst/testdata) so

loc = system.file("testdata", package="BAS")
d = read.csv(paste(loc, "JASP-testdata.csv", sep="/"))

simpleFormula = as.formula("contNormal ~ contGamma + contcor1 + contGamma * contcor1 ")

set.seed(1)
library(BAS)
basObj = bas.lm(simpleFormula,
                      data = d,
                      alpha = 0.125316,
                      prior = "JZS", include.always=as.formula("contNormal ~ contcor1"),
                      weights = d$facFifty)

image(basObj, rotate=FALSE)
image(basObj, rotate=FALSE, drop.always.included=TRUE)
basObj$include.always  

coefficient estimates, R2, and log marginals agree, but just noticed that the posterior probabilities did not agree so there may be an issue with the re-weighting of the output. (will check which is correct :-)

cheers!

merliseclyde commented 6 years ago

force.heredity.bas was using a different prior probability so posterior probabilities were not comparable. For now, function has been updated to use the same prior probabilities as with sampling, but now reweighed over the models that satisfy the heredity condition.

Future work can address asking more meaningful priors for constrained models

loc = system.file("testdata", package="BAS")
d = read.csv(paste(loc, "JASP-testdata.csv", sep="/"))

simpleFormula = as.formula("contNormal ~ contGamma + contcor1 + contGamma * contcor1 ")

library(BAS)
set.seed(1)
basObj = bas.lm(simpleFormula,
                data = d,
                alpha = 0.125316,
                prior = "JZS",
                include.always=as.formula("contNormal ~ contcor1"),
                modelprior=beta.binomial(1,1),
                weights = d$facFifty)

image(basObj, rotate=FALSE)
image(basObj, rotate=FALSE, drop.always.included=TRUE)
basObj$include.always

## old
##
## install.packages("BAS")

## library(BAS)

 set.seed(1)
 basObj.old = bas.lm(simpleFormula,
                data = d,
                alpha = 0.125316,
                prior = "JZS",
                include.always=as.formula("contNormal ~ contcor1"),
                modelprior=beta.binomial(),
                weights = d$facFifty, force.heredity = FALSE)
 basObj.old = force.heredity.bas(basObj.old)

 basObj.old$postprobs  #(check order of models)
 basObj$postprobs
merliseclyde commented 6 years ago

added unit-test in tests/testthat/test-bas-lm.R:

test_that("force.heredity", {
  # based on bug #26
  loc <- system.file("testdata", package = "BAS")
  d <- read.csv(paste(loc, "JASP-testdata.csv", sep = "/"))

  simpleFormula <- as.formula("contNormal ~ contGamma + contcor1 + contGamma * contcor1 ")

  set.seed(1)
  basObj <- bas.lm(simpleFormula,
    data = d,
    alpha = 0.125316,
    prior = "JZS",
    include.always = as.formula("contNormal ~ contcor1"),
    modelprior = beta.binomial(1, 1),
    weights = d$facFifty
  )
  set.seed(1)
  basObj.old <- bas.lm(simpleFormula,
    data = d,
    alpha = 0.125316,
    prior = "JZS",
    include.always = as.formula("contNormal ~ contcor1"),
    modelprior = beta.binomial(),
    weights = d$facFifty, force.heredity = FALSE
  )
  basObj.old <- force.heredity.bas(basObj.old)

  expect_equal(basObj$probne0, basObj.old$probne0)
})

Closing now, but comment if there are still unresolved issues or create a new issue