richarddmorey / BayesFactor

BayesFactor R package for Bayesian data analysis with common statistical models.
https://richarddmorey.github.io/BayesFactor/

GeneralTestBF: denominator argument and names<- replacement method #1

Closed: singmann closed this issue 11 years ago

singmann commented 11 years ago

This pull request adds two things (and some changes to the .gitignore file that can be considered collateral damage):

  1. a denominator argument for generalTestBF, and
  2. a names<- replacement method.

richarddmorey commented 11 years ago

Hi Henrik, I noticed that what the code seems to do is add the terms in the denominator into all the models compared; so, if you specify the model as y ~ a*b and the denominator as ~ c + d, you get all models containing c + d. This seems to duplicate the functionality of the neverExclude argument - that is, you could get the same results currently by specifying the model as y ~ a*b + c + d and neverExclude = c("c","d"). Would you agree?
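For concreteness, a sketch of the equivalence I mean (dat, y, a, b, c, and d are hypothetical names; denominator is the argument from your patch):

# with the denominator argument from this pull request
generalTestBF(y ~ a*b, data = dat, denominator = ~ c + d)

# what I believe yields the same model set with the current code
generalTestBF(y ~ a*b + c + d, data = dat, neverExclude = c("c","d"))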

My thought is that it might be better if we simply use the denominator without changing the models. If you wanted to include the denominator terms, that's fine; you would just have to make sure they're in the model specification. Otherwise, the argument would just add the denominator model to the list of models to analyze and then make that one the denominator. Do you see any disadvantage of this approach compared to yours? I worry that your approach is somewhat unexpected behavior, since it changes the requested model set.

singmann commented 11 years ago

Hi Richard, The rationale for this is that I think it is best to select the models to compare (which can be done via whichModels) independently of the denominator (it is the denominator; it shouldn't influence your model selection apart from being the denominator). However, if the denominator is handed over to enumerateGeneralModels and you specify neverExclude, the results are somewhat unpredictable, or perhaps better, unexpected. See the following example:

> generalTestBF(RT ~ shape*color, data = puzzles, whichRandom = "ID", 
    denominator = ~ ID+ID:shape+ID:color, progress=FALSE, noSample=TRUE)
Bayes factor analysis
--------------
[1] shape + color + shape:color + ID + ID:shape + ID:color : NA ±NA%
[2] shape + color + ID + ID:shape + ID:color               : NA ±NA%
[3] color + ID + ID:shape + ID:color                       : NA ±NA%
[4] shape + ID + ID:shape + ID:color                       : NA ±NA%

Against denominator:
  RT ~ ID + ID:shape + ID:color 
---
Bayes factor type: BFlinearModel, JZS

> generalTestBF(RT ~ shape*color+ID+ID:shape+ID:color, data = puzzles, whichRandom = "ID", 
      progress=FALSE, neverExclude = "ID", noSample=TRUE)
Bayes factor analysis
--------------
[1] shape * color + ID + ID:shape + ID:color : NA ±NA%
[2] shape + color + ID + shape:ID + color:ID : NA ±NA%
[3] ID + shape:ID + color:ID                 : NA ±NA%

Against denominator:
  Intercept only 
---
Bayes factor type: BFlinearModel, JZS

Only in the first example (using the denominator argument) do you get the models that I think one wants in this case (at least this is what I would want here: the three main-effect models plus the full model). The second example shows that using the neverExclude argument does a considerably worse job.

To make this reasoning clear I added the following to the Rd files:

denominator: a (one-sided) formula containing the (optional) denominator for the analysis. If present, it will be added to each model (see Note and Examples).

And:

To test the models against a specific denominator, a (one-sided) formula can be specified which will be added to each model. These effects do not need to be specified in the formula argument. Most importantly, the selection of models to be tested (which is controlled by the arguments whichModels and neverExclude) is performed only on formula. Random effects can be specified for both formula and denominator. See Examples.

Perhaps there is another way to achieve what seems to be the intended behavior, but this is the approach I came up with.

Best, Henrik

richarddmorey commented 11 years ago

Ah, see, I would want this:

generalTestBF(RT ~ shape*color+ID+ID:shape+ID:color, data = puzzles, 
     whichRandom = "ID", progress=FALSE, neverExclude = "^ID$", noSample=TRUE)

Bayes factor analysis
--------------
[1] shape * color + ID + ID:shape + ID:color    : NA       ±NA%
[2] shape + color + shape:ID + color:ID + ID    : NA       ±NA%
[3] shape + color + shape:color + color:ID + ID : NA       ±NA%
[4] shape + color + shape:color + shape:ID + ID : NA       ±NA%
[5] shape + color + color:ID + ID               : NA       ±NA%
[6] shape + color + shape:ID + ID               : NA       ±NA%
[7] color + color:ID + ID                       : NA       ±NA%
[8] shape + color + ID                          : NA       ±NA%
[9] color + ID                                  : NA       ±NA%
[10] shape + ID                                 : NA       ±NA%
[11] shape + shape:ID + ID                      : NA       ±NA%
[12] shape + color + shape:color + ID           : NA       ±NA%
[13] ID                                         : 111516.6 ±0%

With the additional step of dividing by the ID-only model (which could be accomplished by a denominator argument). I don't like the idea of an interaction (e.g., shape:ID) being in the model without all main effects. See, for instance, http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf (esp. page 13).
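That division can already be done by hand with the / method for BayesFactor objects; a minimal sketch, assuming the model ordering shown above (so that the ID-only model is number 13):

bf <- generalTestBF(RT ~ shape*color + ID + ID:shape + ID:color, data = puzzles,
     whichRandom = "ID", neverExclude = "^ID$")
bf / bf[13]   # re-expresses every Bayes factor against the ID-only model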

singmann commented 11 years ago

I totally understand your unease with interactions in the absence of main effects and do not want to encourage this. However, I think we have a completely different situation here than the one described by, e.g., Venables.

The statistical framework my reasoning stems from is mixed modeling (e.g., Barr et al., 2013). In this framework there is a clear separation between the random-effects part and the fixed-effects part, and interactions with the random effects (i.e., random slopes) have a completely different interpretation than interactions with fixed effects. Specifically, a random slope only indicates that the random intercept is altered depending on the level of the fixed effect. Hence the whole critique of Venables does not apply here (i.e., the presence of a random slope does not interfere with the parameterization of the fixed effect for this factor). More specifically, when testing fixed effects, the random-effects structure is not altered. Barr et al. say (p. 277):

To perform such a test, one compares a model containing the fixed effect of interest to a model that is identical in all respects except the fixed effect in question. One should *not* also remove any random effects associated with the fixed effect when making the comparison.

(emphasis added)

All this naturally depends on how the random slopes in BayesFactor are actually handled, but I assumed it to be the way just described for non-Bayesian mixed models. Hence my idea of keeping a fixed denominator. The paper I am currently writing uses this approach heavily. If you can wait, I will send you a first complete draft, probably tomorrow, so you can see what I mean.

richarddmorey commented 11 years ago

That may work best for obtaining p values, as they note (that paragraph opens with the context that they are computing p values), I don't know - but I simply don't think that the model comparison makes sense. For example, you're essentially testing a strange hypothesis about the mean color effect. After all, the color effects in the full model across participants are (color + color:ID). The fixed effect of color acts as an intercept, shifting the (color + color:ID) effects away from 0. Eliminating the color main effect and testing color + color:ID against color:ID is therefore just a test that the mean color effect over participants is 0. Is this what you intend? It seems like a strange hypothesis to test.
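In code, the comparison I mean could be set up directly with lmBF; a sketch using the puzzles data from above:

m_full <- lmBF(RT ~ color + color:ID + ID, data = puzzles, whichRandom = "ID")
m_noMain <- lmBF(RT ~ color:ID + ID, data = puzzles, whichRandom = "ID")
m_full / m_noMain   # a test that the mean color effect over participants is 0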

rouderj commented 11 years ago

Hi all. Henrik, I have been told I might be seeing you next week in Mannheim; I hope this is so. OK, let me weigh in on the interpretation of main effects in the presence of random and fixed interactions. So, we have the models M1: Y ~ A + B + A:B + noise, M2: Y ~ A + B + noise, and M3: Y ~ B + A:B + noise.

Suppose we know the A:B interaction exists by comparing M1 to M2. The question is whether it makes sense to test for a main effect of A by comparing M1 to M3. And, as importantly, what defaults do we wish to choose for the BayesFactor package?

  1. For fixed effects, where the levels of A and B are chosen by the experimenter: one of the arguments against interpreting the effect of A in the presence of A:B is that the main effect of A depends on the levels of B, which are at the prerogative of the experimenter. Different experimenters who choose different levels of B will get different results for the main effect of A. This point is made by Venables. It is not the only argument against interpreting the main effect, and it might not be the best. The three of us seemingly agree, however, that one should not interpret the main effect of A in the presence of fixed interactions.
  2. Let's take the case where B is random. Here experimenters cannot manipulate the levels; they just come from the context or setting, such as the bunch of college kids who happen to be in your sample. The claim above is that since we can't manipulate B and, presumably, two experimenters would be sampling the levels of B from the same distribution, they should get the same main effect of A. But does this mean that we can or should interpret the main effect of A?

For me, the answer is "probably not," with some caveats. Let's take the case where B is subjects and A is color, and we are comparing how fast people read red and green words. Let's assume that there is a substantial A:B interaction, and that X% of the people read red faster than green while for the other (100-X)% the reverse holds. If X is 50, then we might observe no main effect of A; conversely, if X is quite different from 50, we might observe some main effect. Now, let's say X is 67; that is, most people read red faster than green, but a substantial minority, 33%, read green faster than red. Are we comfortable stating that "red words are read faster than green words" in this case, even though it is untrue for 1/3 of the population? So, I probably would be OK interpreting the main effect of A if there were some dominance relationship where everyone is better in one level than the other, but not otherwise; it leaves too much out. Of course, there are manipulations where it is feasible that everyone truly does do better in one level than another, say identifying words flashed for 20 ms vs. 40 ms, so this dominance requirement does not strike me as too harsh. The mean effect is a useful surrogate for ordinal statements when all the effects go in the same direction.

singmann commented 11 years ago

Hi All, And thanks for the great discussion so far.

I agree, the question can be boiled down to comparing M1 (with the main effect of A) and M3 (without the main effect of A) when B is the random participant effect. However, let's consider another model in addition to the three above, M4: Y ~ B + noise.

Imagine that we find that M3 has the highest Bayes factor, followed by, in order, M1, M4, and M2 (all relative to the intercept-only model). What would this mean? (Note, this is almost exactly the pattern I find in the paper I promised to send but haven't yet.) First, it would mean that we have no evidence for an overall main effect of A (i.e., the mean effect of A is near 0 over participants). I think this is interesting (the mean effect is a useful surrogate, and it is overall 0). Additionally, we see that there is substantial heterogeneity in how participants react to A. There are at least two possible interpretations: one is given above, namely that for 50% of the participants the effect goes in one direction and for the other 50% it goes in the other direction. Alternatively, there could be a subpopulation for which the effect goes dramatically in one direction whereas for the majority the effect is considerably smaller but goes in the other direction. Furthermore, the preference for M4 over M2 indicates, as above, that the mean effect of A is 0 (or near 0).

What I want to say with this is that I do think the model with an interaction with the random effect but without the main effect (i.e., M3) can be a reasonable model. It tells us something about the data that we wouldn't know if we only had M1, M2, and M4.

However, the question remains how to implement something like this in BayesFactor. As I think that random slopes can be interesting, random slopes and fixed slopes should be handled differently. In the example above (i.e., generalTestBF(RT ~ shape*color+ID+ID:shape+ID:color, data = puzzles, whichRandom = "ID", progress=FALSE, neverExclude = "^ID$", noSample=TRUE)) this is not the case: the critical model (i.e., shape:ID + ID) is missing. But perhaps my idea of a fixed denominator is also not the best.

So if you decide not to incorporate my denominator change (which is totally fine; one can easily get what I want with lmBF, as sketched below), I will just remove this part of my pull request and submit another one with only the names<- change.
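For example, a sketch of the fixed-denominator comparison for the shape effect, using the puzzles data from the examples above:

denom <- lmBF(RT ~ ID + ID:shape + ID:color, data = puzzles, whichRandom = "ID")
numer <- lmBF(RT ~ shape + ID + ID:shape + ID:color, data = puzzles, whichRandom = "ID")
numer / denom   # Bayes factor for the shape model against the fixed denominator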

PS: I am also very much looking forward to seeing you in Mannheim next week, Jeff.

richarddmorey commented 11 years ago

Hi Henrik, once you have the knowledge that M1 outperforms M2, what I would do is look at the estimates of the parameters from M1 (using posterior()). This would give me the information about how the participant effects differ, given that they do. There's no need to introduce and test M3. The test of M3 might be informative in the sense that it gives you a hint of what you might find when you do the parameter estimation, but it is still nonsensical. The posterior estimates are both informative and sensical.
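A minimal sketch of what I mean, assuming m1 holds the single-model Bayes factor object for M1 (e.g., as returned by lmBF):

chains <- posterior(m1, iterations = 10000)   # MCMC samples from M1's posterior
summary(chains)   # look directly at how the participant-specific effects differ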

Regarding what to do about the pull request: I'm still figuring out how GitHub works, so let's leave it for now until I figure out what to do.

richarddmorey commented 11 years ago

Henrik, could you close this issue and split the two changes into two pull requests, so I can handle them separately?