pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Question: Generate fixed effects in nested structure given the outcome #131

Open b1azk0 opened 6 years ago

b1azk0 commented 6 years ago

Hi, is there any chance you could help me out with the following?

Is it possible to add x1 and x2 predictors to a dataset storing the outcome y with subject and observation variables defining nesting?

I'm looking for this because, when simulating data, I want to control how much of the variance in y is between subjects versus within subjects. That way, running a NULL model `lmer(y ~ 1 + (1 | subject))` shows, for example, a random-intercept variance of 1 and a residual variance of 1 (a 50/50 split, or any other ratio of my choice). This part I know how to do using base R.
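A minimal base-R sketch of that step, assuming 20 subjects with 10 observations each (the sample sizes and variances are illustrative):

```r
# Simulate y with a chosen between/within variance split.
# Fitting lmer(y ~ 1 + (1 | subject), dat) should recover roughly these values.
set.seed(1)
n_subj <- 20; n_obs <- 10
var_between <- 1   # random-intercept (between-subject) variance
var_within  <- 1   # residual (within-subject) variance: a 50/50 split

subject <- rep(seq_len(n_subj), each = n_obs)
u <- rnorm(n_subj, sd = sqrt(var_between))        # subject-level intercepts
y <- u[subject] + rnorm(n_subj * n_obs, sd = sqrt(var_within))
dat <- data.frame(subject = factor(subject), y = y)
```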

What I'm looking for is a way to extend the null-model dataset (leaving y as it was) with x1 and x2 predictors that can (but don't have to) be measured at different levels (subject level or observation level), while controlling their correlations and overall effect on y - as in an exemplary model `lmer(y ~ x1 + x2 + (x1 | subject), data)`.
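One way to sketch this in base R: leave y untouched and build a subject-level x1 correlated with the random intercepts and an observation-level x2 correlated with the residuals. All sizes, variances, and target correlations below are illustrative assumptions:

```r
# Build a null-model dataset, then add covariates at two levels without
# changing y. r1 and r2 are target correlations (assumed values).
set.seed(2)
n_subj <- 20; n_obs <- 10
subject <- rep(seq_len(n_subj), each = n_obs)
u <- rnorm(n_subj)                               # subject-level intercepts
y <- u[subject] + rnorm(n_subj * n_obs)          # null-model outcome
dat <- data.frame(subject = factor(subject), y = y)

r1 <- 0.5  # correlation of subject-level x1 with the intercepts
r2 <- 0.3  # correlation of observation-level x2 with the residuals

x1_subj <- r1 * as.numeric(scale(u)) + sqrt(1 - r1^2) * rnorm(n_subj)
dat$x1 <- x1_subj[subject]                       # constant within subject
res <- y - u[subject]                            # within-subject part of y
dat$x2 <- r2 * as.numeric(scale(res)) + sqrt(1 - r2^2) * rnorm(length(res))
```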

All the best, blazej

b1azk0 commented 6 years ago

I'll try to make my point clearer, as there might be some misunderstanding here.

Say I want to fit a model to my data: `lmer(y ~ x1 + x2 + (x1 | subject), real.data)`.

Before I run this model, I check the ratio of variances in y before adding any covariates. For this I run `lmer(y ~ 1 + (1 | subject), real.data)` and see that, e.g., the variance of the random intercept is 2 and the residual variance is 1. So I can conclude that there is twice as much variance between subjects as within subjects. This is important, as I want to define those two variances beforehand in sim.data.

In the next step, for real.data, I would fit the full model `lmer(y ~ x1 + x2 + (x1 | subject), real.data)` and look at the fixed-effect estimates given the random structure.

In the context of sim.data, I'd like to be able to specify the ratio of between- and within-subject variance for y (before including covariates in the model - they can already be in the dataset) and, given that, define how my covariates behave (for example both significant, or just one significant, with or without a random slope).

The sequence of generating the data is unimportant. In my trial-and-error approach to writing code I managed to simulate just the null-model data and was planning to add covariates to it. If it's possible to start with the covariates and still have control over the variance components in the NULL model, that's perfect :)

b1azk0 commented 6 years ago

Also, I realised that what I was referring to as a variance ratio should be called a fraction.

For example, consider a data set like this: https://pastebin.com/FHwVgqXA (note there is no random slope in this example). When running `lmer(y ~ a + b + (1 | Subject), data)` I see that both a and b are significant. When running `m0 <- lmer(y ~ 1 + (1 | Subject), data)` I can tell that the residual variance and the random-intercept variance are almost 1 to 1:

```
> var.fraction(m0)
lmer(formula = y ~ 1 + (1 | Subject), data = dd)
[1] "The residual variance of y is: 1.457"
[1] "The random variance of null model is: 1.53"
[1] "Theta is: 1.02"
[1] "Fraction of rand to resid variance is :  21/20"
[1] "Fraction of resid to rand variance is :  19/20"
```

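(The `var.fraction` helper itself isn't shown in the thread; a hypothetical equivalent built on `lme4::VarCorr` might look like the sketch below. The function name and return format are assumptions.)

```r
library(lme4)

# Hypothetical reconstruction of a var.fraction-style helper: extract the
# random-intercept and residual variances of a null model and their ratio.
var_fraction <- function(m0) {
  vc <- as.data.frame(VarCorr(m0))
  rand  <- vc$vcov[vc$grp != "Residual"][1]   # random-intercept variance
  resid <- vc$vcov[vc$grp == "Residual"]      # residual variance
  c(random = rand, residual = resid, theta = rand / resid)
}
```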
pitakakariki commented 6 years ago

I can't see any simple way of doing this.

I think you have two options:

A) Work out, mathematically, the values for your variance parameters that will give you the results you're looking for.

B) Adjust the parameters by trial and error.

(B) is probably easier, and you could program a search to take the tedium out of the trial and error.
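A rough sketch of such a search, using `simr::makeLmer` to build candidate models and refitting the null model to check the resulting variance split. The grid of candidate variances, the fixed effects, and the target ratio are all illustrative:

```r
library(simr)   # also loads lme4

# Vary the random-intercept variance, simulate a response, and keep the
# value whose fitted null model comes closest to a target between/within
# split. Grid and target values are illustrative assumptions.
target_ratio <- 1                        # between:within = 50/50
dat <- expand.grid(subject = factor(1:20), obs = 1:10)
dat$x1 <- rnorm(nrow(dat))

best <- NULL
for (v in seq(0.5, 2, by = 0.25)) {
  m <- makeLmer(y ~ x1 + (1 | subject), fixef = c(2, 0.5),
                VarCorr = v, sigma = 1, data = dat)
  dat$y <- doSim(m)                      # simulate one response vector
  m0 <- lmer(y ~ 1 + (1 | subject), data = dat)
  vc <- as.data.frame(VarCorr(m0))
  ratio <- vc$vcov[1] / vc$vcov[2]       # intercept var / residual var
  if (is.null(best) || abs(ratio - target_ratio) < abs(best$diff)) {
    best <- list(v = v, ratio = ratio, diff = ratio - target_ratio)
  }
}
best$v   # candidate random-intercept variance
```

A finer grid (or repeated simulations per candidate, averaging the ratios) would make the search less noisy at the cost of run time.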

b1azk0 commented 6 years ago

Thanks for the hint. I'll try finding the desired values with a search function.