pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Which dataset to use? #112

Closed: sho-87 closed this issue 6 years ago

sho-87 commented 6 years ago

Should these power analyses be conducted using collected data (but with effect sizes changed), or on completely simulated data where you can control the magnitude of relationships between all variables?

Previously I tried to use a tool like MLPowSim, which asks for a lot of estimated/anticipated effect sizes (e.g. the variance of A at level 1, the variance at level 2, etc.). But reading some of the simr examples suggests you only need to set the size of a fixed effect.

My goal is to determine what sample size I need to achieve 80% power for a particular model. But if I run simr on a collected (maybe pilot) dataset, it no longer feels like an a priori power analysis, because all of the variable relationships come from observed data (apart from maybe one or two that I have changed).

So should I actually be simulating all of my data, and essentially pre-specifying all of the relationships like with MLPowSim? Or is it ok to use this on collected data?

From what I've read, power analysis for multilevel models is really difficult because of having to anticipate so many different effect sizes, variances at multiple levels, etc. But simr seems to make this a bit too easy, so I'm wondering if I'm missing something or have misunderstood how to use this package.

pitakakariki commented 6 years ago

You need to pre-specify all of the relationships whether you use simr or MLPowSim. The question is where you get the values from. simr makes it easy to take most of these values from a pilot study; however, it's also possible to change any of them.
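For instance, a minimal sketch of that workflow (the dataset and variable names here are assumed for illustration, not taken from this thread):

```r
library(simr)  # also loads lme4

# Fit a mixed model to the pilot data (hypothetical names:
# response y, predictor x, grouping factor g).
model <- lmer(y ~ x + (1 | g), data = pilot)

# Keep the pilot estimates of the variance components, but replace
# the estimated slope with a value taken from the literature:
fixef(model)["x"] <- 0.05

# Power for the fixed effect of x at the pilot sample size:
powerSim(model, nsim = 100)

# To explore the sample size needed for 80% power, enlarge the design
# and look at power as a function of the number of groups:
bigger <- extend(model, along = "g", n = 30)
powerCurve(bigger, along = "g", nsim = 100)
```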

The quality of the power analysis will of course depend on the accuracy of these values. If you have values from the literature, and you trust those values more than the estimates from your pilot data, then I would recommend using them.

I'm not sure I'd recommend a completely a priori approach though. If the values aren't based on past observations, how would you know if they're reasonable?

You're probably familiar with "The Abuse of Power" by Hoenig and Heisey. I think the main takeaway from that paper is that you use power analysis before your study, to inform sample size. Once you have your data, you use confidence intervals for inference. As long as you're not trying to use power analysis retrospectively on an already completed study, I think you're still doing an a priori power analysis.

It's possible that simr makes things too easy. On the other hand, the goal was to allow more people to run power analyses. I suspect the main risk with our approach is that the pilot data might be more homogeneous than the full study data, meaning that power would be overestimated. If you're worried about this, I would recommend running sensitivity analyses using pessimistic values with VarCorr<- and sigma<-.
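A sketch of such a sensitivity analysis, assuming a model with a single random intercept for a grouping factor g (the inflation factors here are illustrative assumptions, not recommendations):

```r
library(simr)

# Starting from a model fitted to pilot data, e.g.
#   model <- lmer(y ~ x + (1 | g), data = pilot),
# suppose the full study will be noisier than the pilot:

# Double the random-intercept variance (a scalar works here
# because there is only one variance component for g):
VarCorr(model) <- 2 * as.numeric(VarCorr(model)$g)

# Inflate the residual standard deviation by 50%:
sigma(model) <- 1.5 * sigma(model)

# Re-run the power analysis under these pessimistic values:
powerSim(model, nsim = 100)
```

If power still looks adequate under the pessimistic scenario, the planned sample size is more likely to be robust to pilot data that understates the true heterogeneity.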