pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation
70 stars 20 forks source link

error when extending a model fit on lm with fixed effects (no random effects) #183

Open mandu427 opened 4 years ago

mandu427 commented 4 years ago

I'm using the lm function with fixed effects (specified as a dummy variable with factors) and have been able to use the powerSim function. However, when I try to use the "extend" function, to see how a larger sample size would perform, I get the following error:

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(c(f))) * 1e-30) warning("essentially perfect fit: summary may be unreliable") : missing value where TRUE/FALSE needed

I cannot use the lmer function because to my knowledge it does not allow for a model with a single fixed effect if there is no random effect. I tried removing all rows with NAs, but the error does not go away.

Is there a workaround for this?

pitakakariki commented 4 years ago

You should be able to use extend directly on the data frame, and then getData(fm) <- extended_X to replace the original data frame with the extended one.

mandu427 commented 4 years ago

I just tried this method; the extend function does create a new dataset but I don't think it can "replace the dataset from the regression model" as you specify via getData(lm.fit) <- extended.

When I do this I get the following error:

Error in if (is.finite(resvar) && resvar < (mean(f)^2 + var(c(f))) * 1e-30) warning("essentially perfect fit: summary may be unreliable") : missing value where TRUE/FALSE needed

Would an equivalent approach be to use extend function to simulate many replications and see how often the model rejects? (I'm assuming that the new data generated is random and can be changed by setting a different seed?)

On Tue, May 12, 2020 at 2:12 AM Peter Green notifications@github.com wrote:

You should be able to use extend directly on the data frame, and then getData(fm) <- extended_X to replace the original data frame with the extended one.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pitakakariki/simr/issues/183#issuecomment-627132259, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM3B75OOCDFKSE5TQBEZ3XTRRDSEVANCNFSM4M6QSFGQ .

pitakakariki commented 4 years ago

Hang on why are you getting an error? Shouldn't that just be a warning? Does it work you turn your warning option down?

mandu427 commented 4 years ago

I keep getting that error. It's not a warning, and it appears even if I turn off the warning option. So I can use the original data frame and extend it, but can't use it to replace the data in the regression model using getData(fm):

[image: Screen Shot 2020-05-15 at 1.33.58 AM.png]

Would the following workaround make sense?

model1 <- lm(y ~ x + z + householdID, data = pop.data) (FE fitted as dummies) coef(model1)["x"]<--0.5 # effect size sim.model1 <-powerSim(model1,nsim=100) # look at the power of the pilot data

calculate sample size needed for desired power

extended_data<-extend(pop.data,along="householdID", n=300) suggested but doesn't work: getData(model1)<-extended_data Instead, estimate a new regression model using the extended data, and run powerSim on the larger data: model2<- lm(y ~ x + z + householdID, data = extended_data) coef(model2)["x"]<--0.5 powerSim(model2,nsim=100)

Would this work?

Many thanks in advance!

On Tue, May 12, 2020 at 2:43 AM Peter Green notifications@github.com wrote:

Hang on why are you getting an error? Shouldn't that just be a warning? Does it work you turn your warning option down?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pitakakariki/simr/issues/183#issuecomment-627146064, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM3B75M4XHR6K2Y7YUNDJKTRRDV2DANCNFSM4M6QSFGQ .

pitakakariki commented 4 years ago

It looks like that would work in theory, But it looks very much like your householdID should really be a random effect. I think this might be what's causing the problem?

mandu427 commented 4 years ago

Thank you.

The power analysis I'm conducting is strictly for the FE so RE is not an option.

When power calculation is done on a FE model, does MC simulation account for clustered standard errors? I ask because we do specify a desired effect size by assigning it to the 'fixef' variable, but there's nowhere to indicate clustered standard errors. Or is this implicitly being accounted for through the MC simulation process which accounts for fixed effects? (But in which case, we go back to my previous question as the raw dataset itself does not specify the coefficient estimates that describe the relationship of x's to y..)

Thank you once again so much for your help.

On Sun, May 17, 2020 at 10:05 PM Peter Green notifications@github.com wrote:

It looks like that would work in theory, But it looks very much like your householdID should really be a random effect. I think this might be what's causing the problem?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pitakakariki/simr/issues/183#issuecomment-629906863, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM3B75IHGKYY2EQ7VED6OZ3RSCJX7ANCNFSM4M6QSFGQ .