pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation

Extend and enforce balance #89

Open pitakakariki opened 6 years ago

pitakakariki commented 6 years ago

This is fine for random factors, but a fixed factor needs a parameter to be specified for each dummy.

Probably can't do this automatically, but maybe add an example to one of the vignettes?

(original moved here: https://github.com/pitakakariki/simr/issues/124)

stefanocoretta commented 6 years ago

I think my problem is related to this issue, so I will post a comment here.

A simplified version of my model looks like this:

m <- lmer(duration ~ voicing * language + (1+voicing|speaker) + (1|word))

where duration is numeric (milliseconds), voicing is a factor (voiced or voiceless), language is a factor (Italian or Polish), speaker is a factor (17 different speakers), and word is a factor (27 levels). My dataset has 17 speakers: 11 Italian and 6 Polish. I would like to extend the model so that it has 20 Italian and 20 Polish speakers, because I am interested in simulating the power curve for the effect of the voicing:language interaction.

I tried `extend(m, along = "speaker", n = 40)` but the resulting model has 28 Italian speakers and 12 Polish speakers.

Is there a way of obtaining 20 Italians and 20 Polish?

pitakakariki commented 6 years ago

Hi Stefano, I've got a bit of a backlog of queries about simr at the moment. You're currently at number 3 in the queue; hopefully I'll be able to have a look at this by the end of the week.

pitakakariki commented 6 years ago

There's currently no good way to do this within extend.

You can use getData<- to attach any arbitrary design to your model though.

Maybe something like:

data1 <- subset(mydata, language=="Italian")
data1$obs <- 1:11

data2 <- subset(mydata, language=="Polish")
data2$obs <- 1:6

xdata1 <- extend(data1, along="obs", n=20)
xdata2 <- extend(data2, along="obs", n=20)

xmydata <- rbind(xdata1, xdata2)

getData(m) <- xmydata

stefanocoretta commented 6 years ago

Thanks! Your solution seems to be working after adapting the code to my specific case.

I actually used values in extend() rather than n because language is between-speakers: using n was creating 20 subjects, each with rows for both Italian and Polish. I should have been more thorough when describing the dataset.

The (non-MWE) code:

data1 <- subset(data, language == "Italian")
data2 <- subset(data, language == "Polish")

xdata1 <- extend(data1, along = "speaker", values = paste0("it", 1:40))
xdata2 <- extend(data2, along = "speaker", values = paste0("pl", 1:40))

getData(m) <- rbind(xdata1, xdata2)
getData(m)$speaker <- as.factor(getData(m)$speaker)

The last line is needed to coerce speaker back to a factor (otherwise powerCurve throws the error Error in plot.window(...) : need finite 'xlim' values).
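
One way to confirm that a design attached with getData<- is balanced is to count distinct speakers per language. The following is only a minimal base-R sketch: the toy data frame d is a hypothetical stand-in for getData(m), and the column names speaker and language are taken from the model described above.

```r
# Toy design standing in for getData(m); the values are illustrative only.
d <- data.frame(
  speaker  = factor(c("it1", "it1", "it2", "pl1", "pl1")),
  language = factor(c("Italian", "Italian", "Italian", "Polish", "Polish"))
)

# Keep one row per speaker, so repeated observations are not counted twice.
speakers <- unique(d[, c("speaker", "language")])

# Number of distinct speakers in each language group.
counts <- table(speakers$language)
print(counts)
```

With the real model, replacing d with getData(m) should show whether each language has the intended number of speakers before running powerCurve.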

palday commented 5 years ago

Maybe as an intermediate step, the lack of balance enforcement could be mentioned in the extend() documentation?