I would like to report a possible bug, and I would be very grateful for advice on how to work around it. The issue concerns missing data, and I have not found it addressed in this form in previous issues (#172, #143, #117, #71).
I've noticed that even a minimal presence of NA cells (indeed, a single cell) has a large influence on powerCurve() and powerSim(). Please find a minimal, reproducible example below, using the cbpp data set.
For the examples with both functions, I first introduced one NA in a predictor (i.e., period). Then, I ran either powerCurve or powerSim using another predictor, fixed('incidence'). In both cases, the results were strongly affected by the single NA cell; in particular, the number of rows was not reported in the output. Please note that the results are similar (again lacking the number of rows) if fixed('period') is used instead.
In the next step, I replaced the NA with a valid value. In contrast to the models above, the functions now took longer to run, the number of rows was reported in the results, and the estimated power was higher.
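The two steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code of the report: the model formula (size regressed on incidence and period, with a random intercept for herd), the helper name run_power, the choice of row 1 for the NA, and nsim = 20 are all hypothetical. The calls powerSim() and fixed() come from simr, and cbpp ships with lme4.

```r
library(lme4)  # lmer() and the cbpp data set
library(simr)  # powerSim(), powerCurve(), fixed()

data("cbpp", package = "lme4")

# Step 1: copy the data and set a single cell of the predictor `period` to NA.
# (Row 1 is an arbitrary, hypothetical choice.)
cbpp_na <- cbpp
cbpp_na$period[1] <- NA

# Step 2: the same data with the NA replaced by a valid value
# (here, the original value of that cell).
cbpp_ok <- cbpp_na
cbpp_ok$period[1] <- cbpp$period[1]

# Hypothetical helper: fit an assumed model in which `incidence` is a
# predictor, then estimate power for that fixed effect.
run_power <- function(dat, nsim = 20) {
  fit <- lmer(size ~ incidence + period + (1 | herd), data = dat)
  powerSim(fit, test = fixed("incidence"), nsim = nsim)
}

ps_na <- run_power(cbpp_na)  # data containing one NA
ps_ok <- run_power(cbpp_ok)  # data without NA

print(ps_na)
print(ps_ok)
```

Comparing the two printed summaries side by side is one way to check whether the run time, the reported number of rows, and the estimated power differ as described. powerCurve() accepts the same test argument and could be substituted for powerSim() in the helper.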
My sessionInfo() output is available at the end.
Question
Besides replacing or removing all the missing data within the predictors, could there be any other workarounds?
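For reference, the removal workaround mentioned in the question can be sketched as below. This is a minimal sketch assuming that only the variables used in the model need to be complete; the object names cbpp_na, model_vars and cbpp_complete, and the choice of row 1 for the NA, are hypothetical.

```r
data("cbpp", package = "lme4")

# Introduce a single NA in `period` (row 1 is an arbitrary choice).
cbpp_na <- cbpp
cbpp_na$period[1] <- NA

# Keep only the rows that are complete for the variables used in the model,
# and drop any factor levels that no longer occur.
model_vars <- c("herd", "incidence", "size", "period")
keep <- complete.cases(cbpp_na[, model_vars])
cbpp_complete <- droplevels(cbpp_na[keep, ])

# One row fewer than the original data, and no NA left in the model variables.
nrow(cbpp) - nrow(cbpp_complete)
anyNA(cbpp_complete[, model_vars])
```

Restricting complete.cases() to the model variables avoids discarding rows that are only missing values in columns the model never uses.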
Thank you very much for your attention.
powerCurve
powerSim
Session info