pitakakariki / simr

Power Analysis of Generalised Linear Mixed Models by Simulation
68 stars 19 forks source link

Influence of one NA cell and available options #204

Open pablobernabeu opened 3 years ago

pablobernabeu commented 3 years ago

Hello,

I would like to report a possible bug, and I would be very grateful for advice to work around it. The issue regards missing data, and I have not found it addressed in the present way previously (#172, #143, #117, #71).

I've noticed that a minimal presence of NA cells (indeed, a single cell) has a large influence on powerCurve() and powerSim(). Please find a minimal, reproducible example below, using the cbpp data set.

For the examples with both functions, I firstly introduced one NA in a predictor (i.e., period). Then, I ran either powerCurve or powerSim using another predictor, fixed('incidence'). In both cases, the results were extremely influenced by the NA cell, especially as the number of rows was unidentified. Please note that the results are similar (especially lacking the number of rows) if fixed('period') is used instead.

On the next step, the NA was replaced with a valid value. In contrast to the above models, the functions now took longer to run, the number of rows was identified on the results, and the power determined was higher.

My sessionInfo is available at the end.

Question

Besides replacing or removing all the missing data within the predictors, could there be any other workarounds?

Thank you very much for your attention


powerCurve


> cbpp[1, 'period'] = NA
> 
> 
> cbpp
   herd incidence size period
1     1         2   14   <NA>
2     1         3   12      2
3     1         4    9      3
4     1         0    5      4
5     2         3   22      1
6     2         1   18      2
7     2         1   21      3
8     3         8   22      1
9     3         2   16      2
10    3         0   16      3
11    3         2   20      4
12    4         2   10      1
13    4         0   10      2
14    4         2    9      3
15    4         0    6      4
16    5         5   18      1
17    5         0   25      2
18    5         0   24      3
19    5         1    4      4
20    6         3   17      1
21    6         0   17      2
22    6         0   18      3
23    6         1   20      4
24    7         8   16      1
25    7         1   10      2
26    7         3    9      3
27    7         0    5      4
28    8        12   34      1
29    9         2    9      1
30    9         0    6      2
31    9         0    8      3
32    9         0    6      4
33   10         1   22      1
34   10         1   22      2
35   10         0   18      3
36   10         2   22      4
37   11         0   25      1
38   11         5   27      2
39   11         3   22      3
40   11         1   22      4
41   12         2   10      1
42   12         1    8      2
43   12         0    6      3
44   12         0    5      4
45   13         1   21      1
46   13         2   24      2
47   13         0   19      3
48   13         0   23      4
49   14        11   19      1
50   14         0    2      2
51   14         0    3      3
52   14         0    2      4
53   15         1   19      1
54   15         1   15      2
55   15         1   15      3
56   15         0   15      4
>
>
> gm1 <- glmer(size ~ period*incidence + (1 | herd),
+              data = cbpp, family = poisson
+ )
>
>
> test_powerCurve = powerCurve(gm1, fixed('incidence'), along = 'herd', nsim=100)
Calculating power at 10 sample sizes along herd
Warning message:ng: |=========================================================================================================================|
In observedPowerWarning(sim) :
  This appears to be an "observed power" calculation
> 
> 
> test_powerCurve
Power for predictor 'incidence', (95% confidence interval),
by number of levels in herd:
      3:  0.00% ( 0.00,  3.62) - 0 rows
      4:  0.00% ( 0.00,  3.62) - 0 rows
      6:  0.00% ( 0.00,  3.62) - 0 rows
      7:  0.00% ( 0.00,  3.62) - 0 rows
      8:  0.00% ( 0.00,  3.62) - 0 rows
     10:  0.00% ( 0.00,  3.62) - 0 rows
     11:  0.00% ( 0.00,  3.62) - 0 rows
     12:  0.00% ( 0.00,  3.62) - 0 rows
     14:  0.00% ( 0.00,  3.62) - 0 rows
     15:  0.00% ( 0.00,  3.62) - 0 rows

Time elapsed: 0 h 0 m 12 s

>
>
> # Now, replace the NA with a valid value
> cbpp[1, 'period'] = 2
> 
> 
> test_powerCurve = powerCurve(gm1, fixed('incidence'), along = 'herd', nsim=100)
Calculating power at 10 sample sizes along herd
( 1/10) Simulating: |                    ( 1/10) Simulating: |=                   ( 1/10) Simulating: |===                 ( 1/10) Simulating: |====                ( 1/10) Simulating: |======              ( 1/10) Simulating: |=======             ( 1/10) Simulating: |========            ( 1/10) Simulating: |==========          ( 1/10) Simulating: |===============     ( 1/10) Simulating: |================    ( 1/10) Simulating: |==================  ( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 2/10) Simulating: |=                   ( 2/10) Simulating: |====                ( 2/10) Simulating: |==========          ( 2/10) Simulating: |===============     ( 2/10) Simulating: |================    ( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 3/10) Simulating: |====================( 3/10) Simulating: |====================Warning message:ng: |=========================================================================================================================|
In observedPowerWarning(sim) :
  This appears to be an "observed power" calculation
> 
> 
> test_powerCurve
Power for predictor 'incidence', (95% confidence interval),
by number of levels in herd:
      3: 31.00% (22.13, 41.03) - 11 rows
      4: 34.00% (24.82, 44.15) - 15 rows
      6: 32.00% (23.02, 42.08) - 23 rows
      7: 44.00% (34.08, 54.28) - 27 rows
      8: 56.00% (45.72, 65.92) - 28 rows
     10: 71.00% (61.07, 79.64) - 36 rows
     11: 82.00% (73.05, 88.97) - 40 rows
     12: 82.00% (73.05, 88.97) - 44 rows
     14: 98.00% (92.96, 99.76) - 52 rows
     15: 98.00% (92.96, 99.76) - 56 rows

Time elapsed: 0 h 2 m 46 s

powerSim


> cbpp[1, 'period'] = NA
> 
> 
> cbpp
   herd incidence size period
1     1         2   14   <NA>
2     1         3   12      2
3     1         4    9      3
4     1         0    5      4
5     2         3   22      1
6     2         1   18      2
7     2         1   21      3
8     3         8   22      1
9     3         2   16      2
10    3         0   16      3
11    3         2   20      4
12    4         2   10      1
13    4         0   10      2
14    4         2    9      3
15    4         0    6      4
16    5         5   18      1
17    5         0   25      2
18    5         0   24      3
19    5         1    4      4
20    6         3   17      1
21    6         0   17      2
22    6         0   18      3
23    6         1   20      4
24    7         8   16      1
25    7         1   10      2
26    7         3    9      3
27    7         0    5      4
28    8        12   34      1
29    9         2    9      1
30    9         0    6      2
31    9         0    8      3
32    9         0    6      4
33   10         1   22      1
34   10         1   22      2
35   10         0   18      3
36   10         2   22      4
37   11         0   25      1
38   11         5   27      2
39   11         3   22      3
40   11         1   22      4
41   12         2   10      1
42   12         1    8      2
43   12         0    6      3
44   12         0    5      4
45   13         1   21      1
46   13         2   24      2
47   13         0   19      3
48   13         0   23      4
49   14        11   19      1
50   14         0    2      2
51   14         0    3      3
52   14         0    2      4
53   15         1   19      1
54   15         1   15      2
55   15         1   15      3
56   15         0   15      4
> 
> 
> test_powerSim = powerSim(gm1, fixed('incidence'), nsim=100)
Warning message:====================================================================================================================================================|
In observedPowerWarning(sim) :
  This appears to be an "observed power" calculation
> 
> 
> test_powerSim
Power for predictor 'incidence', (95% confidence interval):
       0.00% ( 0.00,  3.62)

Test: z-test
      Effect size for incidence is 0.079

Based on 100 simulations, (100 warnings, 100 errors)
alpha = 0.05, nrow = NA

Time elapsed: 0 h 0 m 4 s

nb: result might be an observed power calculation
> 
> 
> # Now, replace the NA with a valid value
> cbpp[1, 'period'] = 2
> 
> 
> test_powerSim = powerSim(gm1, fixed('incidence'), nsim=100)
Warning message:====================================================================================================================================================|
In observedPowerWarning(sim) :
  This appears to be an "observed power" calculation
> 
> 
> test_powerSim
Power for predictor 'incidence', (95% confidence interval):
      95.00% (88.72, 98.36)

Test: z-test
      Effect size for incidence is 0.079

Based on 100 simulations, (5 warnings, 0 errors)
alpha = 0.05, nrow = 56

Time elapsed: 0 h 0 m 24 s

nb: result might be an observed power calculation

Session info


> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] simr_1.0.5   lme4_1.1-26  Matrix_1.3-2

loaded via a namespace (and not attached):
 [1] statmod_1.4.35    tidyselect_1.1.0  purrr_0.3.4       splines_4.0.4     haven_2.3.1       lattice_0.20-41   carData_3.0-4     vctrs_0.3.6      
 [9] generics_0.1.0    mgcv_1.8-34       utf8_1.2.1        rlang_0.4.10      nloptr_1.2.2.2    pillar_1.5.1      foreign_0.8-81    glue_1.4.2       
[17] DBI_1.1.1         readxl_1.3.1      plyr_1.8.6        binom_1.1-1       lifecycle_1.0.0   stringr_1.4.0     cellranger_1.1.0  zip_2.1.1        
[25] rio_0.5.26        forcats_0.5.1     RLRsim_3.1-6      parallel_4.0.4    pbkrtest_0.5.1    curl_4.3          fansi_0.4.2       broom_0.7.5      
[33] Rcpp_1.0.6        backports_1.2.1   plotrix_3.8-1     abind_1.4-5       hms_1.0.0         stringi_1.5.3     openxlsx_4.2.3    dplyr_1.0.5      
[41] grid_4.0.4        tools_4.0.4       magrittr_2.0.1    tibble_3.1.0      tidyr_1.1.3       crayon_1.4.1      car_3.0-10        pkgconfig_2.0.3  
[49] MASS_7.3-53.1     ellipsis_0.3.1    data.table_1.14.0 assertthat_0.2.1  minqa_1.2.4       iterators_1.0.13  R6_2.5.0          boot_1.3-27      
[57] nlme_3.1-152      compiler_4.0.4   
pitakakariki commented 3 years ago

At this point na.omit is about all you can do. Thanks for the detailed report though, it will make it easier to fix when I do find the time.

pablobernabeu commented 3 years ago

Thank you -- that's alright.

Best regards