hanase / BMA

R package for Bayesian model averaging
34 stars 11 forks source link

Fails with missing data #5

Open Deleetdk opened 3 years ago

Deleetdk commented 3 years ago

The main function fails when there is missing data. The error message is not informative. I suggest you automatically subset to complete cases, as many other packages do. You can throw a warning when missing data was removed. Check out how the functions in rms package does this.

library(tidyverse)
library(BMA)
#> Loading required package: survival
#> Loading required package: leaps
#> Loading required package: robustbase
#> 
#> Attaching package: 'robustbase'
#> The following object is masked from 'package:survival':
#> 
#>     heart
#> Loading required package: inline
#> Loading required package: rrcov
#> Scalable Robust Estimators with High Breakdown Point (version 1.5-5)

#versions
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.3
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] BMA_3.18.14       rrcov_1.5-5       inline_0.3.17     robustbase_0.93-7
#>  [5] leaps_3.1         survival_3.2-7    forcats_0.5.0     stringr_1.4.0    
#>  [9] dplyr_1.0.2       purrr_0.3.4       readr_1.4.0       tidyr_1.1.2      
#> [13] tibble_3.0.4      ggplot2_3.3.3     tidyverse_1.3.0  
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5        lubridate_1.7.9.2 mvtnorm_1.1-1     lattice_0.20-41  
#>  [5] assertthat_0.2.1  digest_0.6.27     R6_2.5.0          cellranger_1.1.0 
#>  [9] backports_1.2.1   pcaPP_1.9-73      reprex_0.3.0.9001 stats4_4.0.3     
#> [13] evaluate_0.14     httr_1.4.2        highr_0.8         pillar_1.4.7     
#> [17] rlang_0.4.10      readxl_1.3.1      rstudioapi_0.13   Matrix_1.3-2     
#> [21] rmarkdown_2.6     splines_4.0.3     munsell_0.5.0     broom_0.5.6      
#> [25] compiler_4.0.3    modelr_0.1.8      xfun_0.20         pkgconfig_2.0.3  
#> [29] htmltools_0.5.1   tidyselect_1.1.0  fansi_0.4.2       crayon_1.3.4     
#> [33] dbplyr_2.0.0      withr_2.4.0       grid_4.0.3        nlme_3.1-151     
#> [37] jsonlite_1.7.2    gtable_0.3.0      lifecycle_0.2.0   DBI_1.1.0        
#> [41] magrittr_2.0.1    scales_1.1.1      cli_2.2.0         stringi_1.5.3    
#> [45] fs_1.5.0          xml2_1.3.2        ellipsis_0.3.1    generics_0.1.0   
#> [49] vctrs_0.3.6       tools_4.0.3       glue_1.4.2        DEoptimR_1.0-8   
#> [53] hms_0.5.3         yaml_2.2.1        colorspace_2.0-0  rvest_0.3.6      
#> [57] knitr_1.30        haven_2.3.1

#model formula
model_form = "displ ~ manufacturer + year + cyl + drv + cty + hwy + fl + class" %>% as.formula()

#fit OK
BMA::bic.glm(model_form, data = mpg, glm.family = "gaussian")
#> 
#> Call:
#> bic.glm.formula(f = model_form, data = mpg, glm.family = "gaussian")
#> 
#> 
#>  Posterior probabilities(%): 
#>  manufacturerchevrolet      manufacturerdodge       manufacturerford 
#>                  100.0                  100.0                   94.9 
#>      manufacturerhonda    manufacturerhyundai       manufacturerjeep 
#>                   12.4                   37.4                   53.1 
#> manufacturerland rover    manufacturerlincoln    manufacturermercury 
#>                    1.5                   33.4                   32.3 
#>     manufacturernissan    manufacturerpontiac     manufacturersubaru 
#>                   57.9                  100.0                   35.1 
#>     manufacturertoyota manufacturervolkswagen                   year 
#>                   53.1                   11.8                  100.0 
#>                   drvf                   drvr                    cty 
#>                    9.5                   94.2                   96.6 
#>                    hwy                    fld                    fle 
#>                   58.7                  100.0                  100.0 
#>                    flp                    flr           classcompact 
#>                    9.1                    7.3                   95.4 
#>           classmidsize           classminivan            classpickup 
#>                   95.4                  100.0                   95.4 
#>        classsubcompact               classsuv 
#>                   95.4                   95.4 
#> 
#>  Coefficient posterior expected values: 
#>            (Intercept)   manufacturerchevrolet       manufacturerdodge  
#>             -73.293637                0.842160                0.782625  
#>       manufacturerford       manufacturerhonda     manufacturerhyundai  
#>               0.580392                0.049055               -0.156565  
#>       manufacturerjeep  manufacturerland rover     manufacturerlincoln  
#>               0.334886                0.003003                0.305475  
#>    manufacturermercury      manufacturernissan     manufacturerpontiac  
#>               0.208277                0.299841                1.080805  
#>     manufacturersubaru      manufacturertoyota  manufacturervolkswagen  
#>              -0.131918                0.191814               -0.028317  
#>                   year                    drvf                    drvr  
#>               0.040527                0.033635                0.639857  
#>                    cty                     hwy                     fld  
#>              -0.135739               -0.047947                1.518490  
#>                    fle                     flp                     flr  
#>              -0.837974               -0.017056                0.012708  
#>           classcompact            classmidsize            classminivan  
#>              -1.571618               -1.446988               -2.026317  
#>            classpickup         classsubcompact                classsuv  
#>              -1.464263               -1.501324               -1.271144

#add missing data
mpg[1, 5] = NA

#fit fail
BMA::bic.glm(model_form, data = mpg, glm.family = "gaussian")
#> Error in model.frame.default(formula = y ~ ., data = x.df, weights = wt, : variable lengths differ (found for 'x.manufacturerchevrolet')

Created on 2021-01-18 by the reprex package (v0.3.0.9001)

hanase commented 2 years ago

Thank you for the suggestion @Deleetdk . For now I added an argument na.action to bic.glm.formula which is set to na.omit by default. Will do the same for the data.frame and matrix classes.