hanase / BMA

R package for Bayesian model averaging
35 stars 11 forks source link

Error in backsolve(R, rinv): invalid 'k' argument #4

Closed Deleetdk closed 2 years ago

Deleetdk commented 3 years ago

I found an error. Not sure of the cause, but relates to one particular variable (model). Here's a reprex. I guess it may be due to handling factor variable with too many small levels.

library(tidyverse)
library(BMA)
#> Loading required package: survival
#> Loading required package: leaps
#> Loading required package: robustbase
#> 
#> Attaching package: 'robustbase'
#> The following object is masked from 'package:survival':
#> 
#>     heart
#> Loading required package: inline
#> Loading required package: rrcov
#> Scalable Robust Estimators with High Breakdown Point (version 1.5-5)

#versions
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.3
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] BMA_3.18.14       rrcov_1.5-5       inline_0.3.17     robustbase_0.93-7
#>  [5] leaps_3.1         survival_3.2-7    forcats_0.5.0     stringr_1.4.0    
#>  [9] dplyr_1.0.2       purrr_0.3.4       readr_1.4.0       tidyr_1.1.2      
#> [13] tibble_3.0.4      ggplot2_3.3.3     tidyverse_1.3.0  
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5        lubridate_1.7.9.2 mvtnorm_1.1-1     lattice_0.20-41  
#>  [5] assertthat_0.2.1  digest_0.6.27     R6_2.5.0          cellranger_1.1.0 
#>  [9] backports_1.2.1   pcaPP_1.9-73      reprex_0.3.0.9001 stats4_4.0.3     
#> [13] evaluate_0.14     httr_1.4.2        highr_0.8         pillar_1.4.7     
#> [17] rlang_0.4.10      readxl_1.3.1      rstudioapi_0.13   Matrix_1.3-2     
#> [21] rmarkdown_2.6     splines_4.0.3     munsell_0.5.0     broom_0.5.6      
#> [25] compiler_4.0.3    modelr_0.1.8      xfun_0.20         pkgconfig_2.0.3  
#> [29] htmltools_0.5.1   tidyselect_1.1.0  fansi_0.4.2       crayon_1.3.4     
#> [33] dbplyr_2.0.0      withr_2.4.0       grid_4.0.3        nlme_3.1-151     
#> [37] jsonlite_1.7.2    gtable_0.3.0      lifecycle_0.2.0   DBI_1.1.0        
#> [41] magrittr_2.0.1    scales_1.1.1      cli_2.2.0         stringi_1.5.3    
#> [45] fs_1.5.0          xml2_1.3.2        ellipsis_0.3.1    generics_0.1.0   
#> [49] vctrs_0.3.6       tools_4.0.3       glue_1.4.2        DEoptimR_1.0-8   
#> [53] hms_0.5.3         yaml_2.2.1        colorspace_2.0-0  rvest_0.3.6      
#> [57] knitr_1.30        haven_2.3.1

#change chr to fct
for (v in names(mpg)) {
  if (is.character(mpg[[v]])) mpg[[v]] = mpg[[v]] %>% as.factor()
}

#summarize variables
map(mpg, function(x) {
  if (is.numeric(x)) {
    return(psych::describe(x))
  } else {
    return(table(x))
  }
})
#> $manufacturer
#> x
#>       audi  chevrolet      dodge       ford      honda    hyundai       jeep 
#>         18         19         37         25          9         14          8 
#> land rover    lincoln    mercury     nissan    pontiac     subaru     toyota 
#>          4          3          4         13          5         14         34 
#> volkswagen 
#>         27 
#> 
#> $model
#> x
#>            4runner 4wd                     a4             a4 quattro 
#>                      6                      7                      8 
#>             a6 quattro                 altima     c1500 suburban 2wd 
#>                      3                      6                      5 
#>                  camry           camry solara            caravan 2wd 
#>                      7                      7                     11 
#>                  civic                corolla               corvette 
#>                      9                      5                      5 
#>      dakota pickup 4wd            durango 4wd         expedition 2wd 
#>                      9                      7                      3 
#>           explorer 4wd        f150 pickup 4wd           forester awd 
#>                      6                      7                      6 
#>     grand cherokee 4wd             grand prix                    gti 
#>                      8                      5                      5 
#>            impreza awd                  jetta        k1500 tahoe 4wd 
#>                      8                      9                      4 
#> land cruiser wagon 4wd                 malibu                 maxima 
#>                      2                      5                      3 
#>        mountaineer 4wd                mustang          navigator 2wd 
#>                      4                      9                      3 
#>             new beetle                 passat         pathfinder 4wd 
#>                      6                      7                      4 
#>    ram 1500 pickup 4wd            range rover                 sonata 
#>                     10                      4                      7 
#>                tiburon      toyota tacoma 4wd 
#>                      7                      7 
#> 
#> $displ
#>    vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
#> X1    1 234 3.47 1.29    3.3    3.39 1.33 1.6   7   5.4 0.44    -0.91 0.08
#> 
#> $year
#>    vars   n   mean   sd median trimmed  mad  min  max range skew kurtosis   se
#> X1    1 234 2003.5 4.51 2003.5  2003.5 6.67 1999 2008     9    0    -2.01 0.29
#> 
#> $cyl
#>    vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
#> X1    1 234 5.89 1.61      6    5.86 2.97   4   8     4 0.11    -1.46 0.11
#> 
#> $trans
#> x
#>   auto(av)   auto(l3)   auto(l4)   auto(l5)   auto(l6)   auto(s4)   auto(s5) 
#>          5          2         83         39          6          3          3 
#>   auto(s6) manual(m5) manual(m6) 
#>         16         58         19 
#> 
#> $drv
#> x
#>   4   f   r 
#> 103 106  25 
#> 
#> $cty
#>    vars   n  mean   sd median trimmed  mad min max range skew kurtosis   se
#> X1    1 234 16.86 4.26     17   16.61 4.45   9  35    26 0.79     1.43 0.28
#> 
#> $hwy
#>    vars   n  mean   sd median trimmed  mad min max range skew kurtosis   se
#> X1    1 234 23.44 5.95     24   23.23 7.41  12  44    32 0.36     0.14 0.39
#> 
#> $fl
#> x
#>   c   d   e   p   r 
#>   1   5   8  52 168 
#> 
#> $class
#> x
#>    2seater    compact    midsize    minivan     pickup subcompact        suv 
#>          5         47         41         11         33         35         62

#model formula
model_form = "displ ~ manufacturer + model + year + cyl + drv + cty + hwy + fl + class" %>% as.formula()

#fit
BMA_fit = BMA::bic.glm(model_form, data = mpg, glm.family = "gaussian")
#> Error in backsolve(R, rinv): invalid 'k' argument

#model formula
model_form = "displ ~ manufacturer + year + cyl + drv + cty + hwy + fl + class" %>% as.formula()

#fit
BMA_fit = BMA::bic.glm(model_form, data = mpg, glm.family = "gaussian")

Created on 2021-01-18 by the reprex package (v0.3.0.9001)

preshitambade commented 2 years ago

I am getting the same error for one of my analyses. Do you manage to resolve it? Or omitting the problematic variable is the only option?

Deleetdk commented 2 years ago

I am getting the same error for one of my analyses. Do you manage to resolve it? Or omitting the problematic variable is the only option?

I was only doing this dataset for testing purposes, I omitted it. https://rpubs.com/EmilOWK/BMA_examples

hanase commented 2 years ago

This error in most cases points to collinearity. In the case of the mpg dataset however, something else is going on, maybe related to the fact that the "model" variable has 38 levels. Even when running glm on the original dataset, 14 out of the 37 "model" related coefficients cannot be identified. So as in the case of collinearity, omitting the variable is the way to go.