harrelfe / rms

Regression Modeling Strategies
https://hbiostat.org/R/rms
Other
175 stars 49 forks source link

6.0-1 breaks ols() sometimes for factors #89

Open Deleetdk opened 4 years ago

Deleetdk commented 4 years ago

I found to my surprise today that my code that worked before stopped working with this error:

> ols(SPI_g ~ UN_macroregion, data = std_sub %>% select(SPI_g, UN_macroregion) %>% miss_filter())
Error in chol2inv(fit$qr$qr) : 
  element (14, 14) is zero, so the inverse cannot be computed
> lm(SPI_g ~ UN_macroregion, data = std_sub %>% select(SPI_g, UN_macroregion) %>% miss_filter())
Call:
lm(formula = SPI_g ~ UN_macroregion, data = std_sub %>% select(SPI_g, 
    UN_macroregion) %>% miss_filter())

Coefficients:
                     (Intercept)           UN_macroregionCaribbean       UN_macroregionLatin America        UN_macroregionCentral Asia  
                           1.370                            -1.250                            -1.087                            -1.519  
            UN_macroregionAfrica        UN_macroregionEastern Asia      UN_macroregionEastern Europe           UN_macroregionMelanesia  
                          -2.442                            -0.975                            -0.659                            -2.083  
              UN_macroregionMENA           UN_macroregionPolynesia  UN_macroregionSouth-Eastern Asia       UN_macroregionSouthern Asia  
                          -1.277                            -1.329                            -1.401                            -1.735  
   UN_macroregionSouthern Europe  
                          -0.497  

So normal R version of the same model works just fine, and rolling rms back to 6.0-0 fixes the issue.

I had a bunch of related models that don't have the UN_macroregion predictor, and it is only this predictor that causes the bug, and it is the only factor.

The data needed to run the above is attached.

data.csv.gz

harrelfe commented 3 years ago

Send a minimal, trivial example that fails that use simulated data.

jessexknight commented 3 years ago

This happened to me too. I don't have time now to investigate / produce a MWE but rolling back to 6.0-0 also fixes for me.

Deleetdk commented 3 years ago

I couldn't figure out how to make a trivial example using e.g. iris dataset. Must be something more complicated, redundant factor levels or something.

markseeto commented 2 years ago

@harrelfe An example is shown below. The problem appears to happen when the model includes a factor variable and the data set passed to ols() doesn't include all the factor levels.

library(rms)

set.seed(1)

d <- data.frame(x = factor(rep(c("group1", "group2", "group3"), each = 10)),
                y = rnorm(30))

lm(y ~ x, data = d[1:15, ])

# Call:
# lm(formula = y ~ x, data = d[1:15, ])

# Coefficients:
# (Intercept)      xgroup2
#     0.13220     -0.09408

ols(y ~ x, data = d[1:15, ])

# Error in chol2inv(fit$qr$qr) :
#   element (3, 3) is zero, so the inverse cannot be computed

ols(y ~ x, data = d[1:25, ])  # this works

sessionInfo()

# R version 4.2.1 (2022-06-23 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19044)

# Matrix products: default

# locale:
# [1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8    LC_MONETARY=English_Australia.utf8
# [4] LC_NUMERIC=C                       LC_TIME=English_Australia.utf8

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base

# other attached packages:
# [1] rms_6.3-0       SparseM_1.81    Hmisc_4.7-0     ggplot2_3.3.6   Formula_1.2-4   survival_3.3-1
# [7] lattice_0.20-45

# loaded via a namespace (and not attached):
#  [1] zoo_1.8-10          tidyselect_1.1.2    xfun_0.31           purrr_0.3.4         splines_4.2.1
#  [6] colorspace_2.0-3    vctrs_0.4.1         generics_0.1.2      htmltools_0.5.2     base64enc_0.1-3
# [11] utf8_1.2.2          rlang_1.0.3         pillar_1.7.0        foreign_0.8-82      glue_1.6.2
# [16] withr_2.5.0         DBI_1.1.3           RColorBrewer_1.1-3  multcomp_1.4-19     jpeg_0.1-9
# [21] lifecycle_1.0.1     stringr_1.4.0       MatrixModels_0.5-0  munsell_0.5.0       gtable_0.3.0
# [26] mvtnorm_1.1-3       htmlwidgets_1.5.4   codetools_0.2-18    latticeExtra_0.6-29 knitr_1.39
# [31] fastmap_1.1.0       quantreg_5.93       fansi_1.0.3         htmlTable_2.4.0     TH.data_1.1-1
# [36] scales_1.2.0        backports_1.4.1     checkmate_2.1.0     gridExtra_2.3       png_0.1-7
# [41] digest_0.6.29       polspline_1.1.20    stringi_1.7.6       dplyr_1.0.9         grid_4.2.1
# [46] cli_3.3.0           tools_4.2.1         sandwich_3.0-2      magrittr_2.0.3      tibble_3.1.7
# [51] cluster_2.1.3       crayon_1.5.1        pkgconfig_2.0.3     MASS_7.3-57         ellipsis_0.3.2
# [56] Matrix_1.4-1        data.table_1.14.2   assertthat_0.2.1    rstudioapi_0.13     R6_2.5.1
# [61] rpart_4.1.16        nlme_3.1-158        nnet_7.3-17         compiler_4.2.1
JMLuther commented 1 year ago

I also get this error when running on factor with unused levels. Redefining the factor without the unused level fixes the error.

rms 6.3-0 R 4.2.2