vincentarelbundock / modelsummary

Beautiful and customizable model summaries in R.
http://modelsummary.com
Other
912 stars 75 forks source link

multinom with response as frequency matrix (not factor) #672

Closed zeileis closed 1 year ago

zeileis commented 1 year ago

Description: When using modelsummary(..., shape = response + term ~ model) on multinom() output I get an error that response is not found. Possibly this is because get_estimates() does not recognize all response labels. When I fit the equivalent model with a factor response, everything works as expected.

Reproducible example: Using the cns data from the faraway package I want to model the type of CNS malformation (Anencephalus, Spina bifida, Other) by water hardness and type of work of the parents. The data is tabulated in wide format with the columns An, Sp, and Other giving the numbers of cases.

library("modelsummary")
library("nnet")
data("cns", package = "faraway")
m <- multinom(cbind(An, Sp, Other) ~ Water + Work, data = cns)
coef(m)
##       (Intercept)     Water WorkNonManual
## Sp         0.3752 -0.001297        0.1158
## Other     -1.1225  0.002183       -0.2703

Running modelsummary() with the shape specification yields an error:

modelsummary(m, shape = response + term ~ model)
## Error: Group columns (response) were not found in the extracted data. The
##   "group" argument must be a column name in the data.frame produced by
##   `get_estimates(model)`.  [...]

This is despite response being a column name in:

get_estimates(m)
##            term  estimate std.error conf.level  conf.low conf.high response
## 1   (Intercept)  0.375202  0.190033       0.95 -0.048218  0.798622       Sp
## 2         Water -0.001297  0.002033       0.95 -0.005827  0.003233       Sp
## 3 WorkNonManual  0.115756  0.208688       0.95 -0.349229  0.580740       Sp
## 4   (Intercept) -1.122550  0.190033       0.95 -0.048218  0.798622       Sp
## 5         Water  0.002183  0.002033       0.95 -0.005827  0.003233       Sp
## 6 WorkNonManual -0.270279  0.208688       0.95 -0.349229  0.580740       Sp
##   statistic df.error p.value response.1 s.value group
## 1    1.9744       10 0.07658         Sp     3.7      
## 2   -0.6380       10 0.53780         Sp     0.9      
## 3    0.5547       10 0.59130         Sp     0.8      
## 4    1.9744       10 0.07658         Sp     3.7      
## 5   -0.6380       10 0.53780         Sp     0.9      
## 6    0.5547       10 0.59130         Sp     0.8      

Note, however, that response incorrectly lists Sp for all six rows rather than Other for the last three rows.

If I reshape the data first to long format with a factor response then I can fit an equivalent model for which everything works smoothly:

cns2 <- reshape(cns, direction = "long", timevar = "Type", times = names(cns)[3:5], varying = 3:5, v.names = "Freq")[, 3:6]
cns2$Type <- factor(cns2$Type, levels = unique(cns2$Type))
m2 <- multinom(Type ~ Water + Work, data = cns2, weights = Freq)
coef(m2)
##       (Intercept)     Water WorkNonManual
## Sp         0.3752 -0.001297        0.1158
## Other     -1.1225  0.002183       -0.2703
modelsummary(m2, shape = response + term ~ model)

With previous versions of modelsummary I was able to obtain equivalent model summaries for both m and m2. But with the current CRAN or devel version only the latter works.

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Debian GNU/Linux trixie/sid
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Vienna
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] nnet_7.3-19             modelsummary_1.4.2.9006
## 
## loaded via a namespace (and not attached):
##  [1] gt_0.9.0             sass_0.4.7           utf8_1.2.3          
##  [4] future_1.33.0        generics_0.1.3       xml2_1.3.5          
##  [7] fortunes_1.5-4       stringi_1.7.12       listenv_0.9.0       
## [10] digest_0.6.33        magrittr_2.0.3       evaluate_0.21       
## [13] fastmap_1.1.1        jsonlite_1.8.7       backports_1.4.1     
## [16] httr_1.4.7           rvest_1.0.3          fansi_1.0.4         
## [19] viridisLite_0.4.2    scales_1.2.1         jquerylib_0.1.4     
## [22] codetools_0.2-19     cli_3.6.1            rlang_1.1.1         
## [25] performance_0.10.5.4 parallelly_1.36.0    future.apply_1.11.0 
## [28] munsell_0.5.0        cachem_1.0.8         datawizard_0.9.0    
## [31] tools_4.3.1          parallel_4.3.1       checkmate_2.2.0     
## [34] dplyr_1.1.2          colorspace_2.1-0     DT_0.28             
## [37] webshot_0.5.5        bayestestR_0.13.1    globals_0.16.2      
## [40] kableExtra_1.3.4     vctrs_0.6.3          R6_2.5.1            
## [43] lifecycle_1.0.3      stringr_1.5.0        htmlwidgets_1.6.2   
## [46] insight_0.19.5.6     pkgconfig_2.0.3      bslib_0.5.1         
## [49] pillar_1.9.0         data.table_1.14.8    glue_1.6.2          
## [52] systemfonts_1.0.4    xfun_0.40            tibble_3.2.1        
## [55] tidyselect_1.2.0     rstudioapi_0.15.0    parameters_0.21.2.3 
## [58] knitr_1.43           htmltools_0.5.6      tables_0.9.17       
## [61] rmarkdown_2.24       svglite_2.1.1        compiler_4.3.1      
vincentarelbundock commented 1 year ago

Thanks for the report @zeileis

This is an upstream issue in the parameters package. I probably won't have time to look at this in the short run, but my guess is it's going to be an easy fix. I opened an issue on the parameters repo: https://github.com/easystats/parameters/issues/905

Once we/they merge a fix upstream, modelsummary should work as intended automatically.

zeileis commented 1 year ago

That's what I suspected but I wasn't sure. I really need to take some time to look at the easystats packages in more detail...

Thanks for posting the issue upstream.

strengejacke commented 1 year ago

Do both models return z-statistic or t-statistic?

vincentarelbundock commented 1 year ago

Neither. The summary(model) only prints estimates and standard errors.

strengejacke commented 1 year ago

Ok, put differently: are potential test statistics z or t? :-) Looking at confint(), it seems to be z.

zeileis commented 1 year ago

z is more commonly used in practice, I would say. That's what coeftest() returns by default.

strengejacke commented 1 year ago

Thanks! Background of my question was that your examples in https://github.com/easystats/parameters/issues/905 returned different CI for the same models. Now the results are in line with confint().

zeileis commented 1 year ago

Ah, well spotted, thanks for the quick fix!

vincentarelbundock commented 1 year ago

Ah, well spotted, thanks for the quick fix!

@zeileis, this issue was a nice introduction to @strengejacke. He is amazing.

strengejacke commented 1 year ago

Ok, once parameters 0.21.2.5 is on r-universe, you could run easystats::install_latest() and it should work.

library("modelsummary")
library("nnet")
data("cns", package = "faraway")
m <- multinom(cbind(An, Sp, Other) ~ Water + Work, data = cns)
#> # weights:  12 (6 variable)
#> initial  value 762.436928 
#> iter  10 value 685.762336
#> final  value 685.762238 
#> converged
modelsummary(m, shape = response + term ~ model)
response (1)
Sp (Intercept) 0.375
Sp (0.190)
Sp Water -0.001
Sp (0.002)
Sp WorkNonManual 0.116
Sp (0.209)
Other (Intercept) -1.123
Other (0.280)
Other Water 0.002
Other (0.003)
Other WorkNonManual -0.270
Other (0.325)
Num.Obs. 16
R2 0.002
R2 Adj. 0.001
AIC 1383.5
BIC 1388.2
RMSE 0.11

Created on 2023-09-29 with reprex v2.0.2

vincentarelbundock commented 1 year ago

Love it!

zeileis commented 1 year ago

Hooray, thank you so much. I successfully re-compiled the multinomial chapter of my "Microeconometrics" lecture notes now: https://discdown.org/microeconometrics/multinomial-response-models.html#malformations-of-the-cns

Daniel @strengejacke nice to meet you! I've been watching your work for some time and the easystats packages are on my list to learn better in my sabbatical next semester.

Vincent @vincentarelbundock thanks for making the introduction.