jacob-long / jtools

Tools for summarizing/visualizing regressions and other helpful stuff
https://jtools.jacob-long.com
GNU General Public License v3.0

summ error with start values in glm #63

Closed ngreifer closed 5 years ago

ngreifer commented 5 years ago

Sorry to be giving you more to work on. summ() gives an unhelpful error when starting values are supplied to glm(), which is required when using glm() with some links. It seems to want start values of length 1, even when there are two parameters in the model. summary() works fine, though. Setting only one start value yields an error in glm(), as it should.

data("mpg", package = "ggplot2")
fit <- glm(cty ~ cyl, data = mpg, start = c(1,1))
jtools::summ(fit)
#> Error in glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : length of 'start' should equal 1 and correspond to initial coefs for "(Intercept)"
summary(fit) #works
#> 
#> Call:
#> glm(formula = cty ~ cyl, data = mpg, start = c(1, 1))
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -5.8785  -1.6225   0.1215   1.3775  14.1215  
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  29.3904     0.6268   46.89   <2e-16 ***
#> cyl          -2.1280     0.1027  -20.72   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 6.380225)
#> 
#>     Null deviance: 4220.3  on 233  degrees of freedom
#> Residual deviance: 1480.2  on 232  degrees of freedom
#> AIC: 1101.7
#> 
#> Number of Fisher Scoring iterations: 2
fit <- glm(cty ~ cyl, data = mpg, start = c(1))
#> Error in glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : length of 'start' should equal 2 and correspond to initial coefs for c("(Intercept)", "cyl")

Created on 2019-05-31 by the reprex package (v0.3.0)

jacob-long commented 5 years ago

Hmm, so it's failing in the calculation of the pseudo-R2.

Doing so involves fitting a null model (intercept-only), so doing something like this:

update(fit, formula = . ~ 1)

That means the null model is getting fed a start argument that is too long for it. I guess the main question is whether, when fitting the null model, it is right to keep only the first element of start, drop the start argument altogether, or do something else.
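For illustration, the "drop start altogether" option could look roughly like the sketch below. This is not how jtools implements it; fit_null_model is a hypothetical helper name, and the built-in mtcars data stands in for mpg so the example does not depend on ggplot2.

```r
# Hypothetical helper (not part of jtools): refit an intercept-only model
# after removing `start` from the stored call.
fit_null_model <- function(model) {
  cl <- getCall(model)                          # glm() stores a matched call
  cl$start <- NULL                              # drop the too-long start vector
  cl$formula <- update(formula(model), . ~ 1)   # intercept-only formula
  eval(cl, environment(formula(model)))         # refit where the formula was created
}

# Same shape as the reprex: two coefficients, two start values.
fit  <- glm(mpg ~ cyl, data = mtcars, start = c(1, 1))
null <- fit_null_model(fit)

# For this default gaussian/identity fit the intercept is the sample mean.
coef(null)
mean(mtcars$mpg)
```

Editing the call directly keeps every other argument (data, family, weights) as the user supplied it, which is the part a null-model refit presumably wants to preserve.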

ngreifer commented 5 years ago

Eh, since it's so simple, you could just avoid a start value, because all the null model is doing is computing the mean. Or you could compute the mean and hand it off as a start value. I wouldn't worry too much about it, though. I only ran into this because some models require start values.
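A rough sketch of the second suggestion (compute the mean and pass it as the start value), under a few assumptions: the start value has to be on the link scale, so the mean is passed through the family's link function; there are no prior weights or offsets; and the gaussian log-link model on mtcars is only an illustrative stand-in, not the model from the report.

```r
# Illustrative model with a non-identity link and user-supplied start values.
fit <- glm(mpg ~ cyl, data = mtcars,
           family = gaussian(link = "log"), start = c(3, 0))

# Mean of the response, moved to the link scale, as the single start value.
y <- model.response(model.frame(fit))
null_start <- family(fit)$linkfun(mean(y))

# Intercept-only refit; `start` now has length 1, matching "(Intercept)".
null <- update(fit, formula = . ~ 1, start = null_start)
coef(null)
```

With weights or an offset in the original call, the plain mean would no longer be the right starting point, so dropping start entirely may be the simpler choice there.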