Is the implementation of the Hosmer Lemeshow test correct?

indenkun commented 1 year ago

Thank you very much for your very useful package.

By the way, for hoslem.test(), which is the Hosmer Lemeshow test, a goodness of fit test for multiple logistic regression analysis, I get the following results when I run it as follows;

data("Titanic")
df <- data.frame(Titanic)
df <- data.frame(Class = rep(df$Class, df$Freq),
                 Sex = rep(df$Sex, df$Freq),
                 Age = rep(df$Age, df$Freq),
                 Survived = rep(df$Survived, df$Freq))
model <- glm(Survived ~ . ,data = df, family = binomial())

library(ResourceSelection)

## ResourceSelection 0.3-5   2019-07-22

HL <- hoslem.test(model$y, model$fitted.values, g = 10)
HL

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  model$y, model$fitted.values
## X-squared = 16.733, df = 8, p-value = 0.03301

However, when I check the subgroups, I see the following;

HL$observed

##                
## cutyhat           y0   y1
##   [0.104,0.225] 1211  281
##   (0.225,0.407]  153   70
##   (0.407,0.566]   89   87
##   (0.566,0.736]   13   85
##   (0.736,0.957]   24  188

HL$expected

##                
## cutyhat              yhat0      yhat1
##   [0.104,0.225] 1216.20514  275.79486
##   (0.225,0.407]  139.71270   83.28730
##   (0.407,0.566]   77.99548   98.00452
##   (0.566,0.736]   26.21904   71.78096
##   (0.736,0.957]   29.86764  182.13236

This means that the number of subgroups is 5 instead of the intended 10. However, the p-value is calculated with 8 degrees of freedom, i.e., the intended number of subgroups - 2.

I think this is due to the following where hoslem.test() determines the range of predictions for each subgroup;

qq <- unique(quantile(yhat, probs=seq(0, 1, 1/g)))

Because of the unique() here, the number of subgroups may be less than intended, but the degree of freedom to compute p-values remains the same. This is no warning to the user and appears to be calculated correctly because the calculation is complete.

Is this implementation of the Hosmer-Lemeshow Test correct?

I’m not sure about the stats, so sorry if I’m pointing out the wrong thing.

psolymos commented 1 year ago

Hi @indenkun you might be right, that df is still g-2, but when qq leads to fewer bins due to quantiles being equal, g should be adjusted.

hoslem.test <- function (x, y, g = 10) {
    DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)), 
        sep = ", ")
    METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
    yhat <- y
    y <- x
    qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
    cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
    observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
    expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~ 
        cutyhat)
    chisq <- sum((observed - expected)^2/expected)
    G <- length(unique(cutyhat))
    PVAL = 1 - pchisq(chisq, G - 2)
    PARAMETER <- G - 2
    names(chisq) <- "X-squared"
    names(PARAMETER) <- "df"
    structure(list(statistic = chisq, parameter = PARAMETER, 
        p.value = PVAL, method = METHOD, data.name = DNAME, observed = observed, 
        expected = expected), class = "htest")
}

indenkun commented 1 year ago

@psolymos

Thank you for your reply and fix.

Perhaps it is none of my business, but wouldn’t it cause confusion on the part of the user if this change in the number of quantiles (change in degrees of freedom) were to occur without warning?

Of course, a thorough understanding of the Hosmer-Lemeshow Test would immediately indicate that the number of subgroups may have changed, but considering the fact that this has not been an issue for many years, it seems that the Hosmer-Lemeshow Test is not well understood and the user may not be aware of it. I think it is possible that hoslem.test() is being used without a good understanding of the Hosmer-Lemeshow Test.

Therefore, I think it would be better to output a warning message when g and G are different, as shown below, to prevent confusion on the part of the user.

hoslem.test <- function (x, y, g = 10) {
  DNAME <- paste(deparse(substitute(x)), deparse(substitute(y)), 
                 sep = ", ")
  METHOD <- "Hosmer and Lemeshow goodness of fit (GOF) test"
  yhat <- y
  y <- x
  qq <- unique(quantile(yhat, probs = seq(0, 1, 1/g)))
  cutyhat <- cut(yhat, breaks = qq, include.lowest = TRUE)
  observed <- xtabs(cbind(y0 = 1 - y, y1 = y) ~ cutyhat)
  expected <- xtabs(cbind(yhat0 = 1 - yhat, yhat1 = yhat) ~ 
                      cutyhat)
  chisq <- sum((observed - expected)^2/expected)
  G <- length(unique(cutyhat))
  PVAL = 1 - pchisq(chisq, G - 2)
  PARAMETER <- G - 2
  names(chisq) <- "X-squared"
  names(PARAMETER) <- "df"
  if(g != G) 
    warning("The number of bins has been changed because it could not be calculated with the specified bins.")
  structure(list(statistic = chisq, parameter = PARAMETER, 
                 p.value = PVAL, method = METHOD, data.name = DNAME, observed = observed, 
                 expected = expected), class = "htest")
}

data("Titanic")
df <- data.frame(Titanic)
df <- data.frame(Class = rep(df$Class, df$Freq),
                 Sex = rep(df$Sex, df$Freq),
                 Age = rep(df$Age, df$Freq),
                 Survived = rep(df$Survived, df$Freq))
model <- glm(Survived ~ . ,data = df, family = binomial())
# Warning message appears because the bins number has changed.
hoslem.test(model$y, model$fitted.values, g = 10)

## Warning in hoslem.test(model$y, model$fitted.values, g = 10): The number of
## bins has been changed because it could not be calculated with the specified
## bins.

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  model$y, model$fitted.values
## X-squared = 16.733, df = 3, p-value = 0.0008019

# No warning message because there is no particular change.
model <- glm(vs ~ wt + mpg, data = mtcars, family = binomial())
hoslem.test(model$y, model$fitted.values, g = 10)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  model$y, model$fitted.values
## X-squared = 5.1369, df = 8, p-value = 0.7429

I would appreciate your consideration.

psolymos commented 1 year ago

I like the idea of a warning as you described. There also should be an error when G < 2 (df can only be non-negative).

indenkun commented 1 year ago

Thank you for your further fixes. I think this fix makes the function more user-friendly.

psolymos / ResourceSelection

Is the implementation of the Hosmer Lemeshow test correct? #18