harrelfe / rms

Regression Modeling Strategies
https://hbiostat.org/R/rms
Other
172 stars 48 forks source link

Trouble with definition of "D" in `val.prob`. #146

Open jthaman opened 4 months ago

jthaman commented 4 months ago

I'm trying to understand what "D" means in the val.prob function. I read in your technical report that.

" The index of discrimination is derived by computing the difference in quality of the best constant predictor (one that on the average correctly predicts the overall prevalence of the event) and the best calibrated predictor:

D = [L(a, 0) - L (a,b) - 1] / n".

I have tried to code this up myself in R, but i can't match your output from val.prob.

Here is an example that i coded up :

# example data
y.test <- c(0, 1, 0, 0, 1, 1)
p.sim <- c(0.5, 0.8, 0.1, 0.2, 0.4, 0.7)
n <- length(y.test)

cal.fit <- glm(y.test ~ qlogis(p.sim), family = binomial)
Lab <- -2 * logLik(cal.fit); Lab

cal.fit01 <- glm(y.test ~ 0, data = dat, family = binomial, offset = qlogis(p.sim))
L01 <- -2 * logLik(cal.fit01); L01

cal.fita0 <- glm(y.test ~ 1, data = dat, family = binomial)
La0 <- -2 * logLik(cal.fita0); La0

cal.fita1 <- glm(y.test ~ 1, data = dat, family = binomial, offset = qlogis(p.sim))
La1 <- -2 * logLik(cal.fita1); La1

U <- (L01 - Lab - 2) / n; U # matches val.prob
Up <- (L01 - La1 - 1) / n; Up # not used in val.prob
Us <- (La1 - Lab - 1) / n; Us # not used in val.prob

D.paper <- (La0 - Lab - 1) / n; D.paper # does not match
D.code <- (La0 - L01 - 1) / n; D.code # matches val.prob

val.prob(p.sim, y.test)

D.code is D as best I can tell from reading the source code of val.prob.s, which is (I believe) that same as [L(a, 0) - L (0, 1) - 1] / n". D.paper is the D in the technical report. So, which D is the real D? Thanks! i think this stuff is so cool.