therneau / survival

Survival package for R

Issue in how response and deviance residuals (from Gaussian fit) are calculated by survreg for interval-censored data. #183

Closed: letitburn00 closed this issue 2 years ago

letitburn00 commented 2 years ago

I am trying to run a series of interval regressions on multiply imputed datasets using the survreg function. When pooling results, estimates are returned, but I get the following warning: log(1 - 2 * pnorm(width/2)) : NaNs produced. I posted my question on Cross Validated, and the person who answered identified a problem in how deviance residuals (from a Gaussian fit) are calculated for interval-censored data. They also raised concerns that there might be errors in how center values are calculated via rowMeans(y) (specifically, that an extra value of 3 is being added, rather than simply averaging the endpoints of the intervals). I wanted to bring these possible errors to your attention.

therneau commented 2 years ago

This will take some more thought. Your message tells me almost nothing, btw, but the Cross Validated link has good information. I gave up on the idea of deviance residuals for interval-censored data a couple of decades ago, so I have to reconstruct all that thinking. And I'm on vacation this week...

letitburn00 commented 2 years ago

Sorry for not providing more detail in my message. Of course, no problem, there is no rush. Enjoy your vacation!

therneau commented 2 years ago

Found and fixed. It was a silly math error. We want the Gaussian probability from -width to width. The probability from 0 to width is pnorm(width) - 0.5, so the solution is 2 * (pnorm(width) - 0.5), which somehow became 1 - 2 * pnorm(width). The Cross Validated commentator was correct about a rowsum oversight as well. Fixed in my master copy. I'm working on a CRAN release soon.
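
The arithmetic in the comment above can be checked numerically. The following is only a sketch in Python, not the package's code: `pnorm` here is a hypothetical stand-in for R's `pnorm()` built on `statistics.NormalDist`, and `central_prob` and `buggy_arg` are illustrative names. It shows that the mistaken expression is exactly the negative of the intended probability, so it is below zero for any positive width, which is presumably why taking its log produced NaNs.

```python
from statistics import NormalDist


def pnorm(x: float) -> float:
    """Standard normal CDF (stand-in for R's pnorm(); an assumption of this sketch)."""
    return NormalDist().cdf(x)


def central_prob(width: float) -> float:
    """Intended quantity: P(-width < Z < width) = 2 * (pnorm(width) - 0.5)."""
    return 2.0 * (pnorm(width) - 0.5)


def buggy_arg(width: float) -> float:
    """Mistaken expression from the thread: 1 - 2 * pnorm(width)."""
    return 1.0 - 2.0 * pnorm(width)


if __name__ == "__main__":
    for width in (0.5, 1.0, 2.0):
        # The buggy value is the negative of the correct one, so its log is undefined.
        print(width, central_prob(width), buggy_arg(width))
```

Since 1 - 2p = -(2p - 1), the two expressions differ only in sign, which fits the description of a simple slip rather than a deeper modelling problem.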