harrelfe / rms

Regression Modeling Strategies
https://hbiostat.org/R/rms
Other
176 stars 49 forks source link

predict() on an ols fit returns more unique values than expected #112

Open yhpua opened 2 years ago

yhpua commented 2 years ago

Dear Prof Harrell,

I would like to draw your attention to the observation that predict() on an ols fit returns more unique values than expected. However, predict() on an orm fit works fine. A working example is provided below.

Thank you in advance for looking into the matter.

Best wishes, Yonghao

# predict() on ols fit returns more unique values than expected 
ols_fit <- ols(drat ~ vs, data=mtcars)
unique(predict(ols_fit)) ## 4 unique values, expecting 2 
[1] 3.392 3.392 3.859 3.859
unique(fitted(ols_fit)) ## 4 unique values, expecting 2 
[1] 3.392 3.392 3.859 3.859

orm_fit <- orm(drat ~ vs, data=mtcars)
unique(predict(orm_fit)) ## 2 unique values
[1] -0.6356  1.1852

lm_fit <- lm(drat ~ vs, data=mtcars)
unique(predict(lm_fit)) ## 2 unique values
[1] 3.392 3.859
Deleetdk commented 2 years ago

The values are not the same:

> options(digits = 18)
> predict(ols_fit)
                  1                   2                   3                   4                   5                   6 
3.39222222222222314 3.39222222222222225 3.85928571428571443 3.85928571428571399 3.39222222222222225 3.85928571428571443 
                  7                   8                   9                  10                  11                  12 
3.39222222222222225 3.85928571428571443 3.85928571428571443 3.85928571428571443 3.85928571428571443 3.39222222222222225 
                 13                  14                  15                  16                  17                  18 
3.39222222222222225 3.39222222222222225 3.39222222222222225 3.39222222222222225 3.39222222222222225 3.85928571428571443 
                 19                  20                  21                  22                  23                  24 
3.85928571428571443 3.85928571428571399 3.85928571428571443 3.39222222222222225 3.39222222222222225 3.39222222222222225 
                 25                  26                  27                  28                  29                  30 
3.39222222222222225 3.85928571428571443 3.39222222222222225 3.85928571428571443 3.39222222222222225 3.39222222222222225 
                 31                  32 
3.39222222222222225 3.85928571428571399 
> unique(predict(ols_fit))
[1] 3.39222222222222314 3.39222222222222225 3.85928571428571443 3.85928571428571399

But they should be:

> mtcars$vs
 [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1
> mtcars$vs %>% unique
[1] 0 1

It must be some numerical inaccuracy occurring at some low level. The model itself has:

> ols_fit
Linear Regression Model

 ols(formula = drat ~ vs, data = mtcars)

                                               Model Likelihood               Discrimination    
                                                     Ratio Test                      Indexes    
 Obs                    32    LR chi2       6.90000000000000036       R20.194000000000000006    
 sigma0.487999999999999989    d.f.                            1    R2 adj0.16700000000000001    
 d.f.                   30    Pr(> chi2) 0.00860000000000000001       g 0.236999999999999988    

 Residuals

      Min       1Q   Median       3Q      Max 
 -1.09929 -0.31472 -0.04929  0.23351  1.07071 

           Coef                 S.E.                 t                    Pr(>|t|)             
 Intercept 3.392199999999999882 0.115000000000000005 29.48999999999999844 <0.0001              
 vs        0.467100000000000015 0.173899999999999999  2.68999999999999995 0.0117000000000000003

So the values should be 3.392199999999999882+0.467100000000000015 and 3.392199999999999882. Curiously, this matches neither of the 4 results!