rsquaredacademy / olsrr

Tools for developing OLS regression models
https://olsrr.rsquaredacademy.com/

If ols_test_outlier() does not find any outliers, it returns the largest positive residual instead of the largest absolute residual #177

Closed: iankelk closed this issue 3 years ago

iankelk commented 3 years ago



Brief description of the problem:

When ols_test_outlier() finds no outlier (no Bonferroni p-value below the cutoff), it should report the observation with the largest studentized residual in absolute value. Instead, it searches only for the largest positive studentized residual, so a negative residual of greater magnitude is ignored. The problematic line is in https://github.com/rsquaredacademy/olsrr/blob/master/R/ols-outlier-test.R.

Problem

Line 31 applies max() to the signed residuals, so it never selects a negative residual, even one with a larger magnitude:

31: out <- data_bon[data_bon$stud_resid == max(data_bon$stud_resid), ]
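A minimal illustration with a made-up vector of residuals (hypothetical values, not taken from any model) shows why the signed max() misses the larger negative value while which.max(abs(...)) finds it:

```r
# Hypothetical studentized residuals: the largest magnitude is negative
stud_resid <- c(1.2, 2.05, -0.4, -3.33)

# Signed maximum, as in the original line 31: selects 2.05 (position 2)
signed_pick <- which(stud_resid == max(stud_resid))

# Maximum by absolute value: selects -3.33 (position 4)
abs_pick <- which.max(abs(stud_resid))

stud_resid[signed_pick]
stud_resid[abs_pick]
```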

Fix 1

Applying abs() to both sides of the comparison fixes the selection:

31: out <- data_bon[abs(data_bon$stud_resid) == max(abs(data_bon$stud_resid)), ]

Fix 2

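A cleaner fix uses which.max(), which returns the index of the first maximum (so, unlike Fix 1, it keeps a single row if two residuals tie in absolute value):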

31: out <- data_bon[which.max(abs(data_bon$stud_resid)), ]
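The two fixes only behave differently when two residuals tie exactly in absolute value, which is unlikely with real studentized residuals but easy to construct. The sketch below uses a hypothetical stand-in for the internal data_bon data frame (only the stud_resid column name comes from the issue; obs is added for illustration):

```r
# Hypothetical stand-in for olsrr's internal data_bon, with a tie in |stud_resid|
data_bon <- data.frame(obs = 1:4, stud_resid = c(2.5, 0.3, -2.5, 1.1))

# Fix 1: the logical comparison keeps every row tied for the largest |residual|
fix1 <- data_bon[abs(data_bon$stud_resid) == max(abs(data_bon$stud_resid)), ]

# Fix 2: which.max() returns only the first such row
fix2 <- data_bon[which.max(abs(data_bon$stud_resid)), ]

fix1  # observations 1 and 3
fix2  # observation 1 only
```

Either fix resolves the reported bug; the choice only matters for whether ties are all reported or the first one wins.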

library(faraway)
library(olsrr)
#> Registered S3 methods overwritten by 'car':
#>   method                          from
#>   influence.merMod                lme4
#>   cooks.distance.influence.merMod lme4
#>   dfbeta.influence.merMod         lme4
#>   dfbetas.influence.merMod        lme4
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:faraway':
#> 
#>     hsb
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
f <- lm(stack.loss ~ ., data = stackloss)
## No observation has a Bonferroni p-value below the cutoff, so the function should
## report the largest absolute residual, but it reports the largest positive one
ols_test_outlier(f)
#>   studentized_residual unadjusted_p_val bonferroni_p_val
#> 4             2.051797        0.0569287         1.195503
## Observation 4 is the largest positive studentized residual
rstudent(f)[4]
#>        4 
#> 2.051797
## The largest absolute studentized residual is actually at observation 21
index <- which.max(abs(rstudent(f)))
index
#> 21 
#> 21
## Largest studentized residual by absolute value (negative, so max() skips it)
rstudent(f)[index]
#>        21 
#> -3.330493

Created on 2021-04-17 by the reprex package (v1.0.0)
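For checking the expected behaviour without olsrr, here is a base-R sketch of a Bonferroni outlier test that falls back to the largest absolute studentized residual. The helper name largest_abs_outlier() is hypothetical and not part of olsrr. It assumes the unadjusted p-value is two-sided with df.residual(model) - 1 degrees of freedom and that the Bonferroni value is n times that (uncapped). This is consistent with the output above, where 21 × 0.0569287 ≈ 1.195503.

```r
# Hypothetical helper (not part of olsrr): Bonferroni outlier check that
# falls back to the largest *absolute* studentized residual.
largest_abs_outlier <- function(model, cut_off = 0.05) {
  stud_resid <- rstudent(model)
  n  <- length(stud_resid)
  df <- df.residual(model) - 1                 # df after deleting one case
  unadj <- 2 * pt(abs(stud_resid), df, lower.tail = FALSE)  # two-sided p
  bonf  <- n * unadj                           # uncapped Bonferroni adjustment
  res <- data.frame(studentized_residual = stud_resid,
                    unadjusted_p_val = unadj,
                    bonferroni_p_val = bonf)
  hits <- res[res$bonferroni_p_val < cut_off, ]
  if (nrow(hits) > 0) return(hits)
  # No case passes the cutoff: report the largest |residual|, keeping its sign
  res[which.max(abs(res$studentized_residual)), ]
}

f <- lm(stack.loss ~ ., data = stackloss)
out <- largest_abs_outlier(f)
out
```

On the stackloss model from the reprex this should return observation 21 (studentized residual about -3.33), matching the which.max(abs(rstudent(f))) check in the reprex.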