harrelfe / rms

Regression Modeling Strategies
https://hbiostat.org/R/rms
Other
172 stars 48 forks source link

`poma` fails for response variables with >= 10 levels coded as a factor #115

Closed krassowski closed 2 years ago

krassowski commented 2 years ago

While both the examples given in the documentation work ok for numeric responses:

# < 10 unique levels
mod.orm <- orm(carb ~ cyl + hp, x=TRUE, y=TRUE, data=mtcars)
poma(mod.orm)
# >= 10 unique levels
mod.orm <- orm(mpg ~ cyl + hp, x=TRUE, y=TRUE, data = mtcars)
poma(mod.orm)

If we recode the outcomes to a factor:

mtcars_factor = data.frame(mtcars)
mtcars_factor$carb = factor(mtcars_factor$carb, ordered=TRUE)
mtcars_factor$mpg = factor(mtcars_factor$mpg, ordered=TRUE)

The code path for >= 10 unique levels no longer works:

mod.orm <- orm(mpg ~ cyl + hp, x=TRUE, y=TRUE, data = mtcars_factor)
poma(mod.orm)

It raises:

Error in quantile.default(mod.orm$y, 0.1, na.rm = TRUE) : 
  'type' must be 1 or 3 for ordered factors

This comes from ECDF plot limits specification:

https://github.com/harrelfe/rms/blob/00bb6d335e55d01a8c7643de6d57077f36830be5/R/poma.r#L151

The error can be reproduced in isolation with:

quantile(factor(mtcars$mpg, ordered=TRUE), 0.10, na.rm=TRUE)

This can be fixed by converting the factor back to numeric for the sake of quantile computation:

f = factor(mtcars$mpg, ordered=TRUE)
quantile(as.numeric(levels(f))[f], 0.10, na.rm=TRUE)