jthaman / ciTools

An R Package for Quick Uncertainty Intervals
GNU General Public License v3.0

need response variable in new data for add_pi #45

Open nlichti opened 4 years ago

nlichti commented 4 years ago

Very useful package. I have a minor suggestion: in add_pi (and possibly other functions; I haven't checked), an error is thrown if tb does not include a column for the response variable. The actual values in that column are ignored, but the column has to be present. I'd guess this comes from internal code similar to:

X <- formula(fit) %>% model.matrix(data = tb)

which is used to get the design matrix for simulation-based prediction intervals. A few more steps should eliminate the need to include the response. Something like:

chr_formula <- formula(fit) %>% deparse() %>% strsplit(' ') %>% getElement(1)
X <- as.formula(paste(chr_formula[-1], collapse = ' ')) %>% model.matrix(data = tb)

I noticed this specifically with a Poisson GLM. add_ci did not require the response to be in tb.
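For illustration, here is a minimal sketch of the same idea using only base R (it is not ciTools' actual code): delete.response() drops the left-hand side from a fitted model's terms, and predict.lm() does something similar internally, so the design matrix can be built from new data that has no response column. The model and newdata below are assumptions chosen for the example.

```r
# Sketch (not ciTools internals): build a design matrix for new data
# that does not contain the response column.
fit <- glm(mpg ~ disp, family = Gamma(), data = mtcars)

# New data that deliberately has no 'mpg' column.
newdata <- data.frame(disp = c(100, 200, 300))

# Drop the response (left-hand side) from the model's terms object.
tt <- delete.response(terms(fit))

# With the response gone, model.frame() only looks up 'disp';
# xlev keeps any factor levels consistent with the fitted model.
mf <- model.frame(tt, data = newdata, xlev = fit$xlevels)
X <- model.matrix(tt, data = mf)

print(X)
```

Unlike the string-splitting approach, this works on the terms object directly, so it does not depend on how deparse() happens to format or wrap the formula.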

jthaman commented 4 years ago

Thanks for this note. I will take a look.

FlukeAndFeather commented 3 years ago

I think I might be running into a bug related to this. Here's a reprex:

library(tidyverse)
mod <- glm(mpg ~ disp, family = Gamma(), data = mtcars)
tibble(disp = seq(min(mtcars$disp), max(mtcars$disp), length.out = 10)) %>% 
    ciTools::add_pi(mod)
#> Error in model.frame.default(formula = mpg ~ disp, data = structure(list(: invalid type (list) for variable 'mpg'

Created on 2021-09-05 by the reprex package (v2.0.0)

Oddly, there is no error if I leave out the family argument (i.e., use the default Gaussian family):

library(tidyverse)
mod <- glm(mpg ~ disp, data = mtcars)
tibble(disp = seq(min(mtcars$disp), max(mtcars$disp), length.out = 10)) %>% 
    ciTools::add_pi(mod)
#>        disp     pred  LPB0.025 UPB0.975
#> 1   71.1000 26.66946 19.753419 33.58550
#> 2  115.6444 24.83356 17.999921 31.66719
#> 3  160.1889 22.99765 16.220266 29.77504
#> 4  204.7333 21.16175 14.413797 27.90969
#> 5  249.2778 19.32584 12.580164 26.07152
#> 6  293.8222 17.48994 10.719340 24.26053
#> 7  338.3667 15.65403  8.831623 22.47644
#> 8  382.9111 13.81813  6.917619 20.71864
#> 9  427.4556 11.98222  4.978206 18.98624
#> 10 472.0000 10.14632  3.014492 17.27814

Created on 2021-09-05 by the reprex package (v2.0.0)

akarlinsky commented 2 years ago

Ran into this bug as well.

akarlinsky commented 1 year ago

Has anyone found a way around it? I can't estimate a PI because of this bug. I tried fitting the glm with y = TRUE to keep the dependent variable in the model object, and I also tried adding a dependent-variable column to the new data. No luck :(
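As a possible stopgap, here is a rough sketch under stated assumptions; it is not ciTools' method. It simulates responses from a Gamma distribution whose mean comes from predict(), which does not need the response in the new data. It plugs in the estimated coefficients and dispersion (shape taken as 1 / dispersion) and ignores their sampling uncertainty, so the intervals will tend to be somewhat too narrow. The output column names only mirror add_pi's output for easy comparison.

```r
# Rough plug-in sketch (not ciTools' method): simulation-based prediction
# intervals for a Gamma GLM on new data without a response column.
# Coefficient and dispersion uncertainty are ignored.
set.seed(1)
fit <- glm(mpg ~ disp, family = Gamma(), data = mtcars)
newdata <- data.frame(disp = seq(min(mtcars$disp), max(mtcars$disp),
                                 length.out = 10))

# predict() drops the response internally, so 'mpg' is not needed here.
mu <- predict(fit, newdata = newdata, type = "response")

# Estimate the Gamma shape from the model's dispersion parameter.
shape <- 1 / summary(fit)$dispersion

# Simulate responses: a Gamma with mean m has rate = shape / m.
nsim <- 10000
sims <- vapply(mu, function(m) rgamma(nsim, shape = shape, rate = shape / m),
               numeric(nsim))

# Column-wise quantiles give the lower and upper prediction bounds.
newdata$pred <- mu
newdata$LPB0.025 <- apply(sims, 2, quantile, probs = 0.025)
newdata$UPB0.975 <- apply(sims, 2, quantile, probs = 0.975)
print(newdata)
```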