pat-s / oddsratio

Simplified odds ratio calculation of binomial GAM/GLM models
https://pat-s.github.io/oddsratio

error when variable names have partial matching #34

Closed raff-k closed 4 years ago

raff-k commented 5 years ago

Hello, I would like to use your functions to display the smoothing functions of GAMs. Some of my variables have names that partially match each other, such as "slp" and "slp_catch". It seems that this causes an error inside your function. Here is a reproducible example.

# load libraries
library(oddsratio, mgcv)

# get data
dat <- oddsratio::data_gam
dat.test <- dat

# rename variables x0 and x1 
names(dat.test)[2:3] <- c("x_0", "x_01")

# fit models
fit_gam <- mgcv::gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3) + x4, 
                     data = dat)

fit_gam_test <- mgcv::gam(y ~ s(x_0) + s(I(x_01^2)) + s(x2) + offset(x3) + x4, 
                     data = dat.test)

# check oddsratio function
oddsratio::plot_gam(model = fit_gam, pred = "x2") # works
oddsratio::plot_gam(model = fit_gam_test, pred = "x2") # works

oddsratio::plot_gam(model = fit_gam_test, pred = "x_0") # fails
# Error: $ operator is invalid for atomic vectors

# Debugging:
# ... stepping through the function gam_to_df
debug(gam_to_df)

# ...
# the line: set_pred <- which(grepl(pred, plot_df))
# ... returns two indices instead of one, which then causes the error.
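
The failure described above can be reproduced without fitting a model. The following is a minimal sketch that assumes `plot_df` is a list with one sub-list per smooth term; the mock object and its values are made up for illustration and are not the real output of the package. Because `grepl()` coerces each list element to a string, the pattern "x_0" also matches the element labelled "x_01", and `[[` with a length-two index then drills into the first element instead of selecting one term.

```r
# Mock of the per-term list (an assumption for illustration, not the real object)
plot_df <- list(
  list(x = c(0.1, 0.2), fit = c(1, 2), se = c(0.1, 0.1), xlab = "x_0"),
  list(x = c(0.3, 0.4), fit = c(3, 4), se = c(0.2, 0.2), xlab = "x_01"),
  list(x = c(0.5, 0.6), fit = c(5, 6), se = c(0.3, 0.3), xlab = "x2")
)

# grepl() deparses every list element to text, so "x_0" is found in two of them
set_pred <- which(grepl("x_0", plot_df))
print(set_pred)

# With two indices, [[ ]] indexes recursively: plot_df[[1]][[2]] is a numeric vector
picked <- plot_df[[set_pred]]
print(is.atomic(picked))

# Using $ on that atomic vector raises the error reported in this issue
msg <- tryCatch(picked$x, error = function(e) conditionMessage(e))
print(msg)
```
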
raff-k commented 5 years ago

I am no expert in regular expressions, but it seems that the "_" in the variable name also makes it complicated to adapt the grep pattern. The following function works, though it may be a bit messy:

# the following command is now working
plot_gam(model = fit_gam_test, pred = "^x_0$")

# with...
gam_to_df <- function(model = NULL, pred = NULL) {
  plot_df <- no_plot(model)

  # collect the x-axis label of every term and strip underscores from it
  # (nested call instead of %>%, so magrittr does not need to be attached)
  plot_df_names <- gsub(pattern = "_", replacement = "",
                        x = sapply(plot_df, function(x) x$xlab))

  # strip the same characters from the pattern
  # (could be extended when further errors occur)
  pred <- gsub(pattern = "_", replacement = "", x = pred)

  # match against the labels instead of the whole plot_df list
  set_pred <- which(grepl(pred, plot_df_names))
  term <- plot_df[[set_pred]]
  df <- data.frame(x = term$x,
                   se_upr = term$fit + term$se,
                   se_lwr = term$fit - term$se,
                   y = term$fit)
  return(df)
}
pat-s commented 5 years ago

Hi @raff-k, thanks for reporting.

If it's just a grepl problem returning two matches, it should be sufficient to make the grepl call more robust. I don't think we need such a "big" function here.

Do you have time to do this?
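
One way a more robust `grepl` call could look is sketched below. This is only an illustration, not the fix that was applied in the package: `match_pred()` is a hypothetical helper, and the labels in `xlabs` are assumed to mirror the terms of the example model. The idea is to match the predictor as a whole word and to treat it literally rather than as a regular expression.

```r
# Hypothetical helper (not part of oddsratio): index of the term whose label
# contains `pred` as a whole word.
match_pred <- function(pred, xlabs) {
  # \Q...\E quotes `pred` literally; \b requires a word boundary on each side.
  # "_" and digits count as word characters, so "x_0" cannot match inside "x_01".
  pattern <- paste0("\\b\\Q", pred, "\\E\\b")
  which(grepl(pattern, xlabs, perl = TRUE))
}

# Assumed labels, mirroring the model terms in the example above
xlabs <- c("x_0", "I(x_01^2)", "x2")

print(match_pred("x_0", xlabs))   # only the first label matches
print(match_pred("x_01", xlabs))  # still found inside the I(...) wrapper
```

Comparing the labels with `==` would be simpler, but a transformed term such as `s(I(x_01^2))` would then have to be requested by its full label.
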

pat-s commented 4 years ago

@raff-k

Note that library(oddsratio, mgcv) does not do what it appears to do: the second argument of library() is help, not a second package, so {mgcv} is not being attached in this case.
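
The point about `library()` can be checked directly from its argument list. The sketch below uses the base packages `stats` and `utils` as stand-ins so that it runs anywhere; for the example in this issue the same pattern would be one `library()` call each for oddsratio and mgcv.

```r
# The first two formal arguments of library() are `package` and `help`,
# so a second positional argument is never treated as another package.
first_args <- names(formals(library))[1:2]
print(first_args)

# Attaching several packages takes one call per package
library(stats)
library(utils)
```
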