tidymodels / tidypredict

Run predictions inside the database
https://tidypredict.tidymodels.org

glm prediction issue #73

Closed. paullevchuk closed this issue 3 years ago.

paullevchuk commented 4 years ago

I tried to generate predictions from a glm model fitted with parsnip inside a workflow. parse_model(glm_wflw_fitted_inter$fit$fit) %>% tidy() works fine and returns a tibble of coefficients.

But calling tidypredict_fit() fails:

Error: Only strings can be converted to symbols

I then tested the predictors one at a time:

This works: fit_glm <- glm(truth ~ ret_1d + ret_2d + ret_3d, data = data1, family = "binomial").

This does NOT work: fit_glm <- glm(truth ~ ret_1d + ret_2d + ret_3d + ret_1d_x_ret_1_24h_hits, data = data1, family = "binomial").

It seems the issue is with predictor names like ret_1d_x_ret_1_24h_hits.

P.S. The interaction predictor names were generated by recipes::step_interact(terms = ~starts_with("ret"):starts_with("ret")).
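If the problem is one predictor name being a prefix of another (the diagnosis proposed in the next comment), the failing formula can be spotted before calling tidypredict. The sketch below is a plain base R check; find_prefix_collisions() is a hypothetical helper, not part of tidypredict or recipes.

```r
# Hypothetical helper (not part of tidypredict): list every pair of
# predictor names where one name is a strict prefix of another.
find_prefix_collisions <- function(vars) {
  pairs <- expand.grid(short = vars, long = vars, stringsAsFactors = FALSE)
  is_prefix <- pairs$short != pairs$long & startsWith(pairs$long, pairs$short)
  out <- pairs[is_prefix, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Predictors from the model that works: no collisions
find_prefix_collisions(c("ret_1d", "ret_2d", "ret_3d"))

# Predictors from the model that fails: "ret_1d" prefixes the interaction name
find_prefix_collisions(c("ret_1d", "ret_2d", "ret_3d", "ret_1d_x_ret_1_24h_hits"))
```

The working formula yields zero rows and the failing one yields exactly the ret_1d / ret_1d_x_ret_1_24h_hits pair, which is consistent with (though not proof of) the prefix explanation.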

saadaslam commented 4 years ago

I'm having a similar issue, and I believe it occurs when one variable name in the model is another variable name with a suffix appended (here, wt and wt_sq):

library(dplyr)
library(tidypredict)

model_data <- mtcars %>% 
  mutate(
   wt_sq = wt ^ 2
  )

model1 <- glm(am ~ wt, data = model_data, family = "binomial")

tidypredict_fit(model1)
#> 1 - 1/(1 + exp(12.0403696589627 + (wt * -4.0239699403279)))

model2 <- glm(am ~ wt + wt_sq, data = model_data, family = "binomial")

tidypredict_fit(model2)
#> Error: Only strings can be converted to symbols

Created on 2020-09-24 by the reprex package (v0.3.0)

So the model with only wt works, but adding wt_sq makes tidypredict_fit() fail.
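One possible workaround, assuming the prefix diagnosis is right (it is not verified against tidypredict here), is to name the derived column so that it does not start with another predictor's name. The base R sketch below uses the hypothetical name sq_wt and only checks that renaming leaves the fitted model itself unchanged.

```r
# Same data as the reprex, built with base R instead of dplyr
model_data <- transform(mtcars, wt_sq = wt^2, sq_wt = wt^2)

# Original model: "wt" is a prefix of "wt_sq"
model2 <- glm(am ~ wt + wt_sq, data = model_data, family = "binomial")

# Renamed model: "sq_wt" (hypothetical name) does not start with "wt"
model3 <- glm(am ~ wt + sq_wt, data = model_data, family = "binomial")

# Identical design matrices, so the estimates match; only the names differ
all.equal(unname(coef(model2)), unname(coef(model3)))  # TRUE
names(coef(model3))  # "(Intercept)" "wt" "sq_wt"
```

Whether tidypredict_fit(model3) then succeeds would need to be checked against the installed tidypredict version.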

If you run parse_model() and look at the fields object for each term, you'll see that col for the third term contains both wt and wt_sq:

pm <- tidypredict::parse_model(model2)
purrr::map(pm$terms, "fields")
#> [[1]]
#> [[1]][[1]]
#> [[1]][[1]]$type
#> [1] "ordinary"
#> 
#> [[1]][[1]]$col
#> [1] "(Intercept)"
#> 
#> 
#> 
#> [[2]]
#> [[2]][[1]]
#> [[2]][[1]]$type
#> [1] "ordinary"
#> 
#> [[2]][[1]]$col
#> [1] "wt"
#> 
#> 
#> 
#> [[3]]
#> [[3]][[1]]
#> [[3]][[1]]$type
#> [1] "conditional"
#> 
#> [[3]][[1]]$col
#> [1] "wt"    "wt_sq"
#> 
#> [[3]][[1]]$val
#> [1] "_sq"
#> 
#> [[3]][[1]]$op
#> [1] "equal"

Created on 2020-09-24 by the reprex package (v0.3.0)
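The output above is what prefix matching would produce. glm names factor coefficients by pasting the variable name and the level (for example cyl6 for level 6 of a factor cyl), so a parser that decodes coefficient names by prefix finds two candidate variables for wt_sq. The sketch below is a hypothetical illustration of that mechanism, not tidypredict's actual parser.

```r
# Hypothetical illustration only; this is not tidypredict's source code.
# Decode a coefficient name by finding every model variable it starts with.
match_by_prefix <- function(coef_name, vars) {
  vars[startsWith(coef_name, vars)]
}

vars <- c("wt", "wt_sq")

match_by_prefix("wt", vars)     # "wt"
match_by_prefix("wt_sq", vars)  # "wt" "wt_sq": two candidates, like col above

# Treating "wt" as a factor and stripping it leaves "_sq", like val above
substring("wt_sq", nchar("wt") + 1)  # "_sq"
```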

tidypredict_fit.glm maps over each term's fields and calls rlang::sym() on col, which fails when col is a character vector of length 2. That failure is reasonable, so I think the fix needs to happen in parse_model().
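A minimal sketch of the kind of change this suggests, assuming the parser knows the list of model variables: check for an exact name match first, and fall back to prefix matching only when there is none, keeping the longest prefix so that col is always a single string that rlang::sym() can accept. resolve_term() is hypothetical and only mirrors the type/col/val/op fields shown above. A more robust fix would compare against the factor levels recorded in the fitted model (glm objects store them in xlevels) rather than relying on name prefixes alone.

```r
# Hypothetical fix sketch (not tidypredict's actual code): resolve a
# coefficient name to exactly one variable.
resolve_term <- function(coef_name, vars) {
  if (coef_name %in% vars) {
    # Exact match: an ordinary numeric term, no level to compare against
    return(list(type = "ordinary", col = coef_name))
  }
  candidates <- vars[startsWith(coef_name, vars)]
  if (length(candidates) == 0) stop("no variable matches ", coef_name)
  # Keep the longest matching prefix so col is always a single string
  col <- candidates[which.max(nchar(candidates))]
  list(type = "conditional", col = col,
       val = substring(coef_name, nchar(col) + 1), op = "equal")
}

vars <- c("wt", "wt_sq", "cyl")
resolve_term("wt_sq", vars)$col  # "wt_sq"
resolve_term("cyl6", vars)$val   # "6"
```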

github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.