tidymodels / discrim

Wrappers for discriminant analysis and naive Bayes models for use with the parsnip package
https://discrim.tidymodels.org
Other
28 stars 3 forks source link

Add model indicator encodings #10

Closed juliasilge closed 4 years ago

juliasilge commented 4 years ago

These parsnip-adjacent packages do need indicator encodings added to them, I believe. We could consider adding something in parsnip to set these as a default (probably "traditional"???) if there is nothing set in the adjacent package itself.

As of now, I have these indicator encodings set as:

library(discrim)
#> Loading required package: parsnip

c("naive_Bayes",
  "discrim_linear",
  "discrim_flexible",
  "discrim_regularized") %>%
  purrr::map_dfr(parsnip::get_encoding) %>%
  knitr::kable()
model engine mode predictor_indicators
naive_Bayes klaR classification none
naive_Bayes naivebayes classification none
discrim_linear MASS classification traditional
discrim_linear mda classification traditional
discrim_flexible earth classification traditional
discrim_regularized klaR classification traditional

Created on 2020-06-29 by the reprex package (v0.3.0.9001)

I don't believe naive Bayes needs indicator variables at all, based on how the underlying functions treat factors. However, this is a pretty big change from how discrim has treated these up until now, as evidenced by the tests. The tests use model.matrix() like this: https://github.com/tidymodels/discrim/blob/e306147232fe32351b1cc33ce248e031e6b8ccaa/tests/testthat/test-naive-Bayes.R#L24 Which makes traditional dummy variables. We would need to rewrite the tests for the naive Bayes models if we do in fact want to keep them as "none" for predictor indicators.

Thoughts @topepo?

juliasilge commented 4 years ago

This is ready to go with the new parsnip branch:

library(discrim)
#> Loading required package: parsnip

c("naive_Bayes",
  "discrim_linear",
  "discrim_flexible",
  "discrim_regularized") %>%
  purrr::map_dfr(parsnip::get_encoding) %>%
  knitr::kable()
model engine mode predictor_indicators compute_intercept remove_intercept
naive_Bayes klaR classification none FALSE FALSE
naive_Bayes naivebayes classification none FALSE FALSE
discrim_linear MASS classification traditional TRUE TRUE
discrim_linear mda classification traditional TRUE TRUE
discrim_flexible earth classification traditional TRUE TRUE
discrim_regularized klaR classification traditional TRUE TRUE

Created on 2020-06-30 by the reprex package (v0.3.0.9001)

github-actions[bot] commented 3 years ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.