tidymodels / discrim

Wrappers for discriminant analysis and naive Bayes models for use with the parsnip package
https://discrim.tidymodels.org

allow discrim_linear() to specify the prior probability of each class #48

Closed jmarshallnz closed 2 years ago

jmarshallnz commented 2 years ago

For LDA using the MASS and sparsediscrim engines, the base modelling functions have a prior argument.

It would be useful if this could be specified in discrim/tidymodels.

This PR just adds prior = NULL as engine defaults for these two engines.

I am not sure this is the most sensible way to do it. Prior probabilities aren't really tunable (or at least shouldn't be), so I don't think they fit as an argument. But it is a little awkward to call set_engine("MASS", prior = c(1,1,1)/3) before fit(), when the outcome variable hasn't been defined yet.
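One way to soften that awkwardness, sketched here under the assumption that the training data is already at hand when the specification is built, is to derive the prior from the outcome's levels rather than hard-coding its length. The names n_classes, uniform_prior, lda_spec and lda_fit are illustrative only and are not part of discrim or this PR.

```r
library(discrim)

# Build a uniform prior from the number of outcome classes
# (assumes the outcome column, here iris$Species, is a factor)
n_classes <- nlevels(iris$Species)
uniform_prior <- rep(1 / n_classes, n_classes)

# Pass the prior to the MASS engine through set_engine()
lda_spec <- discrim_linear() %>%
  set_engine("MASS", prior = uniform_prior)

lda_fit <- fit(lda_spec, Species ~ ., data = iris)

# Inspect the prior stored on the underlying MASS::lda object
purrr::pluck(lda_fit, "fit", "prior")
```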

Am happy to adjust accordingly. Thanks for the great package(s)!

Simple way to test:

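library(discrim)

# Default: prior is estimated from the class proportions
discrim_linear() %>%
  fit(Species ~ ., data = iris) %>%
  purrr::pluck("fit", "prior")

# User-specified prior passed to the MASS engine
discrim_linear() %>%
  set_engine("MASS", prior = c(2, 1, 2) / 5) %>%
  fit(Species ~ ., data = iris) %>%
  purrr::pluck("fit", "prior")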

Side note: lda() uses missing(prior) to determine whether the prior was set. This seems to work when I specify prior = NULL in defaults, but it wasn't clear to me that it would. Happy to take pointers!

juliasilge commented 2 years ago

I may be misunderstanding, but can you clarify what you are wanting to be able to do that you can't do right now? You can set these as engine arguments right now with the current CRAN version of discrim (another example here):

library(discrim)
#> Loading required package: parsnip

discrim_linear() %>%
    fit(Species ~ ., data=iris) %>%
    purrr::pluck("fit", "prior")
#>     setosa versicolor  virginica 
#>  0.3333333  0.3333333  0.3333333

discrim_linear() %>%
    set_engine("MASS", prior=c(2,1,2)/5) %>%
    fit(Species ~ ., data=iris) %>%
    purrr::pluck("fit", "prior")
#>     setosa versicolor  virginica 
#>        0.4        0.2        0.4

Created on 2022-03-21 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Monterey 12.2.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Denver
#>  date     2022-03-21
#>  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  cli           3.2.0      2022-02-14 [1] CRAN (R 4.1.1)
#>  codetools     0.2-18     2020-11-04 [1] CRAN (R 4.1.1)
#>  colorspace    2.0-3      2022-02-21 [1] CRAN (R 4.1.1)
#>  crayon        1.5.0      2022-02-14 [1] CRAN (R 4.1.1)
#>  DBI           1.1.2      2021-12-20 [1] CRAN (R 4.1.1)
#>  dials         0.1.0      2022-01-31 [1] CRAN (R 4.1.1)
#>  DiceDesign    1.9        2021-02-13 [1] CRAN (R 4.1.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.1)
#>  discrim     * 0.2.0      2022-03-09 [1] CRAN (R 4.1.1)
#>  dplyr         1.0.8      2022-02-08 [1] CRAN (R 4.1.1)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.1)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.1)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.1)
#>  generics      0.1.2      2022-01-31 [1] CRAN (R 4.1.1)
#>  ggplot2       3.3.5      2021-06-25 [1] CRAN (R 4.1.0)
#>  globals       0.14.0     2020-11-22 [1] CRAN (R 4.1.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.1)
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
#>  hardhat       0.2.0      2022-01-24 [1] CRAN (R 4.1.1)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.1)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.1)
#>  MASS          7.3-55     2022-01-13 [1] CRAN (R 4.1.1)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
#>  parsnip     * 0.2.1      2022-03-17 [1] CRAN (R 4.1.1)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.1)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.1)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.1)
#>  rmarkdown     2.13       2022-03-10 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  scales        1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2.9000 2022-03-04 [1] Github (r-lib/sessioninfo@f971f10)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.7.0      2022-03-13 [1] CRAN (R 4.1.1)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.1)
#>  tidyr         1.2.0      2022-02-01 [1] CRAN (R 4.1.1)
#>  tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.1.1)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.1)
#>  xfun          0.30       2022-03-02 [1] CRAN (R 4.1.1)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.1)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

Can you be a bit more specific in what you can't do right now that you want to be able to do?

jmarshallnz commented 2 years ago

Aha! This is my complete misunderstanding. Will close.

I assumed that arguments in set_engine() that weren't listed in defaults were not available to the user. This is clearly not the case, so I'll try and explain why I thought that in case it might be useful!

I think it was because:

  1. I first tried putting more arguments into fit() instead of set_engine() which obviously doesn't work.
  2. The examples I stumbled across, e.g. https://www.tmwr.org/models.html#create-a-model, adjust arguments that are in defaults, i.e. those listed via translate().
  3. I misread this bit: "defaults is an optional list of arguments to the fit function that the user can change, but whose defaults can be set here" as meaning "only arguments listed here can be changed" rather than "only use this if you want to change the default of the fit function". The information further down in the FAQ also doesn't explicitly state that users can pass other arguments to set_engine() than the defaults you supply (it's possibly implicit?)

These three things were enough for me to reach the conclusion that you couldn't add other arguments. Enough that I didn't even try it! (This is clearly a bit silly, so maybe nothing needs changing?)
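For anyone landing here with the same assumption, a quick check along these lines may help. It is only a sketch: it relies on the eng_args field of a parsnip model specification and on translate(), and the name lda_spec is illustrative.

```r
library(discrim)

lda_spec <- discrim_linear() %>%
  set_engine("MASS", prior = c(2, 1, 2) / 5)

# Engine arguments supplied through `...` are stored on the specification,
# whether or not they appear in the engine's registered defaults
names(lda_spec$eng_args)

# translate() prints the fit template that will be used for the engine call,
# which should include the prior argument set above
translate(lda_spec)
```

If prior shows up in both places, the argument will be handed to the engine's fitting function when fit() is called.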

I do notice that the help for set_engine() adjusts an argument that isn't in defaults. I missed that completely! But I wonder if it could be clarified further, along the lines of "The ... argument of set_engine() allows any engine-specific argument to be passed directly to the engine fitting function, e.g. ranger has an importance argument, which could be specified in set_engine()."

I would be happy to do up a PR for the parsnip::set_engine() docs if you feel it would be useful.

juliasilge commented 2 years ago

Yeah, if you'd like to open a PR on parsnip to change this:

Engine arguments are either specific to a particular engine or used more rarely; there is no change for these argument names from the underlying engine. Set these in set_engine(), like set_engine("ranger", importance = "permutation").

to this:

Engine arguments are either specific to a particular engine or used more rarely; there is no change for these argument names from the underlying engine. The ... argument of set_engine() allows any engine-specific argument to be passed directly to the engine fitting function, like set_engine("ranger", importance = "permutation").

That would be great! Here is info on contributing documentation to parsnip.
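As a rough illustration of the ranger example in that wording, the sketch below assumes the ranger package is installed and uses the built-in iris data; rf_spec and rf_fit are illustrative names, not anything from the parsnip docs.

```r
library(parsnip)

# importance is a ranger argument, passed through `...` in set_engine()
rf_spec <- rand_forest(mode = "classification") %>%
  set_engine("ranger", importance = "permutation")

rf_fit <- fit(rf_spec, Species ~ ., data = iris)

# Because importance was requested, the underlying ranger object
# carries one importance score per predictor
purrr::pluck(rf_fit, "fit", "variable.importance")
```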

jmarshallnz commented 2 years ago

https://github.com/tidymodels/parsnip/pull/687 :)

github-actions[bot] commented 2 years ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.