tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org

Sampling from estimated conditional distributions using different predict() methods #375

Closed awunderground closed 6 months ago

awunderground commented 4 years ago

Feature

I have a specific question and a general feature request/question.

I sometimes sample from nodes in regression trees instead of using node means. For example, I can use library(partykit) to sample from the nodes:

library(rpart)
library(partykit)
library(tidyverse)

rpart_model <- rpart(mpg ~ ., data = mtcars)

node_ecdf <- predict(as.party(rpart_model), newdata = remove_rownames(mtcars[1, ]), type = "prob")

sample(environment(node_ecdf[["1"]])[["x"]], size = 1)

The process above is clunky and does not generalize across models or packages. It also doesn't work with library(parsnip):

library(tidymodels)

cart_model <- parsnip::decision_tree() %>%
  parsnip::set_engine("rpart") %>%
  parsnip::set_mode("regression")

parsnip_model <- fit(cart_model, mpg ~ ., data = mtcars)

as.party(parsnip_model)
Error in UseMethod("as.party") : 
  no applicable method for 'as.party' applied to an object of class "c('_rpart', 'model_fit')"

1. Specific question: should class conversions like as.party() work in this situation?
2. General request/question: are there plans for tidymodels to allow for a wider range of prediction methods or will this all be handled through the model packages (e.g. rpart)? It is useful to sample from conditional distributions created by lm, rpart, ranger, etc. I am happy to work on this and want to make sure my efforts align with your excellent API/framework.
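
The node-sampling idea in the first example can also be written without reaching into the ecdf closure. The following is only a sketch under stated assumptions: `sample_from_nodes()` is a hypothetical helper (it is not part of rpart, partykit, or parsnip), and it assumes the training response is still available so that a draw can be taken from the training observations that share a terminal node with each new row.

```r
library(rpart)
library(partykit)

rpart_model <- rpart(mpg ~ ., data = mtcars)
party_model <- as.party(rpart_model)

# Hypothetical helper: not part of rpart, partykit, or parsnip.
sample_from_nodes <- function(party_model, y_train, newdata) {
  # Terminal node id for every training row and for every new row
  train_nodes <- predict(party_model, type = "node")
  new_nodes <- predict(party_model, newdata = newdata, type = "node")

  vapply(new_nodes, function(node) {
    candidates <- y_train[train_nodes == node]
    # Index via sample.int() so a node holding a single value is handled safely
    candidates[sample.int(length(candidates), size = 1)]
  }, numeric(1))
}

set.seed(2020)
draws <- sample_from_nodes(party_model, mtcars$mpg, mtcars[1:5, ])
```

Each element of `draws` is one observed mpg value from the terminal node that the corresponding row falls into, so repeated calls give different draws from the same node-level distribution.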

juliasilge commented 4 years ago

On your first question, you need to use repair_call() before converting the underlying engine fit (see the repair_call() documentation for more).

library(tidymodels)
library(partykit)
#> Loading required package: grid
#> Loading required package: libcoin
#> Loading required package: mvtnorm

cart_model <- parsnip::decision_tree() %>%
  parsnip::set_engine("rpart") %>%
  parsnip::set_mode("regression")

cart_fit <- fit(cart_model, mpg ~ ., data = mtcars)
fixed_fit <- repair_call(cart_fit, data = mtcars)
as.party(fixed_fit$fit)
#> 
#> Model formula:
#> mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
#> 
#> Fitted party:
#> [1] root
#> |   [2] cyl >= 5
#> |   |   [3] hp >= 192.5: 13.414 (n = 7, err = 28.8)
#> |   |   [4] hp < 192.5: 18.264 (n = 14, err = 59.9)
#> |   [5] cyl < 5: 26.664 (n = 11, err = 203.4)
#> 
#> Number of inner nodes:    2
#> Number of terminal nodes: 3

Created on 2020-10-05 by the reprex package (v0.3.0.9001)
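
Once the repaired fit converts, the sampling step from the original question could be applied to it. This is a rough sketch rather than a recommended workflow: it assumes the same model as above, and it draws through `quantile()` on the returned ecdf (inverse-transform sampling with `type = 1`) instead of reading values out of the ecdf's environment, which is a substitution and not what the original post did.

```r
library(tidymodels)
library(partykit)

cart_fit <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# Repair the stored call so that as.party() can rebuild the model frame
fixed_fit <- repair_call(cart_fit, data = mtcars)
party_fit <- as.party(fixed_fit$fit)

# For a regression tree, type = "prob" returns one ecdf per row of newdata
node_ecdf <- predict(party_fit, newdata = mtcars[1, ], type = "prob")

# Inverse-transform sampling: one draw from the node's empirical distribution
set.seed(2020)
draw <- quantile(node_ecdf[[1]], probs = runif(1), type = 1, names = FALSE)
```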

topepo commented 3 years ago

are there plans for tidymodels to allow for a wider range of prediction methods or will this all be handled through the model packages (e.g. rpart)? It is useful to sample from conditional distributions created by lm, rpart, ranger, etc. I am happy to work on this and want to make sure my efforts align with your excellent API/framework.

I don't want to maintain parsnip wrappers for a large number of modeling functions. That has been a bit of a nightmare for caret.

The nice thing about parsnip is that the work can be spread to "parsnip-adjacent" packages (e.g. rules, baguette, etc.).

I started on party engines to use in the treesnip package but have not gotten far (mostly due to how their S4 methods work). That's on my holiday "pet project" list.

In general though, if there is something that you want to implement and maintain, take a look at the help documentation and open an issue here if you run into problems.

simonpcouch commented 6 months ago

Going to go ahead and close as this hasn't come to the top of our to-do list in the last 4 years.

Generally, though, while you can't as.party(parsnip_model) in this case, you can as.party(extract_fit_engine(parsnip_model)). If you run into issues doing so, please feel free to open a new issue. :)
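
A minimal sketch of that suggestion, reusing the model from earlier in the thread. Whether repair_call() is still required alongside extract_fit_engine() is not stated here, so the sketch applies it anyway as a precaution, following the earlier reprex.

```r
library(tidymodels)
library(partykit)

parsnip_model <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# Precaution taken from the earlier reprex: repair the stored call first
fixed_model <- repair_call(parsnip_model, data = mtcars)

# extract_fit_engine() returns the underlying rpart object, which has an as.party() method
engine_fit <- extract_fit_engine(fixed_model)
party_model <- as.party(engine_fit)
```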
