Open ndiquattro opened 4 years ago
I looked at the documentation and agree that it needs to be revised.
I think that the intention was to do some dplyr work to get the predictions in the format that you might want.
Here's some code that uses dplyr, purrr, and tidyr:
library(ranger)
library(tidypredict)
library(dplyr, warn.conflicts = FALSE)
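# Fit a 100-tree random forest classifier
test_mod <- ranger(Species ~ ., iris, num.trees = 100)
# For a ranger model this is a list of expressions, one per tree
trees <- tidypredict_fit(test_mod)
# One sample from each species
new_samples <- iris[c(1, 51, 101), ]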
# Evaluate every tree on the new samples: one row per (tree, sample) vote
votes <-
  purrr::map_dfr(
    trees,
    ~ tibble(
      .pred = rlang::eval_tidy(.x, new_samples),
      .row = seq_len(nrow(new_samples))
    )
  )
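# Majority vote per sample; with_ties = FALSE keeps exactly one class per row
class_pred <-
  votes %>%
  group_by(.row) %>%
  count(.pred) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n)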
class_pred
#> # A tibble: 3 x 2
#>    .row .pred
#>   <int> <chr>
#> 1     1 setosa
#> 2     2 versicolor
#> 3     3 virginica
# Class probabilities: share of trees voting for each class
class_prob <-
  votes %>%
  group_by(.row) %>%
  count(.pred) %>%
  mutate(prob = n / length(trees)) %>%
  ungroup() %>%
  select(-n) %>%
  tidyr::pivot_wider(
    id_cols = ".row",
    names_from = ".pred",
    values_from = "prob",
    values_fill = 0
  )
class_prob
#> # A tibble: 3 x 4
#>    .row setosa versicolor virginica
#>   <int>  <dbl>      <dbl>     <dbl>
#> 1     1      1       0         0
#> 2     2      0       0.98      0.02
#> 3     3      0       0         1
Created on 2020-12-04 by the reprex package (v0.3.0)
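The voting step above does not depend on ranger or dplyr, so it may help to see it in isolation. Here is a minimal base R sketch. It assumes three hand-written ifelse() expressions as hypothetical stand-ins for the per-tree expressions that tidypredict_fit() produces; the split values are made up for illustration and are not taken from a fitted model.

```r
# Hypothetical stand-ins for the per-tree expressions (NOT real tidypredict
# output); a fitted forest would supply one expression per tree.
trees <- list(
  quote(ifelse(Petal.Length < 2.5, "setosa",
               ifelse(Petal.Width < 1.7, "versicolor", "virginica"))),
  quote(ifelse(Petal.Width < 0.8, "setosa",
               ifelse(Petal.Length < 4.9, "versicolor", "virginica"))),
  quote(ifelse(Petal.Length < 2.6, "setosa",
               ifelse(Petal.Width < 1.6, "versicolor", "virginica")))
)

new_samples <- iris[c(1, 51, 101), ]
classes <- levels(iris$Species)

# Evaluate each expression with the data frame's columns in scope.
# Result: a character matrix, one row per sample and one column per tree.
votes <- sapply(trees, function(tree) eval(tree, new_samples))

# Share of trees voting for each class (rows: samples, columns: classes).
vote_share <- sapply(classes, function(cl) rowMeans(votes == cl))

# Majority vote; ties go to the first class in level order.
majority <- classes[max.col(vote_share, ties.method = "first")]
```

Here vote_share plays the role of class_prob and majority the role of class_pred. The tie-breaking rule is a choice made for this sketch, not something the package prescribes.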
Hello, thanks for your work on this package, it is very exciting! I was trying to follow the docs on using a ranger RF model, but it seems to return a list of trees/case_whens rather than one statement. Is it intended that we execute all the trees on the DB and then calculate the prediction from the results? I don't get that impression from the docs. Thanks!
Created on 2020-08-23 by the reprex package (v0.3.0)