Closed. konradsemsch closed this issue 5 years ago.
Honestly, I have no idea. I rewrote the prediction helper function to be a little simpler and rearranged the arguments (odd :-/). I also added a performance metric below.
We're working on model tuning right now, which will make this a lot easier. The use of crossing()
is fine, but you probably won't have to do that once we have the better API in place.
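(For later readers: the tune and workflows packages, released after this thread, wrap exactly this crossing-and-map pattern. Below is a minimal sketch of that approach; it assumes current versions of tune and workflows are installed, reuses `df_all` and `en_rec` from the reprex further down, and is not the API that existed when this issue was filed.)

```r
# Sketch: replacing the manual crossing()/future_map() loop with tune_grid().
library(tidymodels)

folds <- vfold_cv(df_all, v = 5)

en_wf <- workflow() %>%
  add_recipe(en_rec) %>%
  add_model(
    logistic_reg(penalty = tune(), mixture = tune()) %>%
      set_engine("glmnet")
  )

# tune_grid() fits every candidate on every fold and collects metrics.
en_res <- tune_grid(
  en_wf,
  resamples = folds,
  grid = grid_regular(penalty(), mixture(), levels = 2),
  metrics = metric_set(mn_log_loss)
)

collect_metrics(en_res)  # resampled estimate per candidate, as in rs_estimates below
```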
set.seed(42)
# Loading libraries -------------------------------------------------------
library(magrittr)
library(tidyverse)
#> Registered S3 method overwritten by 'rvest':
#> method from
#> read_xml.response xml2
library(tidymodels)
#> ── Attaching packages ──────────────────────────────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom 0.5.2 ✔ recipes 0.1.6
#> ✔ dials 0.0.2 ✔ rsample 0.0.5
#> ✔ infer 0.4.0.1 ✔ yardstick 0.0.3
#> ✔ parsnip 0.0.3
#> ── Conflicts ─────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ scales::discard() masks purrr::discard()
#> ✖ tidyr::extract() masks magrittr::extract()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ recipes::fixed() masks stringr::fixed()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ purrr::set_names() masks magrittr::set_names()
#> ✖ yardstick::spec() masks readr::spec()
#> ✖ recipes::step() masks stats::step()
library(dials)
library(furrr)
#> Loading required package: future
# Loading input dataset ---------------------------------------------------
df_all <- iris %>%
filter(Species != "setosa") %>%
mutate(Species = factor(Species, levels = c("versicolor", "virginica")))
# Dividing the dataset ----------------------------------------------------
df_train_cv <- vfold_cv(df_all, v = 5, repeats = 1)
# Preparing the recipes ----------------------------------------------------
# I need to add a custom step over here on the missing patterns
en_rec <- df_all %>%
recipe(Species ~ .) %>%
step_pca(all_predictors(), num_comp = 2)
# Training models within resamples ---------------------------------------
fit_on_fold <- function(spec, prepped) {
x <- juice(prepped, all_predictors())
y <- juice(prepped, all_outcomes())
fit_xy(spec, x, y)
}
en_engine <- logistic_reg(mode = "classification") %>%
set_engine("glmnet")
en_grid <- grid_regular(penalty, mixture, levels = c(2, 2))
en_spec <- tibble(spec = merge(en_engine, en_grid)) %>% # combining model engine with different parameters
mutate(model_id = row_number())
en_spec_cv <- crossing(df_train_cv, en_spec) # adding cross-validated folds
en_fits_cv <- en_spec_cv %>% # fitting different model specifications to different folds
mutate(
prepped = future_map(splits, prepper, en_rec),
fit = future_map2(spec, prepped, fit_on_fold)
)
predict_helper <- function(split, recipe, fit) {
new_x <- bake(recipe, new_data = assessment(split), all_predictors())
predict(fit, new_x, type = "prob") %>%
bind_cols(assessment(split) %>% select(Species))
}
en_fits_cv_pred <- en_fits_cv %>%
mutate(
preds = future_pmap(list(splits, prepped, fit), predict_helper)
)
indiv_estimates <-
en_fits_cv_pred %>%
unnest(preds) %>%
group_by(id, model_id) %>%
# or some other performance measure:
mn_log_loss(truth = Species, .pred_virginica)
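(For reference, mean log loss is the mean negative log of the probability the model assigned to the class that actually occurred. A base-R sketch with toy probabilities, purely illustrative and not taken from the run above:)

```r
# Mean log loss, computed by hand on toy binary predictions.
# truth: 1 = event class, 0 = other; p: predicted probability of the event.
truth <- c(1, 0, 1, 1)
p     <- c(0.9, 0.2, 0.8, 0.6)

# Probability assigned to the class that actually occurred:
p_obs <- ifelse(truth == 1, p, 1 - p)

mean_log_loss <- -mean(log(p_obs))
round(mean_log_loss, 4)
#> [1] 0.2656
```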
rs_estimates <-
indiv_estimates %>%
group_by(model_id, .metric, .estimator) %>%
summarize(mean = mean(.estimate, na.rm = TRUE))
rs_estimates
#> # A tibble: 4 x 4
#> # Groups: model_id, .metric [4]
#> model_id .metric .estimator mean
#> <int> <chr> <chr> <dbl>
#> 1 1 mn_log_loss binary 2.36
#> 2 2 mn_log_loss binary 0.938
#> 3 3 mn_log_loss binary 9.45
#> 4 4 mn_log_loss binary 0.691
Created on 2019-07-31 by the reprex package (v0.2.1)
Thanks @topepo for taking a look!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
I took the example listed in this blog post and tried to replicate it using glmnet: https://www.alexpghayes.com/blog/implementing-the-super-learner-with-tidymodels/
I wanted to use binary classification, so I excluded one of the factor levels but otherwise changed as little as possible in order to run it. When I get to the part where I want to make predictions on the split's assessment set, I get the following error:
More specifically it's breaking in this part when I'm trying to make predictions on the hold-out set:
I also tried running the prediction with only one model fit, to rule out something breaking in the map, but the error persists:
The full code I'm running is the following:
I've been looking for help around the internet, but unfortunately I'm absolutely clueless about where the root cause could be. Could anyone assist?
My session info below: