Closed gcameron89777 closed 4 years ago
Here is perhaps a simpler reprex:
library(rsample)
#> Loading required package: tidyr
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mtcars %>%
  vfold_cv(10) %>%
  crossing(x = c(1, 2, 3))
#> `x` must be a vector, not a `rsplit/vfold_split` object
mtcars %>%
  vfold_cv(10) %>%
  nest(data = c(id)) %>%
  unnest(cols = c(data)) %>%
  crossing(x = c(1, 2, 3))
#> `x` must be a vector, not a `rsplit/vfold_split` object
mtcars %>%
  vfold_cv(10) %>%
  group_by(id) %>%
  nest() %>%
  unnest(cols = c(data)) %>%
  crossing(x = c(1, 2, 3))
#> # A tibble: 30 x 3
#>    id     splits             x
#>    <chr>  <list>         <dbl>
#>  1 Fold01 <split [28/4]>     1
#>  2 Fold01 <split [28/4]>     2
#>  3 Fold01 <split [28/4]>     3
#>  4 Fold02 <split [28/4]>     1
#>  5 Fold02 <split [28/4]>     2
#>  6 Fold02 <split [28/4]>     3
#>  7 Fold03 <split [29/3]>     1
#>  8 Fold03 <split [29/3]>     2
#>  9 Fold03 <split [29/3]>     3
#> 10 Fold04 <split [29/3]>     1
#> # ... with 20 more rows
Created on 2020-02-13 by the reprex package (v0.3.0)
For some reason grouping, nesting, and unnesting works.
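Another way to sidestep the error, sketched under the assumption that crossing() only fails on the list column of rsplit objects: cross the plain fold ids with the parameter values, then join the splits back on by id. The names folds, grid, and result are illustrative and do not come from the posts above.

```r
library(rsample)
library(dplyr)
library(tidyr)

set.seed(123)
folds <- vfold_cv(mtcars, v = 10)

# Cross only ordinary vectors: the fold ids and the parameter values.
grid <- crossing(id = folds$id, x = c(1, 2, 3))

# Attach the split objects afterwards by matching on the fold id.
# as_tibble() drops the rset class so the join sees a plain tibble.
result <- left_join(grid, as_tibble(folds), by = "id")

result
```

This gives one row per fold and parameter value, the same shape as the output shown above, without passing the splits column through crossing().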
I'm guessing that it was an issue with the version of tidyr
that you were using. It works for me (see below).
Two things though:
- Adding the `train` and `validate` columns increases memory use. The memory requirements for the `splits` column are not affected, since the splits all store exactly the same data, but your new columns do not.
- Have a look at the `tune` package (which was new when you posted this).
library(tidyverse)
library(rsample)
example_data <- read_csv("~/Downloads/example_data.csv")
#> Parsed with column specification:
#> cols(
#>   a = col_double(),
#>   b = col_double(),
#>   c = col_double(),
#>   d = col_double(),
#>   e = col_double(),
#>   f = col_double(),
#>   g = col_double(),
#>   h = col_double(),
#>   i = col_double(),
#>   j = col_logical()
#> )
example_split <- initial_split(example_data, 0.9)
training_data <- training(example_split)
testing_data <- testing(example_split)
# 5 fold split stratified on j
set.seed(123)
train_cv <- vfold_cv(training_data, 5, strata = j)
# Before unpacking:
lobstr::obj_size(train_cv)
#> 8,289,640 B
train_cv <-
  train_cv %>%
  # create training and validation sets within each fold
  mutate(train = map(splits, ~training(.x)),
         validate = map(splits, ~testing(.x)))
# After unpacking:
lobstr::obj_size(train_cv)
#> 42,499,296 B
blah <- train_cv %>%
  crossing(mtry = c(1, 2))
blah
#> # A tibble: 10 x 5
#>    splits            id    train                  validate                mtry
#>    <named list>      <chr> <named list>           <named list>           <dbl>
#>  1 <split [72K/18K]> Fold1 <tibble [72,000 × 10]> <tibble [18,001 × 10]>     1
#>  2 <split [72K/18K]> Fold1 <tibble [72,000 × 10]> <tibble [18,001 × 10]>     2
#>  3 <split [72K/18K]> Fold2 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     1
#>  4 <split [72K/18K]> Fold2 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     2
#>  5 <split [72K/18K]> Fold3 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     1
#>  6 <split [72K/18K]> Fold3 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     2
#>  7 <split [72K/18K]> Fold4 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     1
#>  8 <split [72K/18K]> Fold4 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     2
#>  9 <split [72K/18K]> Fold5 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     1
#> 10 <split [72K/18K]> Fold5 <tibble [72,001 × 10]> <tibble [18,000 × 10]>     2
lobstr::obj_size(blah)
#> 42,498,352 B
Created on 2020-03-29 by the reprex package (v0.3.0)
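On the memory point, here is a minimal sketch of one way to avoid storing per-fold copies of the data: keep only the splits and pull out the data inside the function that fits each model. It uses mtcars instead of the example csv, rsample's analysis() and assessment() accessors in place of training() and testing(), and a hypothetical helper fit_fold() with a made-up n_terms parameter standing in for mtry.

```r
library(rsample)
library(purrr)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# Hypothetical helper: fit on the analysis set, score on the assessment set.
# The data frames exist only for the duration of the call.
fit_fold <- function(split, n_terms) {
  train <- analysis(split)
  validate <- assessment(split)
  predictors <- c("wt", "hp")[seq_len(n_terms)]
  fit <- lm(reformulate(predictors, response = "mpg"), data = train)
  # Root mean squared error on the held-out rows
  sqrt(mean((validate$mpg - predict(fit, validate))^2))
}

# One row per fold and parameter value; folds are referenced by index,
# so no split objects or data copies are stored in the grid.
grid <- expand.grid(fold = seq_len(nrow(folds)), n_terms = c(1, 2))
grid$id <- folds$id[grid$fold]
grid$rmse <- map2_dbl(grid$fold, grid$n_terms,
                      ~ fit_fold(folds$splits[[.x]], .y))

grid
```

The grid here holds only integers, ids, and scores, so its size does not grow with the data the way the train and validate columns do.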
Thank you Max!
Error: `x` must be a vector, not a `rsplit/vfold_split` object
I am experiencing the above error when using tidyr::crossing() just after creating an rsplit object using vfold_cv(). The error is intermittent: it happens only some of the time. Others have been able to reproduce it, but also only sometimes.
Example csv file to reproduce.
train_cv looks like this:
I would like to use the same train_cv object in my script for trying different models with their own tuning parameters. In the example above, if crossing(mtry = c(1, 2)) works, the desired output would take train_cv and make it look like this:
Session Info:
Not sure if this is an actual issue or a problem with my code. I tried the RStudio Community forum first.
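Since the replies suggest the tidyr version may matter, a quick way to report the installed versions of the packages involved is sketched below. Which packages to list is an assumption; no version is claimed here to be the one that fixes the error.

```r
# Packages that appear in the reprex; adjust the list as needed.
pkgs <- c("tidyr", "rsample", "dplyr")

# packageVersion() returns the installed version of each package.
versions <- vapply(pkgs, function(p) as.character(packageVersion(p)),
                   character(1))
print(versions)
```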