Closed htlin closed 2 years ago
Can you dummy up a small data set and provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!
If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.
Yes sorry about that, here is the reprex:
library(rsample)
#> Loading required package: tidyr
run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep"))
  sampling2 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep2"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep2"))
}
all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(all_dataset)
#> Error in group_vfold_splits(data = data, group = group, v = v): object 'outer_cv' not found
Created on 2019-02-01 by the reprex package (v0.2.1)
In theory you should be able to do this with no problem, so I'd call it a bug. I think the environment could be captured (maybe with parent.frame()?) and then the eval() call could specify that as the environment.
Alternatively, it would probably be beneficial (and not too bad) to rewrite using quosures so we won't have to worry about the environments at all. The only weird thing would be inserting the data into the call.
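The first suggestion above (capture the caller's environment and pass it to eval()) can be sketched in base R. This is only an illustration under stated assumptions, not the rsample implementation: toy_folds() and toy_nested() are hypothetical stand-ins for group_vfold_cv() and nested_cv().

```r
# Hypothetical stand-in for a resampling function such as group_vfold_cv():
# assigns row indices to v folds in round-robin order.
toy_folds <- function(data, v) {
  split(seq_len(nrow(data)), rep_len(seq_len(v), nrow(data)))
}

# Hypothetical stand-in for nested_cv(): `outside` arrives as an unevaluated
# call without a data argument, so the data is added before evaluation.
toy_nested <- function(data, outside) {
  calling_env <- parent.frame()         # environment where `outside` was written
  outside_call <- substitute(outside)   # e.g. toy_folds(v = outer_cv)
  outside_call$data <- data             # insert the data into the call
  eval(outside_call, envir = calling_env)
}

run_experiment <- function(dat) {
  outer_cv <- 5                         # local to run_experiment()
  toy_nested(dat, outside = toy_folds(v = outer_cv))
}

folds <- run_experiment(data.frame(x = 1:10))
length(folds)
```

Because the call is evaluated in the frame of run_experiment(), the local outer_cv is found there, while toy_folds() is still found further up in the enclosing environments.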
In the meantime, if you want to program around it, you can do:
library(rsample)
#> Loading required package: tidyr
#>
#> Attaching package: 'rsample'
#> The following object is masked from 'package:tidyr':
#>
#> fill
run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1_call <- rlang::expr(
    nested_cv(
      all_dataset,
      outside = group_vfold_cv(v = !!outer_cv, group = "Rep"),
      inside = group_vfold_cv(v = !!inner_cv, group = "Rep")
    )
  )
  sampling2_call <- rlang::expr(
    nested_cv(
      all_dataset,
      outside = group_vfold_cv(v = !!outer_cv, group = "Rep2"),
      inside = group_vfold_cv(v = !!inner_cv, group = "Rep2")
    )
  )
  sampling1 <- rlang::eval_tidy(sampling1_call)
  sampling2 <- rlang::eval_tidy(sampling2_call)
  list(sampling1, sampling2)
}
all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(all_dataset)
#> [[1]]
#> [1] "nested_cv" "group_vfold_cv" "rset" "tbl_df"
#> [5] "tbl" "data.frame"
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 x 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <tibble [4 × 2]>
#> 2 <split [40/10]> Resample2 <tibble [4 × 2]>
#> 3 <split [40/10]> Resample3 <tibble [4 × 2]>
#> 4 <split [40/10]> Resample4 <tibble [4 × 2]>
#> 5 <split [40/10]> Resample5 <tibble [4 × 2]>
#>
#> [[2]]
#> [1] "nested_cv" "group_vfold_cv" "rset" "tbl_df"
#> [5] "tbl" "data.frame"
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 x 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <tibble [4 × 2]>
#> 2 <split [40/10]> Resample2 <tibble [4 × 2]>
#> 3 <split [40/10]> Resample3 <tibble [4 × 2]>
#> 4 <split [40/10]> Resample4 <tibble [4 × 2]>
#> 5 <split [40/10]> Resample5 <tibble [4 × 2]>
Created on 2019-02-01 by the reprex package (v0.2.1.9000)
What's the tidyeval equivalent of match.call()? I tried using quo and eval_tidy, but I'm not sure how to find the environment's parents.
The actual values of outer_cv and inner_cv don't get picked up by match.call():
https://github.com/tidymodels/rsample/blob/775ac5559a477b39f1c23ef1380a7abb036d73fe/R/nest.R#L56-L58
So you can't evaluate the outside call:
https://github.com/tidymodels/rsample/blob/775ac5559a477b39f1c23ef1380a7abb036d73fe/R/nest.R#L72
or the inside call:
https://github.com/tidymodels/rsample/blob/775ac5559a477b39f1c23ef1380a7abb036d73fe/R/nest.R#L96-L99
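A small base R sketch of the behaviour described here: match.call() records the expressions the caller typed, not their values, so a local variable is stored as a bare symbol. show_call() and toy_folds() are hypothetical names used only for illustration; toy_folds() is never evaluated, so it does not need to exist.

```r
# Hypothetical wrapper (not rsample code) that just returns its matched call.
show_call <- function(outside) {
  match.call()
}

run_experiment <- function() {
  outer_cv <- 5   # exists only inside run_experiment()
  show_call(outside = toy_folds(v = outer_cv))
}

cl <- run_experiment()

# The `outside` argument is stored as an unevaluated call ...
outside_call <- cl$outside

# ... and its `v` argument is the symbol outer_cv, not the number 5.
inner_arg <- outside_call$v
is.symbol(inner_arg)

# Evaluating that symbol anywhere other than run_experiment()'s frame fails,
# because outer_cv is not defined there.
result <- tryCatch(
  eval(inner_arg, envir = globalenv()),
  error = function(e) conditionMessage(e)
)
result
```

This is the same failure mode as the error in the reprex: the call is evaluated in an environment where outer_cv cannot be seen.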
@fbchow it will probably use enquo() and eval_tidy() as you are saying. When you evaluate the quosure using eval_tidy(), it will evaluate the quosure in the environment that it was specified in (which I think is the parent that you are referring to).
The weirdness for this example is that we are going to have to modify the expression of the quosure using something like rlang::call_modify() before evaluating it. It will likely look something like this:
library(rlang)
library(rsample)
#> Warning: package 'rsample' was built under R version 3.5.2
#> Loading required package: tidyr
dat <- data.frame(x = c(1, 2))
outside <- rlang::quo(bootstraps(times = 5))
outside
#> <quosure>
#> expr: ^bootstraps(times = 5)
#> env: global
outside_modified <- rlang::call_modify(outside, data = dat)
outside_modified
#> <quosure>
#> expr: ^bootstraps(times = 5, data = <data.frame>)
#> env: global
eval_tidy(outside_modified)
#> # Bootstrap sampling
#> # A tibble: 5 x 2
#> splits id
#> <list> <chr>
#> 1 <split [2/1]> Bootstrap1
#> 2 <split [2/1]> Bootstrap2
#> 3 <split [2/0]> Bootstrap3
#> 4 <split [2/1]> Bootstrap4
#> 5 <split [2/1]> Bootstrap5
Created on 2019-02-11 by the reprex package (v0.2.1.9000)
You can also use data = expr(dat) rather than data = dat, which will embed the name dat into the call rather than the entire data frame. It shouldn't make a big difference for this example though.
library(rlang)
library(rsample)
#> Warning: package 'rsample' was built under R version 3.5.2
#> Loading required package: tidyr
dat <- data.frame(x = c(1, 2))
outside <- rlang::quo(bootstraps(times = 5))
outside
#> <quosure>
#> expr: ^bootstraps(times = 5)
#> env: global
outside_modified <- rlang::call_modify(outside, data = rlang::expr(dat))
outside_modified
#> <quosure>
#> expr: ^bootstraps(times = 5, data = dat)
#> env: global
eval_tidy(outside_modified)
#> # Bootstrap sampling
#> # A tibble: 5 x 2
#> splits id
#> <list> <chr>
#> 1 <split [2/0]> Bootstrap1
#> 2 <split [2/0]> Bootstrap2
#> 3 <split [2/0]> Bootstrap3
#> 4 <split [2/0]> Bootstrap4
#> 5 <split [2/0]> Bootstrap5
Created on 2019-02-11 by the reprex package (v0.2.1.9000)
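Putting the two examples above into a function gives a rough picture of how the enquo() approach might handle the original problem. This is a sketch under assumptions rather than the actual fix: toy_resample() and toy_nested() are hypothetical stand-ins for bootstraps() and nested_cv(), and only enquo(), call_modify(), and eval_tidy() come from rlang.

```r
library(rlang)

# Hypothetical stand-in for a resampling function such as bootstraps():
# returns `times` row-resampled copies of the data.
toy_resample <- function(data, times) {
  lapply(seq_len(times), function(i) {
    data[sample(nrow(data), replace = TRUE), , drop = FALSE]
  })
}

# Hypothetical stand-in for nested_cv().
toy_nested <- function(data, outside) {
  outside <- enquo(outside)                     # expression plus the caller's environment
  outside <- call_modify(outside, data = data)  # insert the data into the call
  eval_tidy(outside)                            # evaluated in the quosure's environment
}

run_experiment <- function(dat) {
  n_boot <- 3   # local variable, found through the quosure's environment
  toy_nested(dat, outside = toy_resample(times = n_boot))
}

res <- run_experiment(data.frame(x = c(1, 2)))
length(res)
```

Since the quosure carries the environment of run_experiment() with it, there is no need to look up the calling frame by hand.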
At long last, this is now fixed:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rsample)
run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep"))
  sampling2 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep2"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep2"))
  list(sampling1, sampling2)
}
all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame()
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(tibble(all_dataset))
#> [[1]]
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 × 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <group_vfold_cv [4 × 2]>
#> 2 <split [40/10]> Resample2 <group_vfold_cv [4 × 2]>
#> 3 <split [40/10]> Resample3 <group_vfold_cv [4 × 2]>
#> 4 <split [40/10]> Resample4 <group_vfold_cv [4 × 2]>
#> 5 <split [40/10]> Resample5 <group_vfold_cv [4 × 2]>
#>
#> [[2]]
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 × 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <group_vfold_cv [4 × 2]>
#> 2 <split [40/10]> Resample2 <group_vfold_cv [4 × 2]>
#> 3 <split [40/10]> Resample3 <group_vfold_cv [4 × 2]>
#> 4 <split [40/10]> Resample4 <group_vfold_cv [4 × 2]>
#> 5 <split [40/10]> Resample5 <group_vfold_cv [4 × 2]>
Created on 2021-11-18 by the reprex package (v2.0.1)
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hi, I'd like to make a few nested_cv objects based on the same partition configuration, as follows:
However, I am getting an error, which is out of the scope for the group_vfold_cv function. Do you have any recommendations? Does tidy evaluation help in this case?