nathaneastwood / poorman

A poor man's dependency free grammar of data manipulation
https://nathaneastwood.github.io/poorman/
Other
338 stars 15 forks source link

Mutate Fails When Evaluating a Function Defined in the Scope of a Parent Function #68

Closed alex-gable closed 3 years ago

alex-gable commented 3 years ago

Describe the bug When using poorman::mutate inside a function A and using a function B which is defined in the scope of another function (A) on the right-hand side of the mutate call, the internally defined function B is not found.

To Reproduce

`%>%` <- poorman::`%>%`

pass_through <- function(x) x

# no issue
mtcars %>% 
  poorman::group_by(cyl) %>% 
  poorman::mutate(carb = pass_through(carb)) %>% 
  head()
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# pass_through_internal not found
test_function <- function(mtcars_df) {
  pass_through_internal <- function(x) x

  mtcars_df %>% 
    poorman::mutate(carb = pass_through(carb),
                    gear = pass_through_internal(gear)) %>% 
    head()
}

# returns error
test_function(mtcars)
#> Error in pass_through_internal(gear): could not find function "pass_through_internal"

# pass_through_internal_group not found
test_function_grouped <- function(mtcars_df) {
  pass_through_internal_group <- function(x) x

  mtcars_df %>% 
    poorman::group_by(cyl) %>% 
    poorman::mutate(carb = pass_through(carb),
                    gear = pass_through_internal_group(gear)) %>% 
    head()
}

# returns error
test_function_grouped(mtcars)
#> Error in pass_through_internal_group(gear): could not find function "pass_through_internal_group"

Created on 2021-01-03 by the reprex package (v0.3.0)

Expected behavior The internally-defined function evaluates normally.

System Information: Please detail the following information

R.version.string
#> [1] "R version 4.0.3 (2020-10-10)"
packageVersion("poorman")
#> [1] '0.2.4'

Created on 2021-01-03 by the reprex package (v0.3.0)

nathaneastwood commented 3 years ago

This is a good catch, thanks. I forgot to set the correct enclosing env where the functions should be evaluated. I've issued a fix in the development branch and added a test using your reprex above. I see you are using poorman in a PR, would you be so kind as to test this fix out with your real use case and let me know how it goes?

alex-gable commented 3 years ago

definitely! I'll follow up in the next day or so. thanks for the quick turnaround @nathaneastwood!

alex-gable commented 3 years ago

Tests pass when using 0.2.4.4

nathaneastwood commented 3 years ago

Brilliant. Thanks for testing it 🙂 (sudden realisation that I should have bumped to 0.2.4.5)

alex-gable commented 3 years ago

~I believe this may also apply to arrange, but need to go track down my example in order to provide a reprex~

disregard, I'll log a separate bug (#69)

alex-gable commented 3 years ago

Hmm. appears I was mistaken, it's still failing in the grouped case.

I used a debugger to inspect the difference in the parent.frame() in the grouped and ungrouped cases

Grouped ```r Browse[10]> conditions[[i]] pass_through_internal_group(gear) Browse[10]> context$as_env() mpg cyl disp hp drat wt qsec vs am gear carb Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Browse[10]> mget(ls(parent.frame()), envir = parent.frame()) $FUN function (.data, ...) { UseMethod("mutate") } $i [1] 1 $X $X$`4` mpg cyl disp hp drat wt qsec vs am gear carb Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 $X$`6` mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 $X$`8` mpg cyl disp hp drat wt qsec vs am gear carb Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 ```
Ungrouped ```r > test_function(mtcars) Called from: test_function(mtcars) Browse[1]> n debug at #5: mutate(mtcars_df, carb = pass_through(carb), gear = pass_through_internal(gear)) Browse[2]> s debugging in: mutate(mtcars_df, carb = pass_through(carb), gear = pass_through_internal(gear)) debug: { UseMethod("mutate") } Browse[3]> n debug: UseMethod("mutate") Browse[3]> n debugging in: mutate.data.frame(mtcars_df, carb = pass_through(carb), gear = pass_through_internal(gear)) debug: { keep <- match.arg(arg = .keep, choices = c("all", "used", "unused", "none"), several.ok = FALSE) conditions <- dotdotdot(..., .impute_names = TRUE) cond_nms <- names(dotdotdot(..., .impute_names = FALSE)) used <- find_used(.data, conditions) if (length(conditions) == 0L) return(.data) context$setup(.data) on.exit(context$clean(), add = TRUE) for (i in seq_along(conditions)) { not_named <- (is.null(cond_nms) || cond_nms[i] == "") res <- eval(conditions[[i]], envir = context$as_env(), enclos = parent.frame()) res_nms <- names(res) if (is.data.frame(res)) { if (not_named) { context$.data[, res_nms] <- res } else { context$.data[[cond_nms[i]]] <- res } } else if (is.atomic(res)) { context$.data[[names(conditions)[[i]]]] <- res } else { if (is.null(res_nms)) names(res) <- names(conditions)[[i]] context$.data[[names(res)]] <- res } } .before <- substitute(.before) .after <- substitute(.after) if (!is.null(.before) || !is.null(.after)) { new <- setdiff(cond_nms, names(.data)) context$.data <- do.call(relocate, c(list(.data = context$.data), new, .before = .before, .after = .after)) } if (keep == "all") { context$.data } else if (keep == "unused") { unused <- setdiff(colnames(.data), used) keep <- intersect(context$get_colnames(), c(group_vars(.data), unused, cond_nms)) do.call(select, c(list(.data = context$.data), keep)) } else if (keep == "used") { keep <- intersect(context$get_colnames(), c(group_vars(.data), used, cond_nms)) do.call(select, c(list(.data = context$.data), keep)) } else if (keep == "none") { keep <- c(setdiff(group_vars(.data), cond_nms), intersect(cond_nms, context$get_colnames())) do.call(select, c(list(.data = context$.data), keep)) } } Browse[4]> n debug: keep <- match.arg(arg = .keep, choices = c("all", "used", "unused", "none"), several.ok = FALSE) Browse[4]> n debug: conditions <- dotdotdot(..., .impute_names = TRUE) Browse[4]> n debug: cond_nms <- names(dotdotdot(..., .impute_names = FALSE)) Browse[4]> n debug: used <- find_used(.data, conditions) Browse[4]> n debug: if (length(conditions) == 0L) return(.data) Browse[4]> n debug: context$setup(.data) Browse[4]> n debug: on.exit(context$clean(), add = TRUE) Browse[4]> n debug: for (i in seq_along(conditions)) { not_named <- (is.null(cond_nms) || cond_nms[i] == "") res <- eval(conditions[[i]], envir = context$as_env(), enclos = parent.frame()) res_nms <- names(res) if (is.data.frame(res)) { if (not_named) { context$.data[, res_nms] <- res } else { context$.data[[cond_nms[i]]] <- res } } else if (is.atomic(res)) { context$.data[[names(conditions)[[i]]]] <- res } else { if (is.null(res_nms)) names(res) <- names(conditions)[[i]] context$.data[[names(res)]] <- res } } Browse[4]> n debug: not_named <- (is.null(cond_nms) || cond_nms[i] == "") Browse[4]> n debug: res <- eval(conditions[[i]], envir = context$as_env(), enclos = parent.frame()) Browse[4]> mget(ls(parent.frame()), envir = parent.frame()) $mtcars_df mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 $pass_through_internal function(x) x ```
alex-gable commented 3 years ago

I'm a bit out of my depth here, but my uneducated hunch says that mutate.grouped_data call might need to pass through its calling environment somehow. By the time we hit apply_grouped_function, we've lost pass_through_internal_group

> test_function_grouped(mtcars)
Called from: test_function_grouped(mtcars)
Browse[1]> n
debug at #7: mutate(gb_mtcars, carb = pass_through(carb), gear = pass_through_internal_group(gear))
Browse[2]> s
debugging in: mutate(gb_mtcars, carb = pass_through(carb), gear = pass_through_internal_group(gear))
debug: {
    UseMethod("mutate")
}
Browse[3]> n
debug: UseMethod("mutate")
Browse[3]> n
debugging in: mutate.grouped_data(gb_mtcars, carb = pass_through(carb), gear = pass_through_internal_group(gear))
debug: {
    rows <- rownames(.data)
    res <- apply_grouped_function("mutate", .data, drop = TRUE, 
        ...)
    res[rows, , drop = FALSE]
}
Browse[4]> n
debug: rows <- rownames(.data)
Browse[4]> n
debug: res <- apply_grouped_function("mutate", .data, drop = TRUE, ...)
Browse[4]> s
debugging in: apply_grouped_function("mutate", .data, drop = TRUE, ...)
debug: {
    groups <- group_vars(.data)
    grouped <- split_into_groups(.data, groups, drop)
    res <- do.call(rbind, unname(lapply(grouped, fn, ...)))
    if (any(groups %in% colnames(res))) {
        class(res) <- c("grouped_data", class(res))
        res <- groups_set(res, groups[groups %in% colnames(res)])
    }
    res
}
Browse[5]> n
debug: groups <- group_vars(.data)
Browse[5]> n
debug: grouped <- split_into_groups(.data, groups, drop)
Browse[5]> n
debug: res <- do.call(rbind, unname(lapply(grouped, fn, ...)))
Browse[5]> s
debugging in: do.call(rbind, unname(lapply(grouped, fn, ...)))
debug: {
    if (!is.list(args)) 
        stop("second argument must be a list")
    if (quote) 
        args <- lapply(args, enquote)
    .Internal(do.call(what, args, envir))
}
debugging in: unname(lapply(grouped, fn, ...))
debug: {
    if (!is.null(names(obj))) 
        names(obj) <- NULL
    if (!is.null(dimnames(obj)) && (force || !is.data.frame(obj))) 
        dimnames(obj) <- NULL
    obj
}
Browse[2]> n
debug: if (!is.null(names(obj))) names(obj) <- NULL
Browse[2]> n
Browse[6]> mget(ls(parent.frame()), envir = parent.frame())
$drop
[1] TRUE

$fn
[1] "mutate"

$grouped
$grouped$`4`
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

$grouped$`6`
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

$grouped$`8`
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

$groups
[1] "cyl"

Warning messages:
1: In get(object, envir = currentEnv, inherits = TRUE) :
  restarting interrupted promise evaluation
2: In get(object, envir = currentEnv, inherits = TRUE) :
  restarting interrupted promise evaluation
Browse[6]> args
Error in pass_through_internal_group(gear) : 
  could not find function "pass_through_internal_group"
nathaneastwood commented 3 years ago

Hi @alex-gable, thanks for your patience. I've provided a fix in the develop branch if you wouldn't mind testing it out? Essentially I needed to ensure that the mutations are always evaluated in the correct enclosing environment. For this purpose, I introduce a new context variable group_env which I populate when mutate.grouped_data() is called, and remove it when mutate.grouped_data() exits.