tidyverse / magrittr

Improve the readability of R code with the pipe
https://magrittr.tidyverse.org
Other
957 stars 157 forks source link

[Breaking Change] error when modifying an environment inside a pipeline #242

Closed DanChaltiel closed 3 years ago

DanChaltiel commented 3 years ago

When you modify an environment variable from inside a pipeline, it seems that the modification only occurs at the end of the pipeline.

For instance, if I set two functions, one for setting the variable and the other for getting it, and I call them on the same pipeline, the getter will only consider the environment as it was before the pipeline was called.

Here is an overcomplicated reprex:

# remotes::install_version("magrittr", version = "2.0.1")
library(dplyr, warn.conflicts=FALSE)
my_env = rlang::env()

save_sum = function(df, x){
    .sum = sum(df[[x]])
    print(.sum)
    my_env$sum=.sum
    invisible(df)
}

import_sum = function(df){
    if(is.null(my_env$sum))
        warning("env variable is null")
    df$sum = my_env$sum
    df
}

mtcars %>%
    save_sum("mpg") %>% 
    transmute(disp=as.numeric(disp)+1) %>%
    import_sum() %>% 
    head(2)
#> Warning in import_sum(.): env variable is null
#> [1] 642.9
#>                   disp
#> Mazda RX4          161
#> Mazda RX4 Wag      161

mtcars %>%
    save_sum("cyl") %>% 
    transmute(disp=as.numeric(disp)+1) %>%
    import_sum() %>% 
    head(2)
#> [1] 198
#>                   disp   sum
#> Mazda RX4          161 642.9
#> Mazda RX4 Wag      161 642.9

Created on 2021-01-30 by the reprex package (v0.3.0)

As you can see, the first pipeline failed to get the environment variable and returned NULL, so that mutate didn't add a column. The second pipeline added the wrong column as it was based on the previous state of the environment, so it added the sum of mpg instead of the one of cyl.

Separating the two pipelines solves the problem:

#not repeating all the previous code, but you get it
mtcars %>% save_sum("mpg") 
#> [1] 642.9
mtcars %>% 
    transmute(disp=as.numeric(disp)+1) %>%
    import_sum() %>% 
    head(2)
#>               disp   sum
#> Mazda RX4      161 642.9
#> Mazda RX4 Wag  161 642.9

Created on 2021-01-30 by the reprex package (v0.3.0)

However, this worked perfectly with magrittr v1.5:

# remotes::install_version("magrittr", version = "1.5")
library(dplyr, warn.conflicts=FALSE)
my_env = rlang::env()

save_sum = function(df, x){
    my_env$sum=sum(df[[x]])
    df
}

import_sum = function(df){
    if(is.null(my_env$sum))
        warning("env variable is null")
    df$sum = my_env$sum
    df
}

mtcars %>%
    save_sum("mpg") %>% 
    transmute(disp=as.numeric(disp)+1) %>%
    import_sum() %>% 
    head(2)
#>                   disp   sum
#> Mazda RX4          161 642.9
#> Mazda RX4 Wag      161 642.9

Created on 2021-01-30 by the reprex package (v0.3.0)

Is there any chance that it is correctable?

Companion SO question: https://stackoverflow.com/questions/65924644/package-environments-are-not-working-as-expected-in-a-pipeline

lionel- commented 3 years ago

This is by design. You need to force() arguments to force sequential evaluation, see https://www.tidyverse.org/blog/2020/11/magrittr-2-0-is-here/#sequential-evaluation. In your case this would fix it:

import_sum <- function(df){
    force(df)

    if(is.null(my_env$sum)) {
        warning("env variable is null")
    }

    df$sum <- my_env$sum
    df
}
DanChaltiel commented 3 years ago

@lionel- Thanks a lot, this works!

Sorry I missed the information, I was mostly looking at NEWS.md, which I think is a bit less clear than the blog post you just provided. You might want to at least put "Sequential evaluation" in the title.

If you want some well deserved SO points, I will gladly accept your answer there as well.

lionel- commented 3 years ago

Can you provide the answer on SO please? Thanks!

DanChaltiel commented 3 years ago

@lionel- Done, thank you very much