tidyverse / magrittr

Improve the readability of R code with the pipe
https://magrittr.tidyverse.org

Function for mutating pieces of an object in the middle of a pipe? #142

Closed DarwinAwardWinner closed 6 years ago

DarwinAwardWinner commented 7 years ago

Sometimes I want to mutate some piece of an object in the middle of a pipe, e.g.:

some_subelement <- "subelement_name"
i <- 5
temp <- x %>% some_function
# Need a temp variable to assign to this piece of the object
temp$some_element[[some_subelement]][i] <- 3
y <- temp %>% another_function

Simple cases for data frames may be covered by dplyr::mutate, but in the general case, if you wanted to write this as a single pipeline, you'd have to do something like:

y <- x %>% 
    some_function %>% 
    {.$some_element[[some_subelement]][i] <- 3; .} %>% 
    another_function

but that's an awkward construction that might not be self-explanatory. Worse, if someone doesn't understand what that construction is doing, there's no keyword or function they can look up to figure it out. One could encapsulate that pattern into a function:

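library(magrittr)
library(lazyeval)
assign_into <- function(x, expr, value) {
    # Capture the unevaluated target expression, e.g. .$some_element[[some_subelement]][i]
    expr <- lazy(expr)$expr
    # Splice target and value into `x %>% { <target> <- <value>; . }` and evaluate it
    f_eval(f_interp(~ x %>% { uq(expr) <- uq(value); . }))
}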

and use it like:

y <- x %>% 
    some_function %>% 
    assign_into(.$some_element[[some_subelement]][i], 3) %>% 
    another_function

which both reads better and gives a confused reader something to look up. The function should probably do some check to verify that the expression passed to it at least mentions ., since currently there's nothing to prevent the user from doing x %>% assign_into(somevar, 5), which would be somewhere between a no-op and an error.
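The check described above could look roughly like the following. This is only a sketch, and it swaps lazyeval for base R's substitute() and eval(), so it is a different implementation from the one proposed earlier. The helper is hypothetical and not part of magrittr. It also assumes value is an ordinary data value, since it is inlined directly into the assignment call.

```r
library(magrittr)

# Hypothetical base-R variant of assign_into(); not a magrittr function.
assign_into <- function(x, expr, value) {
    # Capture the target expression without evaluating it
    expr <- substitute(expr)
    # Reject targets that never mention `.`, e.g. assign_into(x, somevar, 5)
    if (!"." %in% all.names(expr)) {
        stop("`expr` must refer to the piped object as `.`")
    }
    # Scratch environment: `.` is a local copy of x, and the parent is the
    # caller, so names such as `i` or `some_subelement` still resolve
    env <- new.env(parent = parent.frame())
    assign(".", x, envir = env)
    # Build and run `<expr> <- <value>`, which modifies the local `.`
    eval(call("<-", expr, value), env)
    get(".", envir = env)
}

# Usage on a small nested list (made-up data)
x <- list(some_element = list(a = c(1, 2, 3)))
some_subelement <- "a"
i <- 2
y <- x %>% assign_into(.$some_element[[some_subelement]][i], 10)
```

Because the assignment happens on a copy held in the scratch environment, x itself is left untouched and only the returned object carries the change.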

Another even more general option would be to have a different pipe operator that takes any expression involving . as the rhs, evaluates it under the assumption that it mutates the . object, and then returns the value of ., rather than the normal pipe which returns the value of the expression. I think I favor the assign_into function more, though, because it has a specific, well-defined purpose and because it aids discoverability more.
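For comparison, the alternative pipe operator might be sketched as below. The operator name %mut>% is invented purely for illustration, and the sketch ignores most of what the real pipe does (function-style right-hand sides, placeholders in arguments, and so on). It only evaluates the right-hand side with . bound to the left-hand side and then returns ..

```r
library(magrittr)

# Hypothetical operator; the name %mut>% is an illustration only.
`%mut>%` <- function(lhs, rhs) {
    # Keep the right-hand side unevaluated
    rhs <- substitute(rhs)
    # Bind `.` to the left-hand side in a scratch environment whose parent
    # is the caller, so other variables used in rhs still resolve
    env <- new.env(parent = parent.frame())
    assign(".", lhs, envir = env)
    # Run rhs for its side effect on `.`; its own value is discarded
    eval(rhs, env)
    # Return the (possibly modified) `.`
    get(".", envir = env)
}

# Usage (made-up data); braces are needed because `<-` binds more loosely
# than %...% operators
x <- list(a = c(1, 2, 3), b = "unchanged")
y <- x %mut>% { .$a[2] <- 10 }
n <- x %mut>% { .$c <- TRUE } %>% length
```

Since all %...% operators share the same precedence and associate left to right, an operator like this can be mixed freely with %>% in one chain.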

So do you think this is something worth including? Or is ... %>% { mutate_code(.); .} %>% ... already good enough?

(Note: I'm not very experienced with the lazyeval package, so the code above might not be 100% correct.)

hadley commented 6 years ago

Mutating a piece of an object in the middle of the pipe seems a bit icky to me. Could you provide a more realistic (i.e. compelling) example?

DarwinAwardWinner commented 6 years ago

The most common case I run into is needing to modify pieces of S3 and S4 objects inline. Here's part of an analysis that does this quite a bit: https://github.com/DarwinAwardWinner/CD4-csaw/blob/3447e5d6199f88ca40a9476f0ea85446c904527e/scripts/chipseq-tsshood-explore-H3K27me3.Rmd (search for "assign_into").

Also, I came up with a possibly better syntax for this, which you can see (with documentation) here: https://github.com/DarwinAwardWinner/rctutils/blob/fb0e767a9359b16f2f0c68be4cd0804f90cd4d0a/R/prog_utils.R#L77-L123

hadley commented 6 years ago

I do like the use of assign_into() here:

dge %>%
    assign_into(.$offset, NULL) %>%
    assign_into(.$genes, all.window.meta) 

But I think you could rewrite with within:

dge %>% within({ 
  offset <- NULL
  genes <- all.window.meta
})

Or if dplyr had a list backend you could write:

dge %>% mutate(offset = NULL, genes = all.window.meta)

All in all, that makes me feel that this doesn't belong in magrittr. I think it's useful, but it feels like it more clearly belongs somewhere else (like a general package for manipulating lists/vectors).
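For anyone wanting to try the within() approach on a plain list, here is a minimal sketch with made-up data (obj and meta are stand-ins, not the objects from the linked analysis). One caveat, as far as I can tell from how within() treats lists: assigning NULL inside the block appears to store a NULL entry rather than drop the element, so rm() is used here to remove it. It is worth checking how this behaves for the class of object at hand.

```r
library(magrittr)

# Made-up stand-ins for the object and metadata in the example above
obj <- list(counts = 1:4, offset = c(0.1, 0.2))
meta <- data.frame(id = c("w1", "w2"))

out <- obj %>% within({
    rm(offset)      # drop the element entirely
    genes <- meta   # add a new element
})
```

Elements that are not mentioned inside the block, such as counts, are carried over unchanged.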