DarwinAwardWinner closed 6 years ago
Mutating a piece of an object in the middle of the pipe seems a bit icky to me. Could you provide a more realistic (i.e. compelling) example?
The most common reason I find is when I need to modify pieces of S3 and S4 objects inline. Here's part of an analysis that does this quite a bit: https://github.com/DarwinAwardWinner/CD4-csaw/blob/3447e5d6199f88ca40a9476f0ea85446c904527e/scripts/chipseq-tsshood-explore-H3K27me3.Rmd (search for "assign_into").
Also, I came up with a possibly better syntax for this, which you can see (with documentation) here: https://github.com/DarwinAwardWinner/rctutils/blob/fb0e767a9359b16f2f0c68be4cd0804f90cd4d0a/R/prog_utils.R#L77-L123
I do like the use of `assign_into()` here:
    dge %>%
      assign_into(.$offset, NULL) %>%
      assign_into(.$genes, all.window.meta)
But I think you could rewrite with `within`:
    dge %>% within({
      offset <- NULL
      genes <- all.window.meta
    })
Or if dplyr had a list backend you could write:
    dge %>% mutate(offset = NULL, genes = all.window.meta)
All in all, that makes me feel that this doesn't belong in magrittr. I think it's useful; it just feels like it belongs somewhere else (like a general package for manipulating lists/vectors).
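For anyone who wants to try the `within()` rewrite suggested above, here is a minimal sketch. It assumes a plain list as a stand-in for `dge`, and the contents of `all.window.meta` are invented for illustration; neither is taken from the linked analysis. It calls `within()` directly rather than through the pipe so that it runs in base R alone.

```r
# Stand-in objects: hypothetical data, not taken from the linked analysis.
dge <- list(counts = 1:3, offset = c(0.5, 0.5, 0.5), genes = "old")
all.window.meta <- data.frame(window = c("w1", "w2", "w3"))

# within() evaluates the block with the list elements visible as variables
# and returns a modified copy; `dge` itself is left untouched.
result <- within(dge, {
  offset <- NULL
  genes <- all.window.meta
})

is.null(result$offset)
identical(result$genes, all.window.meta)
```

Whether this behaves the same way on S3 or S4 objects that merely extend a list depends on the class, so it is worth checking on the real object before relying on it.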
Sometimes I want to mutate some piece of an object in the middle of a pipe.
Simple cases for data frames may be covered by `dplyr::mutate`. In the general case, though, if you wanted to write this as a single pipeline, you'd have to do something like `... %>% { mutate_code(.); . } %>% ...`, which is kind of an awkward construction that might not be self-explanatory. Worse, if someone doesn't understand what that construction is doing, there's no keyword or function they can look up to figure it out. One could encapsulate that pattern into a function, `assign_into`, and use it in the pipeline instead, which both reads better and gives a confused reader something to look up. The function should probably check that the expression passed to it at least mentions `.`, since currently there's nothing to prevent the user from doing `x %>% assign_into(somevar, 5)`, which would be somewhere between a no-op and an error.
Another, even more general option would be to have a different pipe operator that takes any expression involving `.` as the rhs, evaluates it under the assumption that it mutates the `.` object, and then returns the value of `.`, rather than the normal pipe, which returns the value of the expression. I think I favor the `assign_into` function more, though, because it has a specific, well-defined purpose and because it aids discoverability more.
So do you think this is something worth including? Or is `... %>% { mutate_code(.); . } %>% ...` already good enough?
(Note: I'm not very experienced with the lazyeval package, so the code above might not be 100% correct.)
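To make the two proposals concrete, here is a minimal sketch of both. It is not the original code from this issue and not the rctutils version: it uses base R `substitute()` and `eval()` instead of lazyeval, and the helper `eval_on_dot` and the operator name `%mutate>%` are hypothetical names chosen for illustration. The functions are called directly so the example runs without magrittr.

```r
# Hypothetical helper: evaluate `code` with `.` bound to `x` in a scratch
# environment, then return whatever `.` holds afterwards.
eval_on_dot <- function(x, code, caller) {
  scratch <- new.env(parent = caller)
  assign(".", x, envir = scratch)
  eval(code, scratch)
  get(".", envir = scratch)
}

# Sketch of assign_into(): `target` is an unevaluated expression such as
# `.$name` or `names(.)[2]` that must mention `.`.
assign_into <- function(x, target, value) {
  caller <- parent.frame()
  target_expr <- substitute(target)
  if (!"." %in% all.vars(target_expr)) {
    stop("`target` must mention `.`, e.g. `.$name`", call. = FALSE)
  }
  # Build the call `<target> <- quote(<value>)`; quote() keeps `value` from
  # being evaluated a second time if it happens to be a language object.
  code <- call("<-", target_expr, call("quote", value))
  eval_on_dot(x, code, caller)
}

# Sketch of the alternative operator (hypothetical name): run any code that
# mutates `.`, then return `.` instead of the value of the code.
`%mutate>%` <- function(lhs, rhs) {
  caller <- parent.frame()
  eval_on_dot(lhs, substitute(rhs), caller)
}

x <- list(a = 1, b = 2)
y <- assign_into(x, .$a, 5)
z <- x %mutate>% { .$b <- NULL }
renamed <- assign_into(x, names(.)[2], "c")
```

With magrittr loaded, the first form should also work as `x %>% assign_into(.$a, 5)`, since a `.` that appears only inside a nested call still leaves `x` as the first argument. The `all.vars()` check covers the `assign_into(somevar, 5)` case described above by raising an error rather than silently doing nothing.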