mikemahoney218 / heddlr

Bring a functional programming mindset to R Markdown document generation
https://mikemahoney218.github.io/heddlr/

heddle() should be able to replace multiple placeholders when given a vector #14

Closed mikemahoney218 closed 4 years ago

mikemahoney218 commented 4 years ago

Right now, heddle() is able to replace multiple placeholders when provided a dataframe via named arguments:

```r
heddle(iris, "placeholder1 placeholder2 ", "placeholder1" = Species, "placeholder2" = Species)
```
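As a rough mental model (a base R sketch, not heddlr's actual implementation), the data frame form can be thought of as filling the pattern once per row, swapping each named placeholder for that row's value in the mapped column. The helper `fill_rows()` and its `mapping` argument are hypothetical, and columns are referenced by name as strings to avoid tidy evaluation:

```r
# Hypothetical helper, not heddlr's implementation: fill `pattern` once per
# row, swapping each placeholder for that row's value in the mapped column.
fill_rows <- function(data, pattern, mapping) {
  # mapping: named character vector; names are placeholders,
  # values are the column names that supply the replacements.
  vapply(seq_len(nrow(data)), function(i) {
    out <- pattern
    for (placeholder in names(mapping)) {
      value <- as.character(data[[mapping[[placeholder]]]][i])
      out <- gsub(placeholder, value, out, fixed = TRUE)
    }
    out
  }, character(1))
}

species <- data.frame(Species = c("setosa", "versicolor"), test = "a")
fill_rows(species, "placeholder1 placeholder2",
          c(placeholder1 = "Species", placeholder2 = "test"))
# returns c("setosa a", "versicolor a")
```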

Similarly, heddle() is able to replace multiple placeholders with a single vector in a single call to mutate:

```r
library(dplyr)
library(heddlr)
iris %>%
  distinct(Species) %>%
  mutate(component = heddle(Species, "placeholder1 placeholder2", "placeholder1", "placeholder2"))
```
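The vector form differs in that every unnamed placeholder gets the same element of the data vector. A minimal base R sketch of that behavior, assuming a hypothetical `fill_vector()` helper (again not heddlr's code):

```r
# Hypothetical sketch: every placeholder passed in `...` is replaced by the
# same element of `data`, producing one filled string per element.
fill_vector <- function(data, pattern, ...) {
  placeholders <- c(...)
  vapply(as.character(data), function(value) {
    out <- pattern
    for (placeholder in placeholders) {
      out <- gsub(placeholder, value, out, fixed = TRUE)
    }
    out
  }, character(1), USE.NAMES = FALSE)
}

fill_vector(c("setosa", "virginica"), "placeholder1 placeholder2",
            "placeholder1", "placeholder2")
# returns c("setosa setosa", "virginica virginica")
```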

However, the vector equivalent falls down in a frankly bizarre manner when you try to use the same syntax as with dataframes:

```r
library(dplyr)
library(heddlr)
iris %>%
  distinct(Species) %>%
  mutate(test = "a",
         component = heddle(Species, "placeholder1 placeholder2", "placeholder1" = Species, "placeholder2" = test))
#>      Species test
#> 1     setosa    a
#> 2 versicolor    a
#> 3  virginica    a
#>                                                       component
#> 1                 plsetosaceholdersetosa plsetosaceholdersetosa
#> 2 plversicolorceholderversicolor plversicolorceholderversicolor
#> 3     plvirginicaceholdervirginica plvirginicaceholdervirginica
```

Created on 2019-12-29 by the reprex package (v0.3.0)

There's a better alternative currently supported via tidyr and purrr, which looks like this:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(heddlr)
iris %>%
  distinct(Species) %>%
  mutate(test = "a") %>%
  nest(nested = c(Species, test)) %>%
  mutate(component = map(nested, heddle, "placeholder1 placeholder2 ", "placeholder1" = Species, "placeholder2" = test)) %>%
  make_template(component)
```

But it would be nice not to require two additional dependencies (plus knowledge of purrr, however essential I find that package) to do something this simple. Supporting the replacement of multiple placeholders inside a mutate call would be a great feature. The most likely way to do it would be to remove the data argument and instead catch variables in ..., requiring arguments to be named the same way as they are in dataframe calls. That would potentially mean dropping support for vectors given outside mutate calls, though, which wouldn't be a good tradeoff.
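A minimal sketch of what the "no data argument, named ..." idea could look like, written in base R so it needs neither tidyr nor purrr. `fill_named()` and its `.pattern` argument are hypothetical names for illustration, not part of heddlr:

```r
# Hypothetical sketch of the proposed API: no data argument; each named
# argument in `...` maps a placeholder (its name) to a vector of replacements.
fill_named <- function(.pattern, ...) {
  replacements <- list(...)
  nms <- names(replacements)
  if (length(replacements) == 0 || is.null(nms) || any(nms == "")) {
    stop("every replacement must be named after its placeholder")
  }
  n <- max(lengths(replacements))
  # Recycle shorter inputs (e.g. a length-1 value) to the common length.
  replacements <- lapply(replacements, function(x) rep_len(as.character(x), n))
  vapply(seq_len(n), function(i) {
    out <- .pattern
    for (placeholder in nms) {
      out <- gsub(placeholder, replacements[[placeholder]][i], out, fixed = TRUE)
    }
    out
  }, character(1))
}

fill_named("placeholder1 placeholder2",
           placeholder1 = c("setosa", "versicolor"), placeholder2 = "a")
# returns c("setosa a", "versicolor a")
```

Because the function only sees plain vectors, the same call could sit inside mutate(), where Species and test would be looked up in the data. The dot prefix on `.pattern` is a common R convention: arguments before ... can be partially matched by name, so a short placeholder name such as `p` would otherwise be captured by a formal called `pattern`.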

mikemahoney218 commented 4 years ago

:man_facepalming: So, I was using my own API wrong in this issue.

Looking at the motivating example:

```r
library(dplyr)
library(heddlr)
iris %>%
  distinct(Species) %>%
  mutate(test = "a",
         component = heddle(Species, "placeholder1 placeholder2", "placeholder1" = Species, "placeholder2" = test))
#>      Species test
#> 1     setosa    a
#> 2 versicolor    a
#> 3  virginica    a
#>                                                       component
#> 1                 plsetosaceholdersetosa plsetosaceholdersetosa
#> 2 plversicolorceholderversicolor plversicolorceholderversicolor
#> 3     plvirginicaceholdervirginica plvirginicaceholdervirginica
```

The call in question is heddle(Species, "placeholder1 placeholder2", "placeholder1" = Species, "placeholder2" = test), which maps onto heddle(data, pattern, things to be replaced). So I was only replacing a, because test was sitting in the things-to-be-replaced position rather than being used as a replacement.

That said, it would make a lot of sense for this function not to ignore names when passed a vector, and instead to use them the same way heddle.data.frame does (or similarly, at least; I think mutate handles the tidy evaluation for me here). Probably an unnamed vector would keep the current behavior, while a named vector would try to replace placeholders with data and fail if the object isn't defined in any environment it has been told to look in.
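A base R sketch of that named/unnamed split, assuming a hypothetical `fill_mixed()` function rather than heddlr's real heddle(): unnamed arguments keep the current meaning (they are placeholders filled by the data), while a named argument's name is the placeholder and its value supplies the replacement.

```r
# Hypothetical sketch (not heddlr's code) of the proposed behavior.
fill_mixed <- function(data, pattern, ...) {
  args <- list(...)
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  n <- length(data)
  vapply(seq_len(n), function(i) {
    out <- pattern
    for (j in seq_along(args)) {
      if (nms[j] == "") {
        # Unnamed argument: its values are placeholders, each replaced
        # by the current element of `data` (the existing vector behavior).
        for (placeholder in as.character(args[[j]])) {
          out <- gsub(placeholder, as.character(data[i]), out, fixed = TRUE)
        }
      } else {
        # Named argument: the name is the placeholder and the value
        # supplies the replacement, recycled to length(data).
        value <- rep_len(as.character(args[[j]]), n)[i]
        out <- gsub(nms[j], value, out, fixed = TRUE)
      }
    }
    out
  }, character(1))
}

species <- c("setosa", "virginica")
fill_mixed(species, "placeholder1 placeholder2", "placeholder1", "placeholder2")
# returns c("setosa setosa", "virginica virginica")
fill_mixed(species, "placeholder1 placeholder2",
           placeholder1 = species, placeholder2 = "a")
# returns c("setosa a", "virginica a")
```

Under this sketch, the motivating call would produce "setosa a" and so on, which is what the original issue expected.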

mikemahoney218 commented 4 years ago

You can also replace multiple placeholders via two calls to heddle(), but that's not really a solution:

```r
library(dplyr)
library(heddlr)
iris %>%
  distinct(Species) %>%
  mutate(test = "x y",
         component = heddle(Species, test, "x"),
         component = heddle(Species, component, "y"))
```
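The two-call workaround is a sequence of passes, one per placeholder. A hedged base R generalization folds that sequence with Reduce(); `fill_sequentially()` is a hypothetical helper, not part of heddlr:

```r
# Hypothetical helper: one replacement pass per placeholder, the same idea
# as calling heddle() once per placeholder, folded with Reduce().
fill_sequentially <- function(pattern, replacements) {
  # replacements: named list; names are placeholders, values are vectors.
  n <- max(lengths(replacements))
  Reduce(function(text, placeholder) {
    values <- rep_len(as.character(replacements[[placeholder]]), n)
    mapply(function(t, v) gsub(placeholder, v, t, fixed = TRUE),
           text, values, USE.NAMES = FALSE)
  }, names(replacements), rep(pattern, n))
}

species <- c("setosa", "virginica")
fill_sequentially("x y", list(x = species, y = species))
# returns c("setosa setosa", "virginica virginica")
```

One caveat of sequential passes: a replacement inserted early can be rewritten by a later pass. For example, fill_sequentially("x y", list(x = "y", y = "a")) returns "a a", because the y produced by the first pass is replaced in the second.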
mikemahoney218 commented 4 years ago

So the fundamental challenge here comes down to a few elements:

  1. I want one function with a consistent API to handle both dataframe and vector cases.
  2. UseMethod() dispatches on the first argument, which is what differentiates between dataframe and vector objects.
  3. I use ... to signal the placeholders to replace, so I can't use it for data objects.
  4. The behavior of expecting a vector to exist in a dataframe within a passed environment doesn't make sense for running heddle() outside of pipelines -- we could do some
  5. I don't want to place arguments without defaults behind a ...
  6. Most importantly, if heddle() is required to start with a data argument (both for method dispatch and to cooperate with pipes), I don't know that there's a good syntax for replacing multiple vectors at the same time. I think heddle("x" = Species, "y" = Species2, test) breaks too far from heddle.data.frame(), which needs that first argument to be a dataframe. At best, I could get parity via heddle(iris, "x" = Species, test) (that is, heddle(data, ..., pattern)), but that still has the issue of a required argument hiding behind the ...
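The problem in points 5 and 6 comes from R's argument matching: any argument declared after ... can only be matched by its exact name, so a positional value meant for it is swallowed by the dots instead. A small stub with the heddle(data, ..., pattern) shape (the name `weave` is hypothetical) shows the failure:

```r
# Hypothetical stub with the heddle(data, ..., pattern) signature from
# point 6; it only reports what it received.
weave <- function(data, ..., pattern) {
  list(placeholders = list(...), pattern = pattern)
}

# Works: `pattern` is supplied by its exact name.
ok <- weave("setosa", x = "Species", pattern = "x y")
ok$pattern
# returns "x y"

# Fails: the unnamed "x y" is absorbed into `...`, so `pattern` is missing
# and R signals an error when the stub tries to use it.
err <- tryCatch(
  weave("setosa", x = "Species", "x y"),
  error = function(e) conditionMessage(e)
)
```

This is why a required argument "hiding behind the ..." forces every caller to name it explicitly.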

As such, until I become smarter, I think the supported methods are either calling heddle() twice inside a mutate call, or calling tidyr::nest() to take advantage of the dataframe API.