markfairbanks / tidytable

Tidy interface to 'data.table'
https://markfairbanks.github.io/tidytable/

`data.frame` expansion inside `mutate()` #700

Closed (Darxor closed 6 months ago)

Darxor commented 1 year ago

Right now, when a data.frame (typically the result of a data.frame-returning call such as map_dfr()) is passed as an unnamed argument to mutate(), it is not expanded into columns but appended as a single list-column. The same input works fine in summarise(), though.

{dplyr} expands the data.frame, checks vector sizes and overwrites existing columns if necessary.

Not sure if this is attainable with the {data.table} backend without hacky code, but it is something I find myself using more with {dplyr} lately.
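For reference, the dplyr behaviour described above (expand the data.frame, check sizes, overwrite existing columns) can be sketched in base R. This is only a rough illustration: `expand_df_cols()` is a hypothetical helper, not part of dplyr or tidytable, and it assumes the new data.frame has either one row (recycled) or as many rows as the target.

```r
# Hypothetical helper for illustration only; not a dplyr or tidytable function.
expand_df_cols <- function(df, new_cols) {
  stopifnot(is.data.frame(df), is.data.frame(new_cols))
  n <- nrow(df)

  # Size check: accept only 1 row (recycled) or exactly n rows.
  if (!nrow(new_cols) %in% c(1L, n)) {
    stop("`new_cols` must be size ", n, " or 1, not ", nrow(new_cols), ".")
  }

  for (nm in names(new_cols)) {
    # Assigning by name overwrites an existing column or appends a new one.
    df[[nm]] <- rep_len(new_cols[[nm]], n)
  }
  df
}

df <- data.frame(a = 1, b = 2)

# Existing columns `a` and `b` are overwritten.
overwritten <- expand_df_cols(df, data.frame(a = 2, b = 3))

# New columns `c` and `d` are appended.
appended <- expand_df_cols(df, data.frame(c = 2, d = 3))

# A size mismatch (2 rows into a 1-row data.frame) is rejected.
mismatch <- tryCatch(
  expand_df_cols(df, data.frame(a = 2:3, b = 3:4)),
  error = function(e) conditionMessage(e)
)
```

The size check runs before any assignment so that a mismatch leaves the input untouched, which mirrors the error shown in the last dplyr example of the reprex.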

library(tidytable)
df <- data.frame(a = 1, b = 2)

df |> 
  mutate(
    data.frame(
      a = 2, b = 3
    )
  )
#> # A tidytable: 1 × 3
#>       a     b `data.frame(a = 2, b = 3)`
#>   <dbl> <dbl> <list>                    
#> 1     1     2 <df [1 × 2]>

df |> 
  mutate(
    data.frame(
      c = 2, d = 3
    )
  )
#> # A tidytable: 1 × 3
#>       a     b `data.frame(c = 2, d = 3)`
#>   <dbl> <dbl> <list>                    
#> 1     1     2 <df [1 × 2]>

df |> 
  summarise(
    data.frame(
      a = 2, b = 3
    )
  )
#> # A tidytable: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     2     3

df |> 
  dplyr::mutate(
    data.frame(
      a = 2, b = 3
    )
  )
#>   a b
#> 1 2 3

df |> 
  dplyr::mutate(
    data.frame(
      a = 2:3, b = 3:4
    )
  )
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `..1 = data.frame(a = 2:3, b = 3:4)`.
#> ✖ `..1` must be size 1, not 2.

Created on 2022-11-30 with reprex v2.0.2

markfairbanks commented 1 year ago

Issue https://github.com/markfairbanks/tidytable/issues/576 has some discussion of this for summarize(). When .by is not used it works automatically (because of a quirk of data.table), but it failed with .by. I ended up adding a .unpack argument to summarize() so that you can opt in to the behavior (since the solution causes performance issues).

As for mutate(), I'm not positive I can get it to work, but I'll see what I can do.

Darxor commented 1 year ago

One possible way is to parse the data.frame's names and do something like the code below, since this already works in data.table.

library(data.table)
df <- data.table(a = 1, b = 2, c = 3)
df[, c("a", "b") := data.frame(a = 3, b = 4)]
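A minimal sketch of how the same idea might look when the column names are taken from the data.frame itself rather than typed out. The names `dt` and `new_cols` and the explicit size check are assumptions made for illustration; this is not how tidytable implements anything.

```r
library(data.table)

dt <- data.table(a = 1, b = 2, c = 3)
new_cols <- data.frame(a = 3, b = 4)

# Assumption for this sketch: only 1-row (recycled) or full-length inputs
# are accepted.
stopifnot(nrow(new_cols) %in% c(1L, nrow(dt)))

# Wrapping the names in parentheses makes data.table evaluate them as a
# character vector of target columns instead of a single literal column name.
dt[, (names(new_cols)) := new_cols]
```

Columns named in `new_cols` are overwritten by reference, and any columns not named there (here `c`) are left as they were.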

markfairbanks commented 1 year ago

Hmm, this seems like the code might get a bit difficult in some situations. What happens if the data.frame is passed as a variable and/or there are other steps in the mutate()?

library(dplyr, warn.conflicts = FALSE)

df <- tibble(a = 1, b = 2)
new_cols <- tibble(c = 3, d = 4)

df %>%
  mutate(double_a = a * 2,
         new_cols)
#> # A tibble: 1 × 5
#>       a     b double_a     c     d
#>   <dbl> <dbl>    <dbl> <dbl> <dbl>
#> 1     1     2        2     3     4

markfairbanks commented 1 year ago

As a side note while we work through this, here's a workaround in the meantime: you can splice a data frame using !!! inside mutate().

library(tidytable, warn.conflicts = FALSE)

df <- data.frame(a = 1, b = 2)

new_cols <- data.frame(a = 2, b = 3)

df |> 
  mutate(!!!new_cols)
#> # A tidytable: 1 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     2     3
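The splice workaround should also combine with other expressions in the same mutate() call, which is the situation raised earlier in the thread. The following is a sketch under that assumption, using a one-row `new_cols` with column names `c` and `d` chosen only for illustration.

```r
library(tidytable, warn.conflicts = FALSE)

df <- data.frame(a = 1, b = 2)
new_cols <- data.frame(c = 3, d = 4)

# `!!!` splices each column of `new_cols` in as its own named argument,
# so this call is equivalent to mutate(double_a = a * 2, c = 3, d = 4).
out <- df |>
  mutate(double_a = a * 2, !!!new_cols)
```

Because the splice happens before mutate() sees its arguments, each spliced column is treated like any other named expression, so no special data.frame handling is needed inside mutate() itself.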

markfairbanks commented 6 months ago

I've decided this isn't worth the effort to get to work (although it is possible). It would add a bunch of checks for a relatively niche functionality.

There's a checklist in #802 to show everything that would have to be covered. I started the process but I don't think it's quite worth it.