tidyverse / purrr

A functional programming toolkit for R
https://purrr.tidyverse.org/

map_dfc() fails when the result contains list columns #376

Closed yutannihilation closed 2 years ago

yutannihilation commented 7 years ago

Sometimes we have a data.frame-like list and want to apply a function to each element and collect the result as a data frame. map_dfc() is quite useful for this purpose:

not_nested <- list(
  col1 = c("Apple", "Orange"),
  col2 = c("Baseball", "Football")
)

purrr::map_dfc(not_nested, purrr::map_chr, sprintf, fmt = "I like %s")
#> # A tibble: 2 x 2
#>            col1            col2
#>           <chr>           <chr>
#> 1  I like Apple I like Baseball
#> 2 I like Orange I like Football

But it fails if the result contains list columns:

nested <- list(
  col1 = list(
    c("Apple", "Banana"),
    c("Orange")
  ),
  col2 = list(
    c("Baseball", "Soccer"),
    c("Football")
  )
)

purrr::map_dfc(nested, purrr::map, sprintf, fmt = "I like %s")
#> Error in cbind_all(x): Argument 2 must be length 2, not 1

Is it possible to get the result as a data frame, like below?

tibble::as_tibble(
  purrr::map(nested, purrr::map, sprintf, fmt = "I like %s")
)
#> # A tibble: 2 x 2
#>        col1      col2
#>      <list>    <list>
#> 1 <chr [2]> <chr [2]>
#> 2 <chr [1]> <chr [1]>

(As the error indicates, this seems to come from cbind_all(). Should I file this issue in dplyr's repo instead?)

lionel- commented 5 years ago

The problem is that bind_cols() and bind_rows() implicitly splice inputs:

dplyr::bind_cols(y = list(3, 4))
#> # A tibble: 1 x 2
#>      V1    V2
#>   <dbl> <dbl>
#> 1     3     4

dplyr::bind_cols(list(y = 3:4, z = 4:5))
#> # A tibble: 2 x 2
#>       y     z
#>   <int> <int>
#> 1     3     4
#> 2     4     5

So list-columns are spliced as well unless we wrap all inputs:

dplyr::bind_cols(x = 1:2, y = list(3, 4))
#> Error: Argument 2 must be length 2, not 1

dplyr::bind_cols(list(x = 1:2, y = list(3, 4)))
#> # A tibble: 2 x 2
#>       x y
#>   <int> <list>
#> 1     1 <dbl [1]>
#> 2     2 <dbl [1]>

These functions need to be replaced with versions that use explicit splicing with !!! before they can support list-columns.
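To make the difference between implicit and explicit splicing concrete, here is a minimal sketch using rlang::list2(), which collects its arguments with dynamic dots. It is not dplyr's implementation; the variable names cols, unspliced, and spliced are made up for illustration. With explicit splicing, a list is only unpacked into separate arguments when the caller asks for it with !!!, so a list passed as-is can safely stay a single list-column.

```r
library(rlang)

# A named list that could be read either as one list-column
# or as two separate columns.
cols <- list(y = 3:4, z = 4:5)

# Without `!!!`, the list is kept intact as a single argument.
unspliced <- list2(cols)

# With `!!!`, each element of the list becomes its own argument.
spliced <- list2(!!!cols)

length(unspliced)
#> [1] 1
length(spliced)
#> [1] 2
names(spliced)
#> [1] "y" "z"
```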

In the meantime, we can wrap list-columns in lists to protect them. This might break existing code, but I think it's worth moving towards a consistent handling of lists in data frames.

lionel- commented 5 years ago

This will need to wait for a vctrs replacement for bind_cols() / bind_rows().

hadley commented 2 years ago

I think we can probably now switch from dplyr::bind_cols() to vctrs::vec_cbind():

library(purrr)

nested <- list(
  col1 = list(
    c("Apple", "Banana"),
    c("Orange")
  ),
  col2 = list(
    c("Baseball", "Soccer"),
    c("Football")
  )
)

str(vctrs::vec_cbind(!!!map(nested, map, sprintf, fmt = "I like %s")))
#> 'data.frame':    2 obs. of  2 variables:
#>  $ col1:List of 2
#>   ..$ : chr  "I like Apple" "I like Banana"
#>   ..$ : chr "I like Orange"
#>  $ col2:List of 2
#>   ..$ : chr  "I like Baseball" "I like Soccer"
#>   ..$ : chr "I like Football"

Created on 2022-08-24 by the reprex package (v2.0.1)
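The same idea can be wrapped in a small helper. This is only a sketch of the approach suggested above, not purrr's actual map_dfc() code: map_cbind() is a hypothetical name, and it assumes that every mapped result has the same length so that vctrs::vec_cbind() can treat each one as a column.

```r
library(purrr)

# Hypothetical helper (not part of purrr): map over `.x`, then bind the
# results column-wise, splicing them explicitly with `!!!`.
map_cbind <- function(.x, .f, ...) {
  vctrs::vec_cbind(!!!map(.x, .f, ...))
}

nested <- list(
  col1 = list(
    c("Apple", "Banana"),
    c("Orange")
  ),
  col2 = list(
    c("Baseball", "Soccer"),
    c("Football")
  )
)

# Each inner list has two elements, so each becomes a list-column with 2 rows.
res <- map_cbind(nested, function(col) {
  map(col, function(s) sprintf("I like %s", s))
})

str(res)
```

Because each mapped result is a plain list of length 2, vec_cbind() keeps it as a list-column rather than splicing its elements into separate columns.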