tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

Allow unnest with list columns of differing lengths? #328

Closed karldw closed 6 years ago

karldw commented 7 years ago

unnest currently can't handle multiple list columns with different lengths. If the user asks to unnest one list column from a data frame that has several, unnest fails whenever the list columns' element counts differ. Would it be possible for unnest to copy the other list columns, just as it copies values from standard, atomic columns?

In particular, the current behavior means unnest doesn't work with sf data, since the geometry column is already a list column.


library(sf)
library(dplyr)
library(tidyr)
nc <- st_read(system.file("shape/nc.shp", package = "sf")) %>% 
  slice(1:3) %>%
  select(NAME) %>%
  mutate(y = strsplit(c("a", "d,e,f", "g,h"), ","))

nc
#> Simple feature collection with 3 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> # A tibble: 3 x 3
#>        NAME         y          geometry
#>      <fctr>    <list>  <simple_feature>
#> 1      Ashe <chr [1]> <MULTIPOLYGON...>
#> 2 Alleghany <chr [3]> <MULTIPOLYGON...>
#> 3     Surry <chr [2]> <MULTIPOLYGON...>

# Current behavior:
unnest(nc, y, .drop = FALSE)
#>  Error: All nested columns must have the same number of elements.

# Expected behavior: values in geometry column copied for the newly-created rows
unnest(nc, y, .drop = FALSE)
#> # A tibble: 6 x 3
#>        NAME     y           geometry
#>      <fctr> <chr>   <simple_feature>
#> 1      Ashe     a  <MULTIPOLYGON...>
#> 2 Alleghany     d  <MULTIPOLYGON...>
#> 3 Alleghany     e  <MULTIPOLYGON...>
#> 4 Alleghany     f  <MULTIPOLYGON...>
#> 5     Surry     g  <MULTIPOLYGON...>
#> 6     Surry     h  <MULTIPOLYGON...>

Ref: https://github.com/r-spatial/sf/issues/426
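
The requested behavior can be sketched in base R, independent of tidyr's internals. This is only an illustration under stated assumptions: `unnest_one()` is a hypothetical helper, not a tidyr function, and a plain list of strings stands in for the sf geometry column. It repeats each row once per element of the unnested column, so the other list column is copied along with the atomic ones.

```r
# Toy data: two list columns whose elements have different lengths.
df <- data.frame(NAME = c("Ashe", "Alleghany", "Surry"), stringsAsFactors = FALSE)
df$y <- strsplit(c("a", "d,e,f", "g,h"), ",")
df$geometry <- list("poly1", "poly2", "poly3")  # stand-in for an sf geometry column

# Hypothetical helper (not part of tidyr): unnest a single list column and
# copy every other column, list columns included, onto the new rows.
unnest_one <- function(data, col) {
  n <- lengths(data[[col]])                  # elements per row in `col`
  idx <- rep(seq_len(nrow(data)), n)         # repeat each row index n times
  out <- data[idx, setdiff(names(data), col), drop = FALSE]
  out[[col]] <- unlist(data[[col]])          # flatten the unnested column
  rownames(out) <- NULL
  out
}

res <- unnest_one(df, "y")
```

Here `res` has six rows, and `res$geometry` is still a list column whose entries repeat in step with `NAME`. Rows whose `y` element is empty would be dropped by this sketch.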

hadley commented 6 years ago

Oooh good idea!

hadley commented 6 years ago

I think this will need some extra syntax, maybe something like unnest(df, y, preserve = x)

karldw commented 6 years ago

Thank you! I think there's still an issue when the list variables aren't specified, but .preserve is.

I think the solution is to add something like:

if (is_empty(quos)) {
  list_cols <- names(data)[map_lgl(data, is_list)]
  list_cols <- tidyselect::vars_select(list_cols, -!!! enquo(.preserve))  # deselect .preserve vars
  quos <- syms(list_cols)
}

but the vars_select() line above is wrong because I don't have the unquotation right.
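
One way to sidestep the negated unquotation is to resolve the preserved columns to plain names first and then remove them with `setdiff()`. The sketch below assumes that step has already happened: `default_unnest_cols()` is a hypothetical helper, not tidyr code, and it takes `preserve` as a character vector rather than a tidy selection.

```r
# Hypothetical helper (not part of tidyr): pick the list columns to unnest
# when the caller names none, skipping any columns marked as preserved.
default_unnest_cols <- function(data, preserve = character()) {
  is_list_col <- vapply(data, is.list, logical(1))
  list_cols <- names(data)[is_list_col]
  setdiff(list_cols, preserve)  # drop preserved columns by name
}

df <- data.frame(NAME = c("Ashe", "Alleghany", "Surry"), stringsAsFactors = FALSE)
df$y <- strsplit(c("a", "d,e,f", "g,h"), ",")
df$geometry <- list("poly1", "poly2", "poly3")  # stand-in for an sf geometry column

all_cols <- default_unnest_cols(df)                          # no columns preserved
kept_out <- default_unnest_cols(df, preserve = "geometry")   # geometry excluded
```

Inside unnest itself, the resulting names could then be passed to `syms()` as in the snippet above. How `.preserve` is turned into names is left out here.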