tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

Adding group names when unnesting named lists of lists #387

Closed aosmith16 closed 5 years ago

aosmith16 commented 6 years ago

Now that unnest handles lists of lists, should .id be able to use the names in the nested list as identifiers?

library(tidyr)
library(tibble)

df = tibble(
     x = 1:2,
     y = list(
          list(a = "a", b = "b"),
          list(a = "c", b = "d")
     )
)

df %>% unnest(y, .id = "group")
# # A tibble: 4 x 3
#       x         y group
#   <int>    <list> <int>
# 1     1 <chr [1]>     1
# 2     1 <chr [1]>     1
# 3     2 <chr [1]>     2
# 4     2 <chr [1]>     2

# Expected (?) output
tibble(
     x = c(1, 1, 2, 2),
     y = list("a", "b", "c", "d"),
     group = c("a", "b", "a", "b")
)
# # A tibble: 4 x 3
#       x         y group
#   <dbl>    <list> <chr>
# 1     1 <chr [1]>     a
# 2     1 <chr [1]>     b
# 3     2 <chr [1]>     a
# 4     2 <chr [1]>     b

hadley commented 5 years ago

Hmmmm, .id doesn't seem quite the right way to describe what you want, because it's not actually the group, it's something inside the group.

Is this the final product that you want, or would you prefer something like this?

tibble::tibble(
  x = c(1, 2),
  a = c("a", "c"),
  b = c("b", "d")
)
#> # A tibble: 2 x 3
#>       x a     b    
#>   <dbl> <chr> <chr>
#> 1     1 a     b    
#> 2     2 c     d

Created on 2019-03-08 by the reprex package (v0.2.1.9000)

aosmith16 commented 5 years ago

Looking at this now I agree that .id doesn't seem quite right.

I think my concern really boils down to losing the list names after unnest(). Keeping the list names as column names seems like a fine approach, since things could be reshaped into a long dataset if needed.

I'm guessing I was expecting the list names to be preserved after unnest(), so that if I did back-to-back unnests on my original example I would get the same result as unnesting a single named list.

So this:

library(tidyr)
library(tibble)

# Nested, named lists
df1 = tibble(
    x = 1:2,
    y = list(
        list(a = "a", b = "b"),
        list(a = "c", b = "d")
    )
)

df1 %>% 
    unnest(y) %>% 
    unnest(y, .id = "group") # names stripped after first unnest
#> # A tibble: 4 x 3
#>       x y     group
#>   <int> <chr> <int>
#> 1     1 a         1
#> 2     1 b         2
#> 3     2 c         3
#> 4     2 d         4

Would be equivalent to:

# Single named list
df2 = tibble(
    x = c(1, 1, 2, 2),
    y = list(a = "a", b = "b", a = "c", b = "d")
)

df2 %>% unnest(y, .id = "group")
#> # A tibble: 4 x 3
#>       x y     group
#>   <dbl> <chr> <chr>
#> 1     1 a     a    
#> 2     1 b     b    
#> 3     2 c     a    
#> 4     2 d     b

Created on 2019-03-08 by the reprex package (v0.2.0).
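
The "keep the names as columns, then reshape to long if needed" idea from this comment can be sketched as follows. This is only an illustration, assuming tidyr 1.0.0 or later, where unnest_wider() and pivot_longer() are available (neither existed in a released version when the comment was written). The column names "group" and "y" are chosen here to match the expected output in the original post.

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(
    list(a = "a", b = "b"),
    list(a = "c", b = "d")
  )
)

# Step 1: widen. The inner list names (a, b) become column names.
wide <- unnest_wider(df, y)

# Step 2: reshape to long. The former list names land in `group`,
# and the values they labelled land in `y`.
long <- pivot_longer(
  wide,
  cols = c(a, b),
  names_to = "group",
  values_to = "y"
)

long
```

One difference from the expected output in the original post: here y comes back as a character column rather than a list column, because unnest_wider() simplifies length-one elements.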

hadley commented 5 years ago

Playing around with new experimental verbs:

library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(
    list(a = "a", b = "b"),
    list(a = "c", b = "d")
  )
)
df %>% unpack(y)
#> # A tibble: 2 x 3
#>       x a     b    
#>   <int> <chr> <chr>
#> 1     1 a     b    
#> 2     2 c     d
df %>% unchop(y, id = "y_name")
#> # A tibble: 4 x 3
#>       x y         y_name
#>   <int> <list>    <chr> 
#> 1     1 <chr [1]> a     
#> 2     1 <chr [1]> b     
#> 3     2 <chr [1]> a     
#> 4     2 <chr [1]> b

# And I think unnest(y) should be equivalent to this
df %>% unchop(y, id = "y_name") %>% unpack(y)
#> # A tibble: 4 x 3
#>       x y1    y_name
#>   <int> <chr> <chr> 
#> 1     1 a     a     
#> 2     1 b     b     
#> 3     2 c     a     
#> 4     2 d     b

Created on 2019-03-28 by the reprex package (v0.2.1.9000)

hadley commented 5 years ago

Now implemented in unnest_longer() and unnest_wider():

library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(
    list(a = "a", b = "b"),
    list(a = "c", b = "d")
  )
)

df %>% unnest_wider(y)
#> # A tibble: 2 x 3
#>       x a     b    
#>   <int> <chr> <chr>
#> 1     1 a     b    
#> 2     2 c     d

df %>% unnest_longer(y)
#> # A tibble: 4 x 3
#>       x values    index
#>   <int> <list>    <chr>
#> 1     1 <chr [1]> a    
#> 2     1 <chr [1]> b    
#> 3     2 <chr [1]> a    
#> 4     2 <chr [1]> b
df %>% unnest_longer(y, value_to = "y", index_to = "y_name")
#> # A tibble: 4 x 3
#>       x y         y_name
#>   <int> <list>    <chr> 
#> 1     1 <chr [1]> a     
#> 2     1 <chr [1]> b     
#> 3     2 <chr [1]> a     
#> 4     2 <chr [1]> b

Created on 2019-04-23 by the reprex package (v0.2.1.9000)
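
The reprex above was run against a development version. In released tidyr (1.0.0 and later, to the best of my knowledge) the corresponding arguments are spelled values_to and indices_to, so the last call may need adjusting. A minimal sketch under that assumption; the column name "y_name" is simply carried over from the example above.

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(
    list(a = "a", b = "b"),
    list(a = "c", b = "d")
  )
)

# Assumes the released argument names `values_to` / `indices_to`
# (the reprex above uses the earlier `value_to` / `index_to` spelling).
out <- unnest_longer(df, y, values_to = "y", indices_to = "y_name")

# The inner list names are kept as an identifier column.
out$y_name
```

Depending on the tidyr version, y may come back as a list column (as printed above) or be simplified to a character vector. The y_name column should hold the inner list names either way.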