tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

Should `unnest_longer()` have `keep_empty`? #1339

Closed by DavisVaughan 1 year ago

DavisVaughan commented 2 years ago

From https://community.rstudio.com/t/unnest-longer-drops-lists-rows-with-character-0/132748

Original example:

library(tidyverse)

my_df <- tibble(
  txt = c(
    "chestnut, pear, kiwi, peanut",
    "grapes, banana"
  )
)

# Extract all nuts
my_df <- my_df %>% 
  mutate(nuts = str_extract_all(txt, regex("\\w*nut\\w*"))) %>% 
  mutate(index = row_number(), .before = 1)

# Row index 2 has nuts <chr [0]>
my_df
#> # A tibble: 2 × 3
#>   index txt                          nuts     
#>   <int> <chr>                        <list>   
#> 1     1 chestnut, pear, kiwi, peanut <chr [2]>
#> 2     2 grapes, banana               <chr [0]>

# Unnest
my_df_long <- my_df %>% 
  unnest_longer(nuts, values_to = "nuts_long")

# Row index 2 is now missing
my_df_long
#> # A tibble: 2 × 3
#>   index txt                          nuts_long
#>   <int> <chr>                        <chr>    
#> 1     1 chestnut, pear, kiwi, peanut chestnut 
#> 2     1 chestnut, pear, kiwi, peanut peanut

Created on 2022-03-28 by the reprex package (v2.0.1)

Minimal reprex:

library(tidyverse)

df <- tibble(
  x = list("a", character())
)
df
#> # A tibble: 2 × 1
#>   x        
#>   <list>   
#> 1 <chr [1]>
#> 2 <chr [0]>

unnest_longer(df, x)
#> # A tibble: 1 × 1
#>   x    
#>   <chr>
#> 1 a

unnest(df, x, keep_empty = TRUE)
#> # A tibble: 2 × 1
#>   x    
#>   <chr>
#> 1 a    
#> 2 <NA>

Created on 2022-03-28 by the reprex package (v2.0.1)

It may be as simple as passing `keep_empty` through to the `unchop()` call in `unnest_longer()`, but I'd need to think it through carefully to be sure.
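To illustrate why passing the argument through might be enough, here is a minimal sketch that calls `unchop()` directly on the data from the minimal reprex. It only shows that `unchop()` already accepts `keep_empty`; it does not reproduce what `unnest_longer()` does internally, and the names `kept` and `dropped` are just for this example.

```r
library(tidyr)

df <- tibble(
  x = list("a", character())
)

# With keep_empty = TRUE the zero-length element yields one row holding NA
# instead of being dropped.
kept <- unchop(df, x, keep_empty = TRUE)
kept

# Default behaviour for comparison: the row holding character() disappears.
dropped <- unchop(df, x)
dropped
```

If `unnest_longer()` forwarded the argument, the first call is roughly the result one would expect for a simple character column like this one.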

hadley commented 2 years ago

I just noticed this absence while writing for R4DS, so I definitely think we should add keep_empty support.

hadley commented 2 years ago

It's worth noting the behaviour is different with NULL:

library(tidyr)

df <- tibble(
  x1 = list("a", character()),
  x2 = list("a", NULL)
)
df |> unnest_longer(x1)
#> # A tibble: 1 × 2
#>   x1    x2       
#>   <chr> <list>   
#> 1 a     <chr [1]>
df |> unnest_longer(x2)
#> # A tibble: 2 × 2
#>   x1        x2   
#>   <list>    <chr>
#> 1 <chr [1]> a    
#> 2 <chr [0]> <NA>

Created on 2022-08-10 by the reprex package (v2.0.1)

This is somewhat related to the `strict` argument of `unnest_wider()`, because the motivation for handling `NULL` this way comes from JSON.
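Since empty vectors and `NULL` are treated differently here, one workaround that does not depend on either behaviour is to replace zero-length elements with a single `NA` before unnesting. This is only a sketch: `fill_empty()` is a hypothetical helper written for this example, not part of tidyr, and it assumes a list-column of character vectors.

```r
library(tidyr)

df <- tibble(
  x = list("a", character())
)

# Hypothetical helper (not a tidyr function): swap every zero-length
# element, including NULL, for a single missing value.
fill_empty <- function(col, fill = NA_character_) {
  lapply(col, function(el) if (length(el) == 0) fill else el)
}

df$x <- fill_empty(df$x)

# Every element now has length >= 1, so no row is lost when unnesting.
out <- unnest_longer(df, x)
out
```

The trade-off is that a genuinely empty element can no longer be told apart from one that contained a single `NA`.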