tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

dplyr RC 0.8.0: nest() creates unexpected output (e.g. drops rows) when excluding a factor variable that has unused levels from the nest command #542

Closed: tilltnet closed this issue 5 years ago

tilltnet commented 5 years ago

nest() creates unexpected output (e.g. drops rows) when a factor variable that has unused levels (levels with no remaining rows) is excluded from the nest command.

This occurred with the development version / release candidate (RC) of dplyr (0.8.0). I updated to the current development version of tidyr, but the problem persists. I'm pretty confident that this was not a problem with prior versions of dplyr. I don't know whether it's a dplyr or a tidyr problem, but since nest() belongs to tidyr, I filed the issue here.

library(dplyr)
library(tidyr)
t_df <- tibble(a = gl(3,3,9),
       b = 1:9,
       c = 9:1)

t_df %>% 
  filter(a != 1) %>% 
  nest(-a)
#> # A tibble: 2 x 2
#>   a     data            
#>   <fct> <list>          
#> 1 2     <tibble [3 x 2]>
#> 2 3     <NULL>

t_df %>% 
  filter(a != 1) %>% 
  nest(-a, -b)
#> # A tibble: 6 x 3
#>   a         b data            
#>   <fct> <int> <list>          
#> 1 2         4 <tibble [1 x 1]>
#> 2 2         5 <tibble [1 x 1]>
#> 3 2         6 <tibble [1 x 1]>
#> 4 3         7 <tibble [1 x 1]>
#> 5 3         8 <tibble [1 x 1]>
#> 6 3         9 <NULL>
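In both examples, filter(a != 1) removes the rows for level "1" but not the level itself, so `a` still carries an empty level when nest() runs. A minimal base-R sketch of that state (it rebuilds t_df as a data.frame so it needs neither dplyr nor tidyr; the name `filtered` is illustrative):

```r
# Rebuild the reporter's data in base R
t_df <- data.frame(a = gl(3, 3, 9), b = 1:9, c = 9:1)

# Base-R equivalent of filter(a != 1)
filtered <- t_df[t_df$a != "1", ]

# The rows for level "1" are gone, but the level is still declared
levels(filtered$a)               # "1" "2" "3"
table(filtered$a)                # level "1" now has a count of 0

# droplevels() removes levels that no longer occur in the data
levels(droplevels(filtered)$a)   # "2" "3"
```

If the empty level is what triggers the NULL entries, dropping it (for example with droplevels(), or mutate(a = droplevels(a)) in a dplyr pipeline) before calling nest() is a plausible workaround. That is an assumption; it was not tested against the 0.8.0 RC in this thread.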

version
#>                _
#> platform       x86_64-w64-mingw32
#> arch           x86_64
#> os             mingw32
#> system         x86_64, mingw32
#> status
#> major          3
#> minor          5.2
#> year           2018
#> month          12
#> day            20
#> svn rev        75870
#> language       R
#> version.string R version 3.5.2 (2018-12-20)
#> nickname       Eggshell Igloo

batpigandme commented 5 years ago

@romainfrancois, could this be related to drop = TRUE?

romainfrancois commented 5 years ago

I get this with:

library(dplyr)
library(tidyr)
t_df <- tibble(a = gl(3,3,9),
  b = 1:9,
  c = 9:1)

t_df %>% 
  filter(a != 1) %>% 
  nest(-a)
#> # A tibble: 2 x 2
#>   a     data            
#>   <fct> <list>          
#> 1 2     <tibble [3 × 2]>
#> 2 3     <tibble [3 × 2]>

# same as 
t_df %>% 
  filter(a != 1) %>% 
  group_nest(a)
#> # A tibble: 2 x 2
#>   a     data            
#>   <fct> <list>          
#> 1 2     <tibble [3 × 2]>
#> 2 3     <tibble [3 × 2]>

t_df %>% 
  filter(a != 1) %>% 
  nest(-a, -b)
#> # A tibble: 6 x 3
#>   a         b data            
#>   <fct> <int> <list>          
#> 1 2         4 <tibble [1 × 1]>
#> 2 2         5 <tibble [1 × 1]>
#> 3 2         6 <tibble [1 × 1]>
#> 4 3         7 <tibble [1 × 1]>
#> 5 3         8 <tibble [1 × 1]>
#> 6 3         9 <tibble [1 × 1]>

# same as
t_df %>% 
  filter(a != 1) %>% 
  group_nest(a, b)
#> # A tibble: 6 x 3
#>   a         b data            
#>   <fct> <int> <list>          
#> 1 2         4 <tibble [1 × 1]>
#> 2 2         5 <tibble [1 × 1]>
#> 3 2         6 <tibble [1 × 1]>
#> 4 3         7 <tibble [1 × 1]>
#> 5 3         8 <tibble [1 × 1]>
#> 6 3         9 <tibble [1 × 1]>

hadley commented 5 years ago

This should no longer be a problem, since we've reverted the dropping behaviour in dev dplyr.
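In released dplyr (0.8.0 and later), whether empty factor levels form groups is controlled by the `.drop` argument of group_by(), which defaults to TRUE. A short sketch, assuming a current dplyr install; it shows the released behaviour, not the RC discussed above, and the names `filtered` and `kept` are illustrative:

```r
library(dplyr)

t_df <- tibble(a = gl(3, 3, 9), b = 1:9, c = 9:1)
filtered <- filter(t_df, a != 1)

# Default .drop = TRUE: only levels that still have rows become groups
n_groups(group_by(filtered, a))   # 2

# .drop = FALSE keeps the empty level "1" as a zero-row group
kept <- group_by(filtered, a, .drop = FALSE)
n_groups(kept)                    # 3
group_size(kept)                  # 0 3 3
```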

mattbk commented 5 years ago

Noting that this issue occurs with grouping variables that are factors. If the variable is converted to character, nest() works as expected and does not return NULL.
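Why the character workaround helps can be illustrated with base R's split(), used here only as an analogue of grouping (it is not what nest() does internally): a factor key still carries its unused level and yields an empty group, while a character key only has the values actually present. The names `by_factor` and `by_char` are illustrative:

```r
t_df <- data.frame(a = gl(3, 3, 9), b = 1:9, c = 9:1)
filtered <- t_df[t_df$a != "1", ]

# Factor key: the unused level "1" still produces a (zero-row) group
by_factor <- split(filtered[c("b", "c")], filtered$a)
names(by_factor)                       # "1" "2" "3"
vapply(by_factor, nrow, integer(1))    # 0 3 3

# Workaround reported in the thread: convert the key to character first
filtered$a <- as.character(filtered$a)
by_char <- split(filtered[c("b", "c")], filtered$a)
names(by_char)                         # "2" "3"
```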