mkoohafkan closed 5 years ago
This seems to be the same issue as #542, but it is not fixed in the development version of dplyr. Session info below:
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.9000 purrr_0.3.0
[5] readr_1.3.1 tidyr_0.8.2 tibble_2.0.1 ggplot2_3.1.0
[9] tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 remotes_2.0.2 haven_2.1.0 lattice_0.20-38
[5] colorspace_1.4-0 generics_0.0.2 testthat_2.0.1 usethis_1.4.0
[9] utf8_1.1.4 rlang_0.3.1 pkgbuild_1.0.2 pillar_1.3.1
[13] glue_1.3.0 withr_2.1.2 modelr_0.1.4 sessioninfo_1.1.1
[17] readxl_1.3.0 plyr_1.8.4 munsell_0.5.0 gtable_0.2.0
[21] cellranger_1.1.0 rvest_0.3.2 devtools_2.0.1 memoise_1.1.0
[25] callr_3.1.1 ps_1.3.0 curl_3.3 fansi_0.4.0
[29] broom_0.5.1 Rcpp_1.0.0 backports_1.1.3 scales_1.0.0
[33] desc_1.2.0 pkgload_1.0.2 jsonlite_1.6 fs_1.2.6
[37] hms_0.4.2 digest_0.6.18 stringi_1.3.1 processx_3.2.1
[41] grid_3.5.2 rprojroot_1.3-2 cli_1.0.1 tools_3.5.2
[45] magrittr_1.5 lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2
[49] xml2_1.2.0 prettyunits_1.0.2 lubridate_1.7.4 rstudioapi_0.9.0
[53] assertthat_0.2.0 httr_1.4.0 R6_2.4.0 nlme_3.1-137
[57] compiler_3.5.2
Note that this does not happen using forcats::as_factor() as opposed to base factor(), which you're using above (the warning is just because I didn't take out the factor level args):
library(tidyverse)
mtcars %>% as_tibble() %>% mutate(cyl2 = as_factor(cyl, c(4L, 6L, 8L))) %>% nest(-cyl2)
#> Warning: Some components of ... were not used: ..1
#> # A tibble: 3 x 2
#> cyl2 data
#> <fct> <list>
#> 1 6 <tibble [7 × 11]>
#> 2 4 <tibble [11 × 11]>
#> 3 8 <tibble [14 × 11]>
mtcars %>% as_tibble() %>% mutate(cyl2 = as_factor(cyl, c(2L, 4L, 6L, 8L))) %>% nest(-cyl2)
#> Warning: Some components of ... were not used: ..1
#> # A tibble: 3 x 2
#> cyl2 data
#> <fct> <list>
#> 1 6 <tibble [7 × 11]>
#> 2 4 <tibble [11 × 11]>
#> 3 8 <tibble [14 × 11]>
mtcars %>% as_tibble() %>% mutate(cyl2 = as_factor(cyl, c(2L, 4L, 6L, 8L))) %>% nest(-cyl2) %>% slice(1)
#> Warning: Some components of ... were not used: ..1
#> # A tibble: 1 x 2
#> cyl2 data
#> <fct> <list>
#> 1 6 <tibble [7 × 11]>
mtcars %>% as_tibble() %>% mutate(cyl2 = as_factor(cyl, c(2L, 4L, 6L, 8L))) %>% nest(-cyl2) %>% slice(1) %>% unnest(data)
#> Warning: Some components of ... were not used: ..1
#> # A tibble: 7 x 12
#> cyl2 mpg cyl disp hp drat wt qsec vs am gear carb
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 6 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 6 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 6 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 4 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 5 6 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> 6 6 17.8 6 168. 123 3.92 3.44 18.9 1 0 4 4
#> 7 6 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
Created on 2019-02-19 by the reprex package (v0.2.1.9000)
However, applying forcats::as_factor() to an existing factor variable doesn't seem to fix the issue. Using a larger dataset (e.g., gapminder), the first NULL value doesn't show up until row 30...
library(tidyverse)
library(gapminder)
str(gapminder)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1704 obs. of 6 variables:
#> $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
#> $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeExp : num 28.8 30.3 32 34 36.1 ...
#> $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
#> $ gdpPercap: num 779 821 853 836 740 ...
gapminder %>%
  mutate_at(vars(country, continent),
            list(name = ~as_factor(.))) %>%
  nest(-country, -continent) %>%
  slice(25:34)
#> # A tibble: 10 x 3
#> country continent data
#> <fct> <fct> <list>
#> 1 China Asia <tibble [12 × 6]>
#> 2 Colombia Americas <tibble [12 × 6]>
#> 3 Comoros Africa <tibble [12 × 6]>
#> 4 Congo, Dem. Rep. Africa <tibble [12 × 6]>
#> 5 Congo, Rep. Africa <tibble [12 × 6]>
#> 6 Costa Rica Americas <NULL>
#> 7 Cote d'Ivoire Africa <NULL>
#> 8 Croatia Europe <NULL>
#> 9 Cuba Americas <NULL>
#> 10 Czech Republic Europe <NULL>
Created on 2019-02-19 by the reprex package (v0.2.1)
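When the NULL entries only start far down the table, as in the gapminder output above, it can help to locate them programmatically rather than by scrolling. A minimal sketch in base R, using a hand-built list (the hypothetical name data_col) as a stand-in for the nested data column:

```r
# Hypothetical stand-in for a nested list column where some entries are NULL.
data_col <- list(
  data.frame(x = 1:2),
  data.frame(x = 3:4),
  NULL,
  NULL
)

# TRUE wherever the nested entry is missing entirely.
is_missing <- vapply(data_col, is.null, logical(1))

# Row positions of the affected entries.
null_rows <- which(is_missing)
print(null_rows)
```

With a real nested tibble, the same check would presumably be applied to its data column in place of data_col.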
These all work for me with https://github.com/tidyverse/tidyr/pull/511 installed, if you want to try it out.
remotes::install_github("tidyverse/tidyr#511")
library(tidyverse)
mtcars %>% as_tibble() %>% mutate(cyl2 = factor(cyl, c(2L, 4L, 6L, 8L))) %>% nest(-cyl2)
#> # A tibble: 3 x 2
#> cyl2 data
#> <fct> <list>
#> 1 4 <tibble [11 × 11]>
#> 2 6 <tibble [7 × 11]>
#> 3 8 <tibble [14 × 11]>
mtcars %>% as_tibble() %>% mutate(cyl2 = factor(cyl, c(2L, 4L, 6L, 8L))) %>% nest(-cyl2) %>% slice(1) %>% unnest(data)
#> # A tibble: 11 x 12
#> cyl2 mpg cyl disp hp drat wt qsec vs am gear carb
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 2 4 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 3 4 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 4 4 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 5 4 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 6 4 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> 7 4 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> 8 4 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 9 4 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 10 4 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 11 4 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
Created on 2019-02-21 by the reprex package (v0.2.1.9000)
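The property being checked here is that each group key matches the rows stored under it. As a rough illustration, that invariant can be written out with base R's split() standing in for nest(). This is only a sketch of the expected behaviour, not how tidyr implements nesting:

```r
# Factor with an unused level "2", as in the examples in this thread.
cyl2 <- factor(mtcars$cyl, levels = c(2, 4, 6, 8))

# split() with drop = TRUE discards empty levels, leaving one data frame
# per level that actually occurs in the data.
groups <- split(mtcars, cyl2, drop = TRUE)

# For every group, the key (the list name) should equal the cyl values inside it.
keys_match <- vapply(
  names(groups),
  function(key) all(as.character(groups[[key]]$cyl) == key),
  logical(1)
)

# Fails loudly if any key is paired with the wrong rows.
stopifnot(all(keys_match))
```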
This works fine:
However, this does not:
When there are additional levels in a factor for which there is no data, attempting to nest by this factor introduces NULL values. Even worse, the nesting IDs do not match what actually gets nested. Note that cyl and cyl2 do not match in the second case.
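To make the triggering condition concrete, the following base R sketch shows how a factor ends up with a level that has no data, and how droplevels() removes it. Whether dropping the unused level avoids the nesting problem is an assumption here and has not been verified in this thread:

```r
# Declaring the level 2 creates a level with no matching rows in mtcars.
cyl2 <- factor(mtcars$cyl, levels = c(2, 4, 6, 8))

# Count the rows per level; unused levels show up with a count of 0.
level_counts <- table(cyl2)
print(level_counts)

# Names of the levels that have no data.
empty_levels <- names(level_counts)[level_counts == 0]
print(empty_levels)

# droplevels() keeps only the levels that occur in the data.
cyl2_dropped <- droplevels(cyl2)
print(levels(cyl2_dropped))
```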