tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

`unnest()` fails to unnest empty list, `unnest_wider()` works #823

Closed: florianm closed this issue 4 years ago

florianm commented 4 years ago

Summary

I can confirm that #650 still persists with tidyr_1.0.0.9000.

On a side note, tidyr_1.0.0.9000 fixes the behaviour of unnest_wider(). In versions before 1.0.0.9000, unnest_wider() seems to have trouble unnesting a list when a nested list is empty in the first record but contains data in subsequent records. I'm struggling to produce a minimal reproducible example, but I have provided what I can below.

tidyr version 1.0.0.9000 (current master)

The example from https://github.com/tidyverse/tidyr/issues/650#issuecomment-514683131 doesn't work as expected:

library(magrittr)
library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:magrittr':
#> 
#>     extract
df <- tibble(x = list(), y = integer())
df %>% unnest(y)
#> Error: Input must be list of vectors
df %>% unnest(y) %>% names()
#> Error: Input must be list of vectors
df %>% unnest_wider(y)
#> # A tibble: 0 x 1
#> # … with 1 variable: x <list>
df %>% unnest_wider(y) %>% names()
#> [1] "x"

Created on 2019-12-04 by the reprex package (v0.3.0)

However, tidyr_1.0.0.9000 fixes my issue https://github.com/dbca-wa/ruODK/issues/46.

tidyr version 1.0.0 (current CRAN)

library(magrittr)
library(tidyr)
#> 
#> Attaching package: 'tidyr'
#> The following object is masked from 'package:magrittr':
#> 
#>     extract
df <- tibble(x = list(), y = integer())
df %>% unnest(y)
#> Error: Input must be list of vectors
df %>% unnest(y) %>% names()
#> Error: Input must be list of vectors
df %>% unnest_wider(y)
#> # A tibble: 0 x 1
#> # … with 1 variable: x <list>
df %>% unnest_wider(y) %>% names()
#> [1] "x"

Created on 2019-12-04 by the reprex package (v0.3.0)

The important difference is that tidyr_1.0.0 triggers my issue https://github.com/dbca-wa/ruODK/issues/46:

library(ruODK)
fresh_raw_parsed <- odata_submission_get(
  pid = get_test_pid(),
  fid = get_test_fid_gap(),
  url = get_test_url(),
  un = get_test_un(),
  pw = get_test_pw(),
  parse = TRUE,
  verbose = TRUE,
  local_dir = t
)
#> Downloading submissions...
#> Downloaded 2 submissions.
#> Reading form schema...
#> Parsing submissions...
#> New names:
#> * `@odata.context` -> .odata.context
#> Unnesting column 'value'
#> Found more nested columns, unnesting again.
#> Unnesting column '.__system'
#> Unnesting column 'meta'
#> Unnesting column 'details'
#> Unnesting column 'track_photos'
#> Error: Can't cast `track_photos$...1` <logical> to `track_photos$...1` <vctrs_unspecified>.

The significant part of the trace: unchop() is called with keep_empty = TRUE, but vec_rbind() probably has a hard time guessing the .ptype from the incomplete first record.

                                         └─tidyr::unchop(data, !!col, keep_empty = TRUE)
 24. │                                   └─vctrs::vec_rbind(!!!x, .ptype = ptype)
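A minimal sketch of the type-guessing step described above, assuming a current vctrs release (the 0.2.0.9007 development version in the session info below may behave differently). A column that holds only `NA` in the first record has an "unspecified" prototype (class `vctrs_unspecified`, the type named in the error), which vctrs normally resolves from the later records; passing an explicit `.ptype` removes the guesswork. The column name `track_photos` mirrors the error message, and the values are made up for illustration.

```r
library(vctrs)

# A logical vector containing only NA has an "unspecified" prototype
unspec <- vec_ptype(NA)

# The common type of an unspecified value and a character value is character
common <- vec_ptype_common(NA, "photo.jpg")

# Hypothetical records: the first has only a missing value, the second has data
first  <- data.frame(track_photos = NA)
second <- data.frame(track_photos = "photo.jpg", stringsAsFactors = FALSE)

# Without .ptype, vctrs infers the column type from all records
guessed <- vec_rbind(first, second)

# With .ptype, the expected column type is stated up front
explicit <- vec_rbind(
  first, second,
  .ptype = data.frame(track_photos = character(), stringsAsFactors = FALSE)
)

str(guessed)
str(explicit)
```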

The full error, backtrace and session info:

```
> rlang::last_error()
Can't cast `track_photos$...1` <logical> to `track_photos$...1` <vctrs_unspecified>.
Backtrace:
  1. ruODK::odata_submission_get(...)
 29. vctrs:::vec_cast.default(x = x, to = to, x_arg = x_arg, to_arg = to_arg)
 30. vctrs::stop_incompatible_cast(x, to, x_arg = x_arg, to_arg = to_arg)
 31. vctrs:::stop_incompatible(...)
 32. vctrs:::stop_vctrs(...)
Run `rlang::last_trace()` to see the full context.
```

```
> rlang::last_trace()
Can't cast `track_photos$...1` <logical> to `track_photos$...1` <vctrs_unspecified>.
Backtrace:
     █
  1. ├─ruODK::odata_submission_get(...)
  2. │ └─sub %>% odata_submission_parse(form_schema = fs, verbose = verbose) R/odata_submission_get.R:217:2
  3. │ ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  4. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  6. │ └─ruODK:::`_fseq`(`_lhs`)
  7. │ └─magrittr::freduce(value, `_function_list`)
  8. │ ├─base::withVisible(function_list[[k]](value))
  9. │ └─function_list[[k]](value)
 10. │ └─ruODK::odata_submission_parse(., form_schema = fs, verbose = verbose)
 11. │ └─`%>%`(...) R/odata_submission_parse.R:123:6
 12. │ ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 13. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 14. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 15. │ └─ruODK:::`_fseq`(`_lhs`)
 16. │ └─magrittr::freduce(value, `_function_list`)
 17. │ └─function_list[[i]](value)
 18. │ └─ruODK::unnest_all(., names_repair = names_repair, verbose = verbose)
 19. │ └─ruODK::unnest_all(nested_tbl, names_repair = names_repair, verbose = verbose) R/odata_submission_parse.R:73:4
 20. │ ├─base::suppressMessages(...) R/odata_submission_parse.R:60:6
 21. │ │ └─base::withCallingHandlers(expr, message = function(c) invokeRestart("muffleMessage"))
 22. │ └─tidyr::unnest_wider(nested_tbl, colname, names_repair = names_repair) R/odata_submission_parse.R:60:6
 23. │ └─tidyr::unchop(data, !!col, keep_empty = TRUE)
 24. │ └─vctrs::vec_rbind(!!!x, .ptype = ptype)
 25. ├─vctrs:::vec_cast_dispatch(x = x, to = to, x_arg = x_arg, to_arg = to_arg)
 26. ├─vctrs::vec_cast.data.frame(x = x, to = to, x_arg = x_arg, to_arg = to_arg)
 27. ├─vctrs:::vec_cast.data.frame.data.frame(...)
 28. ├─vctrs:::vec_cast_dispatch(x = x, to = to, x_arg = x_arg, to_arg = to_arg)
 29. └─vctrs:::vec_cast.default(x = x, to = to, x_arg = x_arg, to_arg = to_arg)
 30. └─vctrs::stop_incompatible_cast(x, to, x_arg = x_arg, to_arg = to_arg)
 31. └─vctrs:::stop_incompatible(...)
 32. └─vctrs:::stop_vctrs(...)
```

```
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8 LC_MONETARY=en_AU.UTF-8
 [6] LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ruODK_0.6.6.9005 tidyr_1.0.0 magrittr_1.5 usethis_1.5.1.9000

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3 rstudioapi_0.10 whisker_0.4 knitr_1.26 tidyselect_0.2.5 R6_2.4.1 rlang_0.4.2.9000
 [8] httr_1.4.1 dplyr_0.8.3 tools_3.6.1 packrat_0.5.0 xfun_0.11 clipr_0.7.0 htmltools_0.4.0
[15] assertthat_0.2.1 digest_0.6.23 tibble_2.1.3 lifecycle_0.1.0.9000 crayon_1.3.4 processx_3.4.1 callr_3.3.2
[22] purrr_0.3.3.9000 ps_1.3.0 vctrs_0.2.0.9007 fs_1.3.1 testthat_2.3.1 glue_1.3.1 evaluate_0.14
[29] rmarkdown_1.18 reprex_0.3.0 compiler_3.6.1 pillar_1.4.2 pkgconfig_2.0.3
```
hadley commented 4 years ago

I'm not sure why that code ever worked because y is not a list. If you unnest() x instead, it works as expected:

library(tidyr)
df <- tibble(x = list(), y = integer())
df %>% unnest(x)
#> # A tibble: 0 x 2
#> # … with 2 variables: x <???>, y <int>
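For comparison, a minimal sketch with a list column that actually holds data, assuming tidyr 1.0.0 or later, where `unnest()` accepts a `keep_empty` argument (the same option `unchop()` is called with in the trace above). The column names match the example above; the values are made up. By default, a row whose list element is empty is dropped; `keep_empty = TRUE` keeps it and fills the unnested column with `NA`.

```r
library(tidyr)

# x is the list column; its second element is an empty vector
df <- tibble(x = list(1:2, integer()), y = c(1L, 2L))

# Default: the row with the empty element is dropped
dropped <- unnest(df, x)

# keep_empty = TRUE keeps that row, with x set to NA
kept <- unnest(df, x, keep_empty = TRUE)

dropped
kept
```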