Closed leungi closed 5 years ago
Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session.
Adding a minimal reprex based on my understanding of the OP's issue. I think this is also related to #316.
I conceptually think of unnest as something which results in more rows/columns than the tibble provided, while nest results in fewer rows/columns. Perhaps this is why these issues have been raised, as losing rows during an unnest might be counterintuitive for some users (myself included) even though unnest is working as documented.
I think the desired result for both this issue and #316 is a dplyr::left_join of the non-list columns with the unnest results, as shown in the workaround below.
library(dplyr)
library(tidyr)
library(purrr)
df <- tibble(x = 1:2, y = list(tibble(), tibble(a = 5, b = 7)))
# Row with empty tibble has been removed
df %>% unnest()
#> # A tibble: 1 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     2     5     7
# Would like to keep all rows instead. Possible workaround:
df1 <- df %>% select(-y)
# length(y) is the length of the whole list-column, so test each element instead
df2 <- df %>% filter(map_int(y, nrow) > 0) %>% unnest()
left_join(df1, df2, by = "x")
#> # A tibble: 2 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     1    NA    NA
#> 2     2     5     7
Perhaps an extra example in the documentation to highlight this feature of unnest could help to make users more aware of this situation? (I'd be happy to draft a PR if that was the case)...
Hadley/Mark, thanks for reviewing this; apologies for delayed reply as I got tied up with work.
The original data in question came from an API call, and I didn't save it, but it's similar to what Mark has. He's also on point regarding my issue.
Mark's solution yields the intended result as my workaround:
Based on Mark's comments, this behaviour is by design, though I believe it would be useful to have an argument in unnest that keeps the rows whose list elements are empty after unnesting. I run into these situations quite often in my work.
Hmmmmm, maybe it's worth having an option for this, but I'm not sure what to call it.
Thanks Hadley.
Suggestion: na.drop = T/F
How about empty = "drop" or "fill" (e.g. similar approach to the extra and fill option values in separate)
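To make the proposal concrete, here is a rough sketch of what the two values might mean. Note that unnest_empty() is a hypothetical helper written only for illustration; it is not a tidyr function and says nothing about how tidyr would implement the option. It assumes the list-column holds data frames.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical helper: unnest a list-column of data frames, with a choice
# of what to do when an element has zero rows.
unnest_empty <- function(df, col, empty = c("drop", "fill")) {
  empty <- match.arg(empty)
  pieces <- df[[col]]
  keys <- df[setdiff(names(df), col)]

  rows <- map(seq_along(pieces), function(i) {
    piece <- pieces[[i]]
    if (nrow(piece) == 0) {
      # "drop" discards the row; "fill" keeps the key columns only,
      # so bind_rows() below pads the unnested columns with NA.
      if (empty == "drop") return(NULL)
      return(keys[i, , drop = FALSE])
    }
    # Repeat the key row once per row of the nested data frame.
    bind_cols(keys[rep(i, nrow(piece)), , drop = FALSE], piece)
  })

  bind_rows(rows)
}

df <- tibble(x = 1:2, y = list(tibble(), tibble(a = 5, b = 7)))
dropped <- unnest_empty(df, "y", empty = "drop")
filled <- unnest_empty(df, "y", empty = "fill")
```

With the example data, "drop" should leave only the x = 2 row, while "fill" should keep x = 1 with NA in a and b.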
replace_na() now works with list-cols so you can at least do this:
library(tidyr)
library(tibble)
df <- tibble(x = 1:2, y = list(tibble(), tibble(a = 5, b = 7)))
df %>%
  replace_na(list(y = list(tibble(a = NA, b = NA)))) %>%
  unnest()
#> # A tibble: 2 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     1    NA    NA
#> 2     2     5     7
Hadley,
I'm using tibble_1.3.4 and tidyr_0.7.2, but can't reproduce your output. Perhaps the upgraded replace_na is not in the latest CRAN version yet?
library(tidyr)
library(tibble)
df <- tibble(x = 1:2, y = list(tibble(), tibble(a = 5, b = 7)))
df %>%
  replace_na(list(y = list(tibble(a = NA, b = NA)))) %>%
  unnest()
#> # A tibble: 1 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     2     5     7
It's in the dev version, sorry.
I've now hit this use case in two practical problems, so I definitely believe it should be an option.
Happy 2018; thanks for update.
Look forward to your enhancements!
Thank you for the temporary workaround (and upcoming feature)! Spent more time than I'd like to admit tonight trying to figure out this exact use case...
Note: this is related to a left join vs an inner join.
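A small illustration of that analogy, using two made-up tables (keys and vals are names chosen here for the example, standing in for the non-list columns and the unnested contents):

```r
library(dplyr)
library(tibble)

keys <- tibble(x = 1:2)               # the non-list columns
vals <- tibble(x = 2L, a = 5, b = 7)  # what unnesting produces for x = 2

# Inner join: x = 1 has no match and disappears, like the current unnest().
inner <- inner_join(keys, vals, by = "x")

# Left join: x = 1 is kept, with NA in a and b.
left <- left_join(keys, vals, by = "x")
```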
Thanks for the update and for linking the issue @hadley; will try it out once nest_join() is available in the dev version.
I think this might be best as drop = FALSE and can be implemented internally with something like:
explicit_na <- function(x) {
  dims <- length(dims(x))
  if (dims == 0L && length(x) == 0) {
    x[NA_integer]
  } else if (dims == 2L && nrow(x) == 0) {
    x[NA_integer, , drop = FALSE]
  } else {
    x
  }
}
I couldn't get explicit_na to work as it is, but if I tweak it slightly:
library(dplyr)
explicit_na <- function(x) {
  dims <- length(dim(x))
  if (dims == 0L && length(x) == 0) {
    x <- ifelse(is.list(x) && !is.data.frame(x), list(NA_integer_), NA_integer_)
  } else if (dims == 2L && nrow(x) == 0) {
    x[TRUE, ] <- NA_integer_
  }
  x
}
These cases return what I'd expect:
character(0) %>% explicit_na()
#> [1] NA
list() %>% explicit_na()
#> [[1]]
#> [1] NA
data.frame(a = character()) %>% explicit_na()
#> a
#> 1 <NA>
But now I'm wondering what should happen if a dataframe has no names?
df <- data.frame()
df
#> data frame with 0 columns and 0 rows
df %>% explicit_na()
#> data frame with 0 columns and 1 row
That function is just a reminder for me. It needs testing.
Note to self: can't use .drop because it's already used to control if the variables being unnested are dropped.
Currently implemented in unnest2(), which I'm going to re-unify with unnest() shortly.
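For readers arriving later: as far as I know, in tidyr 1.0.0 and newer this option is exposed as the keep_empty argument of unnest(). The sketch below assumes such a version is installed and reuses the example data from earlier in the thread.

```r
library(tidyr)
library(tibble)

df <- tibble(x = 1:2, y = list(tibble(), tibble(a = 5, b = 7)))

# Default: the row whose list element is empty is dropped.
dropped <- unnest(df, y)

# keep_empty = TRUE keeps that row and fills the unnested columns with NA.
kept <- unnest(df, y, keep_empty = TRUE)
```

On older tidyr versions the argument does not exist, so the replace_na() or left_join() workarounds above still apply there.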