tidyverse / tidyr

Tidy Messy Data
https://tidyr.tidyverse.org/

unnest() can create duplicate column names #265

Closed krlmlr closed 6 years ago

krlmlr commented 7 years ago

This happens when unnesting more than one list-column and the nested data frames share column names:

library(dplyr)
library(tidyr)
data <- data_frame(a = 1:3, b = 1:3) %>% nest(-a)
data
## # A tibble: 3 × 2
##       a             data
##   <int>           <list>
## 1     1 <tibble [1 × 1]>
## 2     2 <tibble [1 × 1]>
## 3     3 <tibble [1 × 1]>
inner_join(data, data, by = "a") %>% unnest
## # A tibble: 3 × 3
##       a     b     b
##   <int> <int> <int>
## 1     1     1     1
## 2     2     2     2
## 3     3     3     3

This is not a problem if unnest() is called with a .sep argument. I wonder if uniqueness should be enforced, though.
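A minimal sketch of the separator workaround mentioned above. It assumes tidyr 1.0.0 or later, where the argument is spelled names_sep rather than .sep and the columns to unnest are passed explicitly via cols; the object names (nested, joined, out) are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same shape as the reprex: one list-column of 1 x 1 tibbles per value of `a`.
nested <- tibble(a = 1:3, b = 1:3) %>% nest(data = b)

# Joining the nested table to itself yields two list-columns, data.x and data.y,
# whose inner tibbles both contain a column named `b`.
joined <- inner_join(nested, nested, by = "a")

# names_sep prefixes each inner column with the name of its list-column,
# so the two `b` columns no longer collide.
out <- unnest(joined, cols = c(data.x, data.y), names_sep = "_")
names(out)
```

Prefixing with the list-column name keeps the origin of each column visible, which a bare numeric suffix does not.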

holgerbrandl commented 7 years ago

I think unnest should fail if the resulting tbl is not well formed (i.e. has duplicated column names).

wpetry commented 7 years ago

In v0.6.3, running @krlmlr's reprex silently appends a number to each duplicated column name:

# A tibble: 3 x 3
      a     b    b1
  <int> <int> <int>
1     1     1     1
2     2     2     2
3     3     3     3

Numbers make sense here (unnest() could be used many times in series), and it's a more graceful solution than failing outright. Perhaps the behaviour would be more intuitive if it followed the "dot" convention of dplyr joins for duplicated names (cf. the reprex below) rather than appending a number alone?

library(dplyr)
library(tidyr)
# Plain (un-nested) data: the join itself adds the .x/.y suffixes shown below
data <- data_frame(a = 1:3, b = 1:3)
full_join(data, data, by = "a")
# A tibble: 3 x 3
      a   b.x   b.y
  <int> <int> <int>
1     1     1     1
2     2     2     2
3     3     3     3

hadley commented 6 years ago

This is a result of bind_cols() semantics (which now always disambiguates), so I think the behaviour should only change in dplyr.
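A small sketch of that bind_cols() behaviour in isolation, assuming a dplyr version that disambiguates names as described. The exact suffixes differ between dplyr releases, so the check below only tests for uniqueness; the object names are illustrative.

```r
library(dplyr)

# Two tibbles that share the column name `b`.
left  <- tibble(b = 1:3)
right <- tibble(b = 1:3)

# bind_cols() repairs the clashing names instead of returning duplicates;
# recent versions also print a message listing the new names.
combined <- bind_cols(left, right)
names(combined)

# The repaired names are unique, whatever suffix scheme is used.
stopifnot(anyDuplicated(names(combined)) == 0)
```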