Closed thomasp85 closed 4 years ago
Both vec_cbind()
and vec_rbind()
are sensitive to input names, so you need to unname the list.
Would with_name_repair_errors()
help finding the source of the repair messages?
I understand the reason, but this spills over into dplyr::bind_rows()
where you suddenly get overwhelmed with name repair messages even though you pass in tibbles (that doesn’t have row names). I think this will end up confusing a huge amount of users (it confused both me and @Hadley for a reasonable amount of time).
I think a lot of people creates tibbles in an lapply call and binds them together. lapply might decide to name the elements and this will sometimes trigger a name repair
CRAN dplyr didn't set row names using the input names, and only used them when .id
was set.
I imagine we just need to tweak bind_rows()
to something like
if (is_null(.id)) {
names(dots) <- NULL
} else {
if (!is_string(.id)) {
bad_args(".id", "must be a scalar string, ", "not {friendly_type_of(.id)} of length {length(.id)}")
}
if (!is_named(dots)) {
names(dots) <- seq_along(dots)
}
}
If this is fixed in dplyr instead I’m good with it. I opened here at the request of Hadley
Maybe we could support a sentinel for easily zapping the names, that bind_rows()
could use.
vec_rbind(.names_to = "")
or vec_rbind(.names_to = zap())
Alternatively, maybe this behaviour could occur only if the data frame already had row names?
This is a case where I worry that applying the vector naming rules to row names might lead us down a painful path.
I think this would be confusing, I'd prefer disabling the rows-naming altogether.
Applying this behaviour in an existing function like bind_cols()
can definitely cause problems. In a new function this is less clear. It might actually be positive. Since vec_cbind()
is also sensitive to input names, it is good practice to think about names when splicing a list in one of the bind functions.
Why aren't these the same case?
vctrs::vec_c(a = c(1, 2, 3))
#> Error: Can't merge the outer name `a` with a vector of length > 1.
#> Please supply a `.name_spec` specification.
vctrs::vec_rbind(a = data.frame(1:3))
#> X1.3
#> a1 1
#> a2 2
#> a3 3
Created on 2020-03-30 by the reprex package (v0.3.0)
It's a good idea to make these consistent. This wouldn't help with Thomas' example with 1-row data frames though.
For @thomasp85's original problem, just improving the name repair would help:
library(vctrs)
vec_rbind(
`1` = data.frame(x = 1:11),
`11` = data.frame(x = 1)
)
#> New names:
#> * `11` -> `11...1`
#> * `11` -> `11...12`
#> x
#> 11...1 1
#> 12 2
#> 13 3
#> 14 4
#> 15 5
#> 16 6
#> 17 7
#> 18 8
#> 19 9
#> 110 10
#> 111 11
#> 11...12 1
Created on 2020-03-30 by the reprex package (v0.3.0)
But I really do think it should be harder to opt into row names
Making the sentinel zap()
the default of .names_to
and adding a .name_spec
parameter would solve the issue.
When passing in a named list of data frames without row names to
vec_rbind()
the return value is a data frame with row names. This change can potentially trigger a hard-to-debug renaming message, e.g.