`select` is not consistent with `dplyr::select` when used on data.frame with duplicate column names

TimTeaFan commented 3 years ago

I was playing around with data.frames with duplicate column names and stumbled upon this inconsistency with {dplyr}:

library(dplyr)
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE) 

dat %>% poorman::select(a)
#>   a
#> 1 1

dat %>% dplyr::select(a)
#> Error: Names must be unique.
#> x These names are duplicated:
#>   * "a" at locations 1 and 3.

Created on 2021-05-24 by the reprex package (v0.3.0)

The question is: is {poorman} supposed to be 100% consistent with {dplyr}?

If yes then poorman::select should throw an error as well.

On the other hand, {poorman} - unlike {dplyr} - might not be bound in the same way to the concept of tidy data, and it would be nice to have a go-to package when dealing with untidy data.frames. In this case both `a` columns should be selected.
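For illustration, here is a minimal base R sketch of what "select both `a` columns" could look like. The helper name `select_all_named` is hypothetical and is not part of {poorman} or {dplyr}. Base subsetting can make duplicated names unique, so the sketch restores the original names explicitly.

```r
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# Hypothetical helper (an assumption for illustration, not an existing API):
# keep every column whose name matches `nm`, including duplicates.
select_all_named <- function(df, nm) {
  idx <- which(names(df) %in% nm)
  out <- df[idx]
  # `[.data.frame` may de-duplicate names (e.g. "a", "a.1"),
  # so put the original, possibly duplicated, names back.
  names(out) <- names(df)[idx]
  out
}

res <- select_all_named(dat, "a")
names(res)  # both columns are still called "a"
```

Whether {poorman} should behave like this, or error like {dplyr}, is exactly the open question here.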

Regarding mutate the behavior differs as well:

dat %>% poorman::mutate(c = 4)
#>   a b a.1 c
#> 1 1 2   3 4

dat %>% dplyr::mutate(c = 4)
#> Error: Can't transform a data frame with duplicate names.

It seems like poorman::mutate automatically uses check.names = TRUE and silently renames the duplicate column. In this case an error might be preferable (or, as an alternative, the column names could be left untouched).
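As a rough sketch of the two alternatives, the snippet below shows how rebuilding a data.frame in base R with the default `check.names = TRUE` produces the `a.1` name, how `check.names = FALSE` leaves the names untouched, and what an up-front check could look like. This only illustrates base R behaviour. It is an assumption that {poorman} takes a similar path internally, and `check_unique_names` is a hypothetical helper, not an existing function.

```r
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# Default check.names = TRUE: names are passed through make.names(unique = TRUE)
renamed <- data.frame(dat, c = 4)
names(renamed)  # a, b, a.1, c

# check.names = FALSE: duplicated names are left as they are
kept <- data.frame(dat, c = 4, check.names = FALSE)
names(kept)  # a, b, a, c

# Hypothetical guard (assumption): fail early instead of renaming silently
check_unique_names <- function(df) {
  dups <- unique(names(df)[duplicated(names(df))])
  if (length(dups) > 0L) {
    stop("Names must be unique. Duplicated: ",
         paste(dups, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Capture the error rather than aborting the script
err <- tryCatch(check_unique_names(dat), error = function(e) e)
conditionMessage(err)
```

A guard of this kind would match the {dplyr} behaviour, while the `check.names = FALSE` route would keep untidy names intact.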

Created on 2021-05-24 by the reprex package (v0.3.0)

I didn't consider this to be a "bug", so I opened a blank issue.

nathaneastwood commented 3 years ago

Hi @TimTeaFan, thanks for submitting this issue - it's an interesting one. I would say that given {dplyr} fails in these instances, {poorman} should also fail. My initial curiosity lies in wondering where this fails within {dplyr}. Is it an issue from {dplyr} itself, {tibble} or maybe {tidyselect}? Once I know that, I will be better placed to understand where {poorman} should capture and handle this type of issue. I will do some digging and get back to you!

TimTeaFan commented 3 years ago

Regarding dplyr::select, the issue is caused by tidyselect::eval_select. I dug into this a little in this SO answer. Regarding dplyr::mutate, I'm not sure whether this is caused by {tidyselect}.

nathaneastwood / poorman

`select` is not consistent with `dplyr::select` when used on data.frame with duplicate column names #92