Open JBGruber opened 4 years ago
columns <- map(enquos(..., .named = TRUE), rlang::eval_tidy)
We need to add a way of collecting dots to rlang with auto-naming of elements. However, I'd capture the names separately here. If you're not data-masking, it's better to collect with list2() because eval_tidy() always evaluates in a child environment, even if no mask is supplied (that's for technical reasons). But here you can also use tibble().
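A rough sketch of that suggestion, not rlang's eventual API: the names are taken from enquos(..., .named = TRUE) while the values are collected with list2(). The helper name collect_columns() is hypothetical and only used for illustration.

```r
library(rlang)

# Hypothetical helper: collect dots without data-masking and give
# unnamed arguments a name derived from their expression.
collect_columns <- function(...) {
  # Only the names are read from the quosures; nothing is evaluated here.
  nms <- names(enquos(..., .named = TRUE))
  # list2() evaluates the dots normally and supports !!! and trailing commas.
  cols <- list2(...)
  set_names(cols, nms)
}

x <- c(1, 2, 3)
cols <- collect_columns(x, y = c("a", "b", "c"))
names(cols)
#> [1] "x" "y"
```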
Also there is vec_duplicate_detect() in vctrs:
detect_distinct <- function(...) {
!vctrs::vec_duplicate_detect(tibble(...))
}
We are experimenting with detect_ as the verb for vectorised predicates. The idea is that is_ functions should only return a non-missing single boolean, so they are always safe to use within if () conditions.
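To illustrate the naming convention, here is a minimal base R sketch. Both function names are hypothetical; the point is only the contrast between a scalar is_ predicate and a vectorised detect_ predicate.

```r
# is_ style: always a single non-missing TRUE or FALSE, so it is safe in if ().
is_all_distinct <- function(x) {
  anyDuplicated(x) == 0L
}

# detect_ style: one logical per element, TRUE where the value occurs only once.
detect_distinct_values <- function(x) {
  !(duplicated(x) | duplicated(x, fromLast = TRUE))
}

x <- c(1, 2, 2, NA)

if (is_all_distinct(x)) {
  message("no duplicates")
}

detect_distinct_values(x)
#> [1]  TRUE FALSE FALSE  TRUE
```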
Oops, vec_duplicate_detect() is not completely right here:
df %>% mutate(unique_case = is_distinct2(col1, col2))
#> col1 col2 unique_case
#> 1 1 1 TRUE
#> 2 2 1 TRUE
#> 3 3 2 TRUE
#> 4 1 3 TRUE
#> 5 4 4 FALSE
#> 6 4 4 FALSE
A distinct function should be consistent with dplyr::n_distinct():
n_distinct(df)
#> [1] 5
This API will be reviewed in the next vctrs version (for instance vec_duplicate_detect() will be renamed to vec_detect_duplicate()) and we'll probably have vec_detect_unique() which could be used here.
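Until such a function exists, one possible reading of "consistent with n_distinct()" is that the first occurrence of each row is flagged TRUE, so the number of TRUE values equals the number of distinct rows. The sketch below assumes that reading and uses vec_duplicate_id() from vctrs; the name detect_first_unique() is hypothetical, and the data is the example from this thread.

```r
library(vctrs)

# Hypothetical helper: TRUE for the first occurrence of each row,
# FALSE for any later repeat of a row already seen.
detect_first_unique <- function(...) {
  x <- tibble::tibble(...)
  # vec_duplicate_id() gives, for each row, the position of its first occurrence.
  vec_duplicate_id(x) == seq_len(vec_size(x))
}

df <- tibble::tibble(
  col1 = c(1, 2, 3, 1, 4, 4),
  col2 = c(1, 1, 2, 3, 4, 4)
)

flags <- detect_first_unique(df$col1, df$col2)
flags
#> [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

# The number of flagged rows matches the number of distinct rows.
sum(flags) == vec_unique_count(df)
#> [1] TRUE
```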
I wonder if we should drop the "distinct" terminology and consistently use "unique". n_distinct() would become count_unique().
As mentioned in a dplyr issue, I think that an is_distinct() function would be a worthwhile addition to the tidyverse.
I used code from the current implementations of distinct.data.frame() and n_distinct() to write an example function of what I mean (from #44 I understand that you want to reimplement more efficient versions of both here, otherwise I would have put this in a PR directly):
The difference compared to just using distinct() is that the user has control over what should happen with cases that aren't distinct. For example, one might want to analyse cases which are not distinct and see if they have something in common or differ from other cases. Thanks.
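As a hedged illustration of that use case (not the function originally posted in this issue), the non-distinct cases can be pulled out for inspection with a vectorised predicate inside filter(). The helper name detect_duplicate() is hypothetical, and the data is the example discussed in this thread.

```r
library(dplyr)

# Hypothetical helper: TRUE for every row that occurs more than once.
detect_duplicate <- function(...) {
  vctrs::vec_duplicate_detect(tibble(...))
}

df <- tibble(
  col1 = c(1, 2, 3, 1, 4, 4),
  col2 = c(1, 1, 2, 3, 4, 4)
)

# Keep only the cases that are not distinct, for further analysis.
duplicates <- df %>% filter(detect_duplicate(col1, col2))
nrow(duplicates)
#> [1] 2
```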