Closed: bholtemeyer closed this 1 week ago
If the reason for wanting this is to check uniqueness before using mutate(..., .by = ...) to get the effect of rowwise, then perhaps it would be better to support something like .by = .ROWID.
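For comparison, here is a minimal sketch of how the row-wise effect can be approximated today without a dedicated .by = .ROWID. It assumes dplyr >= 1.1.0 (for the .by argument); the data frame df and the columns .rowid and hi are made-up names for illustration, not part of any proposal.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values).
df <- tibble(a = c(1, 5, 3), b = c(4, 2, 6))

res <- df |>
  # Add an explicit row id so that every group contains exactly one row.
  mutate(.rowid = row_number()) |>
  # max() now sees one value of a and one of b per group, i.e. per row.
  mutate(hi = max(a, b), .by = .rowid) |>
  # Drop the helper column again.
  select(-.rowid)

res$hi
```

Because the helper id is unique by construction, no uniqueness check is needed before the mutate() call.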
A one-liner that calculates isid would be:
isid <- function(data, ...) ! anyDuplicated(data[c(...)])
isid(anscombe) # TRUE
isid(anscombe, "x1", "x2") # TRUE
isid(anscombe, c("x1", "x2")) # TRUE
isid(anscombe, "x4") # FALSE
anscombe
##    x1 x2 x3 x4    y1   y2    y3    y4
## 1  10 10 10  8  8.04 9.14  7.46  6.58
## 2   8  8  8  8  6.95 8.14  6.77  5.76
## 3  13 13 13  8  7.58 8.74 12.74  7.71
## 4   9  9  9  8  8.81 8.77  7.11  8.84
## 5  11 11 11  8  8.33 9.26  7.81  8.47
## 6  14 14 14  8  9.96 8.10  8.84  7.04
## 7   6  6  6  8  7.24 6.13  6.08  5.25
## 8   4  4  4 19  4.26 3.10  5.39 12.50
## 9  12 12 12  8 10.84 9.13  8.15  5.56
## 10  7  7  7  8  4.82 7.26  6.42  7.91
## 11  5  5  5  8  5.68 4.74  5.73  6.89
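One detail of the one-liner may be worth making explicit: when no column names are passed, data[c(...)] becomes data[NULL], a zero-column data frame, so the check no longer looks at any column. The variant below is only a sketch under that reading; the name isid_checked and the choice to fall back to all columns are assumptions, not something proposed in this thread.

```r
# Hypothetical variant of the one-liner with explicit handling of edge cases.
isid_checked <- function(data, ...) {
  cols <- c(...)
  # With no columns given, test all columns rather than none.
  if (length(cols) == 0L) cols <- names(data)
  # Fail early on misspelled column names.
  stopifnot(is.character(cols), all(cols %in% names(data)))
  # anyDuplicated() returns 0 when there are no duplicate rows.
  anyDuplicated(data[cols]) == 0L
}

isid_checked(anscombe)
isid_checked(anscombe, "x1", "x2")
isid_checked(anscombe, "x4")
```

The explicit comparison with 0L does the same job as the negation in the one-liner; it is written out only to make the return type obvious.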
I think there are already many ways to do this with existing tools, so it seems a little too niche to justify a dedicated helper in dplyr.
library(dplyr)
library(vctrs)

uniquely <- function(...) {
  args <- rlang::list2(...)
  names(args) <- paste0("..", seq_along(args))
  args <- vctrs::new_data_frame(args)
  !vctrs::vec_duplicate_any(args)
}

anscombe |>
  summarise(
    res = !vec_duplicate_any(pick(x1, x2)),
    res2 = uniquely(x1, x2),
    res3 = n_distinct(x1, x2) == nrow(anscombe)
  )
#>    res res2 res3
#> 1 TRUE TRUE TRUE

anscombe |>
  summarise(
    res = !vec_duplicate_any(pick(x4)),
    res2 = uniquely(x4),
    res3 = n_distinct(x4) == nrow(anscombe)
  )
#>     res  res2  res3
#> 1 FALSE FALSE FALSE
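When one of these checks returns FALSE, the natural follow-up is to see which key values repeat. A minimal sketch with existing dplyr verbs, using the same anscombe data; the names dupes and n_rows are arbitrary choices for illustration:

```r
library(dplyr)

# Count rows per candidate key and keep only the keys that occur more than once.
dupes <- anscombe |>
  count(x4, name = "n_rows") |>
  filter(n_rows > 1)

dupes
```

Here x4 takes the value 8 in ten of the eleven rows, so that is the only key reported.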
See https://github.com/tidyverse/dplyr/issues/6660 for .by = row ideas.
I'd like to have a function to check if a set of variables form a unique ID in a dataframe, like this: https://search.r-project.org/CRAN/refmans/eeptools/html/isid.html
I think this would make code more readable as pipes would not need to be involved.
The function would return TRUE or FALSE: TRUE indicates that the variables uniquely identify the rows, and FALSE indicates that they do not.