awpsoras closed this 5 months ago
Have you seen the new `relationship` argument of joins? I think it may be helpful to you. It sounds like you want `"one-to-one"` or `"one-to-many"`: https://dplyr.tidyverse.org/reference/mutate-joins.html. See also `multiple` and `unmatched`.
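As a minimal sketch of the suggestion above, assuming dplyr 1.1.0 or later (where `relationship` is available) and two made-up tables, `orders` and `customers`, declaring the expected relationship turns a silent row duplication into an error at join time:

```r
library(dplyr)

# Hypothetical example data: customer 20 appears twice in `customers`.
orders <- tibble(order_id = c(1, 2), customer_id = c(10, 20))
customers <- tibble(
  customer_id = c(10, 20, 20),
  region = c("north", "south", "west")
)

# Unconstrained join: customer 20 matches two rows, so 2 rows become 3.
joined <- left_join(orders, customers, by = "customer_id")
nrow(joined)

# Declaring the expected relationship makes the join fail instead of
# quietly returning extra rows; the error is caught here for illustration.
result <- tryCatch(
  left_join(orders, customers, by = "customer_id", relationship = "one-to-one"),
  error = function(e) "join failed: relationship violated"
)
result
```

In real code the error would normally be left uncaught so that the pipeline stops at the offending join.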
I think this is a bit too specific for dplyr, but it sounds useful for an extension package!
I do think the best way to avoid this is to catch it during the join (like with the new join args) rather than with post hoc analysis of the result.
Sad, I was hoping it would fit right in with other helper functions like `between`. Looking further, I suppose `n_distinct` may also work indirectly. You guys have thought of everything!
... select( !where(~ n_distinct(.x) == 1))
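To make the one-liner above concrete, here is a small sketch on a made-up join result (the `joined` table and its columns are assumptions for illustration). It keeps the rows for one duplicated key and drops every column that is constant within them, so only the differing columns remain:

```r
library(dplyr)

# Hypothetical join result in which id 1 was unexpectedly duplicated.
joined <- tibble(
  id     = c(1, 1, 2),
  name   = c("a", "a", "b"),
  region = c("south", "west", "north")
)

# Keep the rows for the duplicated key, then drop the columns that hold a
# single distinct value; what is left are the columns that differ.
varying <- joined %>%
  filter(id == 1) %>%
  select(!where(~ n_distinct(.x) == 1))

names(varying)
```

Note that `n_distinct()` counts `NA` as a distinct value by default, so a column mixing `NA` with one other value is reported as varying.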
I also definitely agree that catching it during the join is always better than post hoc analysis!
When performing a series of joins and unexpectedly returning more rows than one started with, an `is_equal` function would be a very useful predicate function within `select(where())`.

If you have an alleged primary key but you end up with more than one row per primary key value, you can use `is_equal` to find which columns were forcing the duplication of a primary key row.

This will help especially in dbplyr cases during sequential joins where the data have too many columns to visually inspect.
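The idea can be sketched as follows. This is only one possible reading of the proposal, not the version from the blog post: `is_equal` here is a hypothetical helper built on `n_distinct()`, and the `joined` table is made up for illustration.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: TRUE when all values of `x` are identical.
is_equal <- function(x) n_distinct(x) == 1

# Made-up join result in which key "k1" was duplicated.
joined <- tibble(
  key   = c("k1", "k1", "k2"),
  value = c(5, 5, 7),
  label = c("x", "y", "z")
)

# For every key that occurs more than once, check each non-key column for
# equality within the key, then keep the names of the columns that differ.
culprits <- joined %>%
  group_by(key) %>%
  filter(n() > 1) %>%
  summarise(across(everything(), is_equal)) %>%
  select(where(~ is.logical(.x) && !all(.x))) %>%
  names()

culprits
```

This sketch runs on a local data frame; whether the same pipeline translates cleanly through dbplyr is not something it demonstrates.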
I have written a mini blog post showing some potential versions (in R) and application of this function and published it on Rpubs: https://rpubs.com/seadoo/is_equal.
Below is the use of `is_equal` in a small table (copied from the post) showing the desired output.

If this is supported as an eligible feature, I'd be happy to work on it and write up documentation!