moodymudskipper / safejoin

Wrappers around dplyr functions to join safely using various checks
GNU General Public License v3.0

join between ? #22

Closed moodymudskipper closed 5 years ago

moodymudskipper commented 5 years ago

It's a special type of fuzzy join:

fuzzy_left_join(a, b, by = c(x = "start", x = "end"), match_fun = list(`>`, `<`))

But it might be the most common one, so a special shortcut syntax would be nice. Actually, it feels like a natural fuzzy join syntax:

safe_left_join(a, b, ~.x$x > .y$start & .x$x < .y$end)

The issue is that we don't want to build a Cartesian product of the whole data, so we have to select only the relevant columns. That would be easy to do by parsing the formula, but it would break in thousands of ways.
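To make the column-selection concern concrete, here is a minimal base R sketch, not the package's implementation. The data frames `a` and `b` and their columns (`id`, `payload_a`, `label`) are made up for illustration. Only row indices and the three columns used by the condition take part in the cross product; the full rows are attached afterwards.

```r
# Toy data (hypothetical): points in `a`, intervals in `b`.
a <- data.frame(id = 1:3, x = c(5, 15, 40), payload_a = c("p", "q", "r"),
                stringsAsFactors = FALSE)
b <- data.frame(start = c(0, 10), end = c(10, 20), label = c("low", "mid"),
                stringsAsFactors = FALSE)

# Cross product of row indices only; no payload columns are duplicated here.
pairs <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))

# Test the "between" condition using just the columns it needs.
keep  <- a$x[pairs$i] > b$start[pairs$j] & a$x[pairs$i] < b$end[pairs$j]
pairs <- pairs[keep, ]

# Left-join semantics: rows of `a` without a match are kept, with NA on the `b` side.
unmatched <- setdiff(seq_len(nrow(a)), pairs$i)
idx <- rbind(pairs,
             data.frame(i = unmatched, j = rep(NA_integer_, length(unmatched))))
idx <- idx[order(idx$i), ]

# Attach the full rows only once the matching pairs are known.
result <- cbind(a[idx$i, , drop = FALSE], b[idx$j, , drop = FALSE])
rownames(result) <- NULL
```

The index cross product still has nrow(a) * nrow(b) rows, so this only avoids copying wide rows; it is not an optimized interval join.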

OR we create our own syntax for this, using (fake?) functions X and Y, and then:

safe_left_join(a, b, ~ X("x") > Y("start") & X("x") < Y("end"))

Then it's easy and explicit to pass it to fuzzyjoin using the multi_by and multi_match_fun features.

We can keep match_fun for compatibility with the regular fuzzyjoin syntax, so the only thing to do is to first parse this formula and then pass all the args to the regular fuzzy_join function.
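A rough base R sketch of the parsing step described above, under stated assumptions: `extract_xy_cols` and `match_pairs` are hypothetical helper names, not functions from safejoin or fuzzyjoin, and the sketch evaluates the condition itself instead of calling fuzzy_join. It walks the formula to collect the column names given to X() and Y(), then binds X and Y to lookups into the two tables so the formula can be evaluated pairwise.

```r
# Hypothetical helper: collect the column names passed to X() and Y().
# X and Y are not called here; the expression tree is only inspected.
extract_xy_cols <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2)  # one-sided formula only
  cols <- list(x = character(), y = character())
  walk <- function(e) {
    if (is.call(e)) {
      fn <- e[[1]]
      if (is.name(fn) && as.character(fn) %in% c("X", "Y")) {
        stopifnot(length(e) == 2, is.character(e[[2]]))  # X("col") form only
        side <- tolower(as.character(fn))
        cols[[side]] <<- union(cols[[side]], e[[2]])
      } else {
        for (arg in as.list(e)[-1]) walk(arg)  # recurse into the arguments
      }
    }
    invisible(NULL)
  }
  walk(f[[2]])
  cols
}

# Hypothetical helper: return the (i, j) row pairs that satisfy the formula.
match_pairs <- function(a, b, f) {
  cols <- extract_xy_cols(f)
  stopifnot(all(cols$x %in% names(a)), all(cols$y %in% names(b)))
  pairs <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  env <- new.env(parent = environment(f))
  env$X <- function(col) a[[col]][pairs$i]  # left-table column, one value per pair
  env$Y <- function(col) b[[col]][pairs$j]  # right-table column, one value per pair
  keep <- eval(f[[2]], env)
  pairs[keep, ]
}

# Toy data (made up for illustration).
a <- data.frame(x = c(5, 15, 40))
b <- data.frame(start = c(0, 10), end = c(10, 20))
f <- ~ X("x") > Y("start") & X("x") < Y("end")

cols    <- extract_xy_cols(f)
matched <- match_pairs(a, b, f)
```

Because the column names are plain strings inside X() and Y(), they can be read off the expression without evaluating user code, which is what makes this more robust than parsing a free-form `.x$x > .y$start` formula.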

moodymudskipper commented 5 years ago

We have to choose whether to do it the NSE way or not, but I think the way above is better, because the regular by uses SE and this is not pretty:

safe_left_join(a, b, ~ X(!!ensym(some_var)) > Y("start") & X(!!ensym(some_var)) < Y("end"))

moodymudskipper commented 5 years ago

Done! Well, there's no optimized "between", but the syntax is simple at least.