moodymudskipper / safejoin

Wrappers around dplyr functions to join safely using various checks
GNU General Public License v3.0

Support fuzzy_join? #21

Closed: moodymudskipper closed this 5 years ago

moodymudskipper commented 5 years ago

Fuzzy joins can easily get out of control, so they would benefit from these checks.

If match_fun is NULL (the default), do a normal join; otherwise apply fuzzy_join.

Support formula notation for match_fun in the same way.
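A minimal base-R sketch of the proposed dispatch on match_fun, for illustration only. join_sketch() is a hypothetical name, it handles a single key column, formula notation is not covered, and the fuzzy branch (cross join, then keep the pairs for which match_fun returns TRUE) only mimics what fuzzyjoin's fuzzy_join does conceptually; it is not that package's implementation.

```r
# Hypothetical helper: dispatch on `match_fun` as proposed in this issue.
# Base R only; not safejoin's or fuzzyjoin's actual code.
join_sketch <- function(x, y, by, match_fun = NULL) {
  if (is.null(match_fun)) {
    # Default: ordinary equi-join on the key column
    return(merge(x, y, by = by))
  }
  # Fuzzy join: pair every row of x with every row of y (cross join)...
  idx <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  # ...then keep only the pairs whose keys satisfy the predicate
  keep <- match_fun(x[[by]][idx$i], y[[by]][idx$j])
  idx <- idx[keep, , drop = FALSE]
  xs <- x[idx$i, , drop = FALSE]
  ys <- y[idx$j, , drop = FALSE]
  # Suffix every column (simplification: fuzzyjoin suffixes shared names only)
  names(xs) <- paste0(names(xs), ".x")
  names(ys) <- paste0(names(ys), ".y")
  out <- cbind(xs, ys)
  rownames(out) <- NULL
  out
}

x <- data.frame(id = c(1, 5, 10), a = c("p", "q", "r"))
y <- data.frame(id = c(2, 9), b = c("u", "v"))
join_sketch(x, y, by = "id")  # no exact key matches: 0 rows
join_sketch(x, y, by = "id", match_fun = function(a, b) abs(a - b) <= 1)
```

With this sample data the exact join is empty, while the fuzzy predicate keeps the pairs (1, 2) and (10, 9); a looser predicate would quickly multiply rows, which is where the safety checks come in.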

Original code:

https://github.com/dgrtwo/fuzzyjoin/blob/master/R/fuzzy_join.R

Would the other arguments add enough value to justify the confusion they bring?

Try to understand these examples:

https://stackoverflow.com/search?q=multi_match_fun
https://stackoverflow.com/search?q=%5Br%5D+multi_by

moodymudskipper commented 5 years ago

I don't think we need multi_by, as we can use a special syntax in by:

c("x", "y") := c("a", "b", "c") or x + y ~ a + b + c (and make it support !!)

Or we simply recognize lists and pass them to multi_by.

Or we drop fuzzy_join and see whether we can leverage data.table instead, as it's probably more efficient.

moodymudskipper commented 5 years ago

Note that "M" and "N" have to be tested after the join in the case of a fuzzy join.
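A rough sketch of why such checks can only run after a fuzzy join, assuming "M" and "N" refer to checking that every row of x, respectively y, finds at least one match (an assumption; the thread does not define them). With an exact join this can be answered up front, e.g. with x$id %in% y$id, but with an arbitrary match_fun the candidate pairs have to be evaluated first. unmatched_rows() is a hypothetical base-R helper, not safejoin's code.

```r
# Hypothetical post-hoc check: which rows of x / y found no match at all
# under an arbitrary predicate? The pairs must be evaluated before we know.
unmatched_rows <- function(x, y, by, match_fun) {
  # Evaluate the predicate on every (row of x, row of y) pair
  idx <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  keep <- match_fun(x[[by]][idx$i], y[[by]][idx$j])
  list(
    x = setdiff(seq_len(nrow(x)), idx$i[keep]),  # rows of x never matched
    y = setdiff(seq_len(nrow(y)), idx$j[keep])   # rows of y never matched
  )
}

x <- data.frame(id = c(1, 5, 10))
y <- data.frame(id = c(2, 9))
unmatched_rows(x, y, by = "id", match_fun = function(a, b) abs(a - b) <= 1)
```

Here row 2 of x (id 5) has no partner within distance 1, while every row of y is matched; a safe wrapper could then warn or abort based on these indices.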

moodymudskipper commented 5 years ago

Done, using the functions X() and Y().