jceresearch / pydit

Library of data wrangling functions that an internal auditor typically needs (for my own use and learning; if you wish to use it or collaborate, please get in touch, or use at your own peril).
https://pypi.org/project/pydit-jceresearch/
MIT License

Fuzzy merge using one or more columns in tandem plus a hardcode #57

Open jceresearch opened 1 year ago

jceresearch commented 1 year ago

Come up with a "magic merge" feature that would:

A) apply the hardcoded mappings first

B) attempt an exact ("hard") merge on col1, then col2, etc.

C) attempt a fuzzy match on each column with a strict tolerance

D) define which match wins

Options to lower() the keys before matching, and to strip any characters other than a-z and 0-9 beforehand.
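A rough sketch of how the cascade could hang together, using only the standard library on lists of dicts rather than DataFrames. Everything here is hypothetical and not part of pydit: the names `normalize` and `magic_merge`, the parameters, and the `origin` labels are assumptions for illustration, and "who wins" is assumed to be simply the first step that finds a match, trying the key columns in the order given.

```python
import difflib
import re


def normalize(value, lower=True, alnum_only=True):
    """Optional pre-cleaning: lowercase and drop non-alphanumeric characters."""
    if value is None:
        return ""
    text = str(value)
    if lower:
        text = text.lower()
    if alnum_only:
        text = re.sub(r"[^A-Za-z0-9]", "", text)
    return text


def magic_merge(left_rows, right_rows, key_cols, right_key,
                hardcodes=None, cutoff=0.9):
    """Return a list of (left_row, matched_right_row_or_None, origin)."""
    hardcodes = hardcodes or {}
    # Index the right-hand side by its normalized key (empty keys are skipped).
    lookup = {normalize(r[right_key]): r for r in right_rows
              if normalize(r[right_key])}
    results = []
    for row in left_rows:
        match, origin = None, "unmatched"
        # A) hardcoded mappings (raw left value -> right key) are tried first.
        for col in key_cols:
            target = hardcodes.get(row.get(col))
            if target is not None and normalize(target) in lookup:
                match, origin = lookup[normalize(target)], "hardcode"
                break
        # B) exact match on each key column, in order.
        if match is None:
            for col in key_cols:
                key = normalize(row.get(col))
                if key and key in lookup:
                    match, origin = lookup[key], f"exact:{col}"
                    break
        # C) fuzzy match on each key column with a strict cutoff.
        if match is None:
            for col in key_cols:
                key = normalize(row.get(col))
                if not key:
                    continue
                close = difflib.get_close_matches(
                    key, list(lookup), n=1, cutoff=cutoff)
                if close:
                    match, origin = lookup[close[0]], f"fuzzy:{col}"
                    break
        # D) the first step that produced a match wins.
        results.append((row, match, origin))
    return results
```

For example, with a right-hand table keyed on `user` and a left-hand table carrying `email_user` and `login`, a call such as `magic_merge(left, right, ["email_user", "login"], "user", hardcodes={"boss": "akhan"})` would try the hardcode, then the two columns exactly, then the two columns fuzzily.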

jceresearch commented 1 year ago

Need to think about what kind of fuzzy matching we want. We probably don't want to go too far, as we are typically joining on keys such as emails, usernames, or some short title (e.g. a system name), not addresses, but we can add some options.
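To get a feel for what a "strict" tolerance means on short keys, here is a small sketch using `difflib.SequenceMatcher` from the standard library. The helper names `similarity` and `strict_fuzzy_match`, the 0.9 cutoff, and the rule of rejecting ties are all assumptions for illustration, not a decision on which algorithm to adopt.

```python
import difflib


def similarity(a, b):
    """Similarity ratio in [0, 1] as computed by difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def strict_fuzzy_match(key, candidates, cutoff=0.9):
    """Return the single best candidate at or above cutoff, else None.

    A tie between the two best candidates is treated as ambiguous
    and returns None rather than picking one arbitrarily.
    """
    scored = sorted(
        ((similarity(key, c), c) for c in set(candidates)), reverse=True)
    scored = [(score, cand) for score, cand in scored if score >= cutoff]
    if not scored:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


# One trailing character on a 6-character username still passes at 0.9,
print(strict_fuzzy_match("jsmith1", ["jsmith", "mjones"]))  # jsmith
# but a single substituted letter does not.
print(strict_fuzzy_match("jsmyth", ["jsmith", "mjones"]))   # None
```

On keys this short, one changed character moves the ratio a lot, so the cutoff probably needs to be an option rather than a fixed value.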

jceresearch commented 1 year ago

Add some metrics on the success of each step, and maybe an indicator column stating the origin of each match.
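As a sketch of what the metrics could look like, assuming each row ends up with an origin label such as "hardcode", "exact", "fuzzy" or "unmatched" (these label values and the function name `match_metrics` are hypothetical):

```python
from collections import Counter


def match_metrics(origins):
    """Count and share of rows per origin label."""
    counts = Counter(origins)
    total = len(origins)
    return {
        origin: {"count": n, "share": round(n / total, 3)}
        for origin, n in counts.items()
    }


origins = ["hardcode", "exact", "exact", "unmatched"]
print(match_metrics(origins))
```

For plain merges, pandas already offers something similar: `DataFrame.merge(..., indicator=True)` adds a `_merge` column with `left_only`, `right_only` or `both`, so the origin column here would be an extension of that idea to the multi-step case.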