intuit / fuzzy-matcher

A Java library to determine probability of objects being similar.
Apache License 2.0

Full name matching #46

Closed bash83 closed 3 years ago

bash83 commented 3 years ago

Quite new here..

I'm looking into using fuzzy-matcher in a sanction list monitoring service (will share on GitHub once ready), but I'm having trouble figuring out the best way to handle full names.

Example: a sanction list provides a name such as "first last", but I need to match it against "first middle1 middle2 last".

What is the best approach to match the latter to the former? In other words, how do I ignore "middle1 middle2" in the query? I can programmatically tokenize the search query and run it for various combinations. Furthermore, in the case of a middle initial (I saw an earlier issue on this that was not resolved), can a weight be given to the initial?

For example "First M Last" should get a higher score compared to "First Last".

Any feedback would be greatly appreciated.

manishobhatia commented 3 years ago

Hi, with this library you can apply a weight to an Element, but not to a Token as such. If the order of the names is guaranteed, you can split the name into multiple elements and give a weight to the middle initial.

In practice, though, the data is often not that clean, and we see issues like the first name appearing at the end of the full name, and other such combinations. This library takes that into account and does a reasonable match, so you should still see a good score no matter the order.

The scoring works like this: consider the names First Last and First Middle1 Last. There are 3 unique tokens (First, Last and Middle1), and 2 of these are similar (First and Last), so you will get a score of 2/3, or 0.67. If the order of the names is not the same, you will still see the same result.

In cases where First Middle1 Last is matched with First Middle1 Last, you will see a score of 1.0, since all of the tokens match perfectly.
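To make the arithmetic above concrete, here is a minimal sketch in plain Java. It is not the library's implementation or API: the class and method names are made up for illustration, and it treats two tokens as "similar" only when they are equal ignoring case, whereas the library may apply its own similarity rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: hypothetical class, not part of fuzzy-matcher.
final class TokenOverlapSketch {

    // Lowercase the name and split on whitespace to get its unique tokens.
    static Set<String> tokens(String name) {
        String normalized = name.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(normalized.split("\\s+")));
    }

    // Score = tokens shared by both names / unique tokens across both names.
    // Assumption: "similar" means exact match after lowercasing.
    static double score(String a, String b) {
        Set<String> left = tokens(a);
        Set<String> right = tokens(b);

        Set<String> union = new HashSet<>(left);
        union.addAll(right);
        if (union.isEmpty()) {
            return 0.0;
        }

        Set<String> shared = new HashSet<>(left);
        shared.retainAll(right);

        return (double) shared.size() / union.size();
    }
}
```

Under these assumptions, `TokenOverlapSketch.score("First Last", "First Middle1 Last")` evaluates to 2/3, and the same holds for `"Last First"` because the tokens are compared as sets. By the same rule, "first last" against "first middle1 middle2 last" would come out as 2/4, or 0.5, in this sketch.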

I would suggest you run this library against your dataset and let us know if you see any issues. I am not aware of any open issues that would impact name matches.

Hope this helps.

Manish

manishobhatia commented 3 years ago

Closing the issue. Feel free to reopen it if you think it's not resolved or you have further questions.