bash83 closed this issue 3 years ago
Hi, with this library you can apply a weight to an Element, but not to a Token as such. If the order of the names is guaranteed, you can split the name into multiple elements and give a weight to the middle initial.
In practice, though, the data is often not that clean, and we see issues such as the first name appearing at the end of the full name, and other such combinations. This library takes that into account and does a reasonable match, so you should still see a good score regardless.
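To illustrate the idea of splitting a name into separately weighted elements, here is a minimal standalone sketch in Python. It does not use fuzzy-matcher's API and is not the library's actual weighting formula; the function `weighted_name_score`, the field names, and the weight values are all hypothetical, and each element is compared by exact (case-insensitive) equality only.

```python
def weighted_name_score(record_a, record_b, weights):
    """Weighted share of elements that match exactly (hypothetical scheme)."""
    total_weight = sum(weights.values())
    matched_weight = 0.0
    for field, weight in weights.items():
        value_a = record_a.get(field, "").strip().lower()
        value_b = record_b.get(field, "").strip().lower()
        # An element only counts when it is present on both sides and equal.
        if value_a and value_a == value_b:
            matched_weight += weight
    return matched_weight / total_weight


# Assumed weights: the middle initial counts half as much as first/last.
weights = {"first": 1.0, "middle_initial": 0.5, "last": 1.0}

with_initial = {"first": "First", "middle_initial": "M", "last": "Last"}
without_initial = {"first": "First", "last": "Last"}

print(weighted_name_score(with_initial, with_initial, weights))     # 1.0
print(weighted_name_score(with_initial, without_initial, weights))  # 0.8
```

Under these assumed weights, "First M Last" matched against itself scores higher than against "First Last", which is the behaviour asked about for middle initials.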
The scoring works in this fashion. Consider the names "First Last" and "First Middle1 Last".
Here there are 3 unique tokens (First, Last and Middle1), of which 2 are similar (First and Last), so you will get a score of 2/3, or 0.67.
If the order of the names is not the same, you will still see the same result.
In cases where "First Middle1 Last" is matched with "First Middle1 Last", you will see a score of 1.0, since all of the tokens match perfectly.
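The arithmetic above can be reproduced with a small standalone sketch. This is only an illustration of the "similar tokens divided by unique tokens" calculation, not fuzzy-matcher's implementation: the function name `token_overlap_score` is hypothetical, and it assumes tokens are split on whitespace and compared by exact, case-insensitive equality, whereas the library applies its own pre-processing and token similarity.

```python
def token_overlap_score(name_a: str, name_b: str) -> float:
    """Shared unique tokens divided by all unique tokens across both names."""
    tokens_a = set(name_a.lower().split())
    tokens_b = set(name_b.lower().split())
    all_tokens = tokens_a | tokens_b
    if not all_tokens:
        return 0.0
    return len(tokens_a & tokens_b) / len(all_tokens)


# 2 shared tokens out of 3 unique tokens
print(round(token_overlap_score("First Last", "First Middle1 Last"), 2))  # 0.67
# Sets ignore order, so swapping the tokens gives the same score
print(round(token_overlap_score("Last First", "First Middle1 Last"), 2))  # 0.67
# Identical names share every token
print(token_overlap_score("First Middle1 Last", "First Middle1 Last"))    # 1.0
```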
I would suggest you run this library against your dataset and let us know if you see any issues. I am not aware of any open issues that will impact name matches.
Hope this helps.
Manish
Closing the issue. Feel free to reopen it if you think it's not resolved or you have further questions.
Quite new here..
I'm looking into using fuzzy-matcher in a sanction list monitoring service (will share on GitHub once ready), but I'm having trouble figuring out the best way to handle full names.
Example: a sanction list provides a name such as "first last", but I am matching it against "first middle1 middle2 last".
What is the best approach to match the latter to the former? In other words, how do I ignore "middle1 middle2" from the query? I can programmatically tokenize the search query and run it for various combinations. Furthermore, in the case of a middle initial (I saw an earlier issue that was not resolved), can a weight be given to the initial?
For example "First M Last" should get a higher score compared to "First Last".
Any feedback would be greatly appreciated.
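For reference, the "tokenize the query and run it for various combinations" idea raised in the question could be sketched roughly as follows. This is a standalone illustration, not part of fuzzy-matcher: the helper `query_variants` is hypothetical, and it assumes the first and last whitespace-separated tokens are the first and last names, which, as noted in the reply, is not guaranteed for messy data.

```python
from itertools import combinations


def query_variants(full_name):
    """Keep the first and last tokens; try every subset of the middle tokens."""
    tokens = full_name.split()
    if len(tokens) < 3:
        # Nothing to drop: return the normalized name, or nothing if empty.
        return [" ".join(tokens)] if tokens else []
    first, *middles, last = tokens
    variants = []
    for size in range(len(middles) + 1):
        # combinations() preserves the original order of the middle tokens.
        for kept in combinations(middles, size):
            variants.append(" ".join([first, *kept, last]))
    return variants


for variant in query_variants("first middle1 middle2 last"):
    print(variant)
# first last
# first middle1 last
# first middle2 last
# first middle1 middle2 last
```

Each variant could then be matched against the sanction list entry and the best score kept. The number of variants doubles with every middle token, so this is only practical for short names.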