intuit / fuzzy-matcher

A Java library to determine probability of objects being similar.
Apache License 2.0

Trying to match names with initials #22

Closed: amathur2k closed 4 years ago

amathur2k commented 4 years ago

Hi, I am trying to match names such as A Mathur with ABhishek Mathur, or Donald Trump with D Trump. Is there some simple parameter I can adjust to allow that?

Thanks, Abhi

manishobhatia commented 4 years ago

Hi Abhi, in these examples, since 1 out of the 2 words in each name matches, you should get a 50% match.

So, for example, if these names are part of a List of names:

List<String> sourceString = Arrays.asList("A Mathur", "ABhishek Mathur", "Donald Trump", "D Trump");

We just need to feed the library a Document for each name, containing an Element of type NAME.

AtomicInteger idCount = new AtomicInteger();

List<Document> sourceDoc = sourceString.stream().map(name -> {
    return new Document.Builder(idCount.incrementAndGet() + "")
            .addElement(new Element.Builder().setType(NAME).setValue(name).createElement())
            .setThreshold(0.4)
            .createDocument();
}).collect(Collectors.toList());

Map<String, List<Match<Document>>> result = matchService.applyMatchByDocIdOld(sourceDoc);

Note that each document needs a key; you can supply your own unique key for these. We also need to reduce the Document threshold a little, since by default a document is considered a match only when its score is greater than 0.5.

You should be able to see the match results by printing them to the console:

result.entrySet().forEach(entry -> {
    entry.getValue().forEach(match -> {
        System.out.println("Data: " + match.getData() + " Matched With: " + match.getMatchedWith() + " Score: " + match.getScore().getResult());
    });
});

Result

Data: {[{'A Mathur'}]} Matched With: {[{'ABhishek Mathur'}]} Score: 0.5
Data: {[{'ABhishek Mathur'}]} Matched With: {[{'A Mathur'}]} Score: 0.5
Data: {[{'Donald Trump'}]} Matched With: {[{'D Trump'}]} Score: 0.5
Data: {[{'D Trump'}]} Matched With: {[{'Donald Trump'}]} Score: 0.5
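
The 0.5 scores above are consistent with the "1 out of 2 names matches" reasoning. As a rough illustration only, here is a minimal sketch of that arithmetic in plain Java. It assumes a simplified rule (shared whole words divided by the larger word count); the class and method names are hypothetical, and this is not the library's actual scoring code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of fuzzy-matcher.
class TokenOverlapSketch {

    // Lower-cases a name and splits it on whitespace into a set of words.
    static Set<String> tokens(String name) {
        return new HashSet<>(Arrays.asList(name.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Assumed scoring rule: number of shared words divided by the larger word count.
    static double score(String left, String right) {
        Set<String> a = tokens(left);
        Set<String> b = tokens(right);
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b); // keep only the words present in both names
        return (double) shared.size() / Math.max(a.size(), b.size());
    }

    public static void main(String[] args) {
        System.out.println(score("A Mathur", "ABhishek Mathur")); // 0.5
        System.out.println(score("Donald Trump", "D Trump"));     // 0.5
    }
}
```

Under this simplified rule, any two-word names that share only a surname score 0.5, so "D Trump" and "J Trump" would score the same as "D Trump" and "Donald Trump".
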

amathur2k commented 4 years ago

Thanks, Manish, for the detailed response. However, I fear that reducing the threshold to under 0. will start matching Miachel to Mitchell, and D Trump to J Trump. I am looking at the Rosette APIs to see how they handle this, though their code is not open source.
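
One possible way to avoid those false positives without lowering the threshold is to compare names before they reach the matcher, treating a single-letter word as an initial. The sketch below is plain Java under that assumption; the class and method names are hypothetical, and it is neither a fuzzy-matcher feature nor a description of how Rosette works.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of fuzzy-matcher or Rosette.
class InitialAwareNameSketch {

    // Two words match if they are equal, or if at least one is a
    // single-letter initial and both start with the same letter.
    static boolean tokenMatches(String x, String y) {
        if (x.equals(y)) return true;
        if (x.isEmpty() || y.isEmpty()) return false;
        boolean oneIsInitial = x.length() == 1 || y.length() == 1;
        return oneIsInitial && x.charAt(0) == y.charAt(0);
    }

    // Compares names word by word, in order; names with a different
    // number of words are treated as non-matching in this sketch.
    static boolean namesMatch(String left, String right) {
        String[] a = left.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] b = right.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (!tokenMatches(a[i], b[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(namesMatch("D Trump", "Donald Trump")); // true
        System.out.println(namesMatch("D Trump", "J Trump"));      // false
        System.out.println(namesMatch("Miachel", "Mitchell"));     // false
    }
}
```

This sketch deliberately does no fuzzy comparison of full words, so misspellings such as Miachel are not matched at all; it also ignores punctuation such as "D." and names with middle words, which a real solution would need to handle.
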