intuit / fuzzy-matcher

A Java library to determine probability of objects being similar.
Apache License 2.0

Support for distinct groups of similar strings #43

Closed ajmalrehman closed 3 years ago

ajmalrehman commented 3 years ago

Here is the scenario for input like

    String[][] input = {
        {"1", "Nike"},
        {"2", "Puma"},
        {"3", "Niket dhaka"},
        {"4", "Levi's"},
        {"5", "Levi's Fashion for Men and Women"},
        {"6", "Nike"},
        {"7", "Puma Sports and Fitness"},
        {"8", "Puma Shoes"},
        {"9", "Fashion Nova"},
        {"10", "H&M Fashion"},
        {"11", "Nike Sports"}
    };

the groups formed are something like this:

    [
      { "0": { "id": 11, "name": "Nike Sports" }, "1": { "id": 1, "name": "Nike" }, "2": { "id": 6, "name": "Nike" } },
      { "0": { "id": 1, "name": "Nike" }, "1": { "id": 6, "name": "Nike" }, "2": { "id": 11, "name": "Nike Sports" } },
      { "0": { "id": 2, "name": "Puma" }, "1": { "id": 8, "name": "Puma Shoes" } },
      { "0": { "id": 6, "name": "Nike" }, "1": { "id": 1, "name": "Nike" }, "2": { "id": 11, "name": "Nike Sports" } },
      { "0": { "id": 8, "name": "Puma Shoes" }, "1": { "id": 2, "name": "Puma" } },
      { "0": { "id": 9, "name": "Fashion Nova" }, "1": { "id": 10, "name": "HM Fashion" } },
      { "0": { "id": 10, "name": "HM Fashion" }, "1": { "id": 9, "name": "Fashion Nova" } }
    ]

You can see that groups with ids 11,1,6 and 1,6,11 are both formed. Is there any way to get only distinct groups?
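
To make "distinct" concrete: groups 11,1,6 and 1,6,11 hold the same ids in a different order, so comparing each group as a set of ids would collapse them into one. The following is only a minimal sketch of that idea using plain Java collections. The class name `DistinctGroups` and the method `distinctGroups` are hypothetical, it works on bare ids rather than on the library's own match objects, and it is not how fuzzy-matcher itself groups results.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of fuzzy-matcher.
public class DistinctGroups {

    // Treat each group as a set of ids, so member order no longer matters,
    // and keep only the first occurrence of each set.
    static Set<Set<Integer>> distinctGroups(List<List<Integer>> groups) {
        Set<Set<Integer>> distinct = new LinkedHashSet<>();
        for (List<Integer> group : groups) {
            // TreeSet sorts the ids; set equality ignores the original order.
            distinct.add(new TreeSet<>(group));
        }
        return distinct;
    }

    public static void main(String[] args) {
        // The id groups from the output above, in the order they were reported.
        List<List<Integer>> groups = List.of(
            List.of(11, 1, 6), List.of(1, 6, 11), List.of(2, 8),
            List.of(6, 1, 11), List.of(8, 2), List.of(9, 10), List.of(10, 9));

        System.out.println(distinctGroups(groups));
        // prints [[1, 6, 11], [2, 8], [9, 10]]
    }
}
```

This only removes groups whose members are exactly the same. It does not merge groups that merely overlap, which would need a separate step.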

manishobhatia commented 3 years ago

Hi @ajmalrehman, can you try using MatchService.applyMatchByGroups? This should give you distinct groups of matches.

Let us know if this works for you.

Thanks

manishobhatia commented 3 years ago

Closing the issue. Feel free to reopen it if you think it's not resolved or you have further questions.