xdrop / fuzzywuzzy

Java implementation of the well-known Python fuzzywuzzy fuzzy string matching algorithm. Fuzzy search for Java
GNU General Public License v2.0

How to set the scorer like the python fuzzywuzzy? #93

Closed Zaky7 closed 3 years ago

Zaky7 commented 3 years ago

In Python's fuzzywuzzy, we can choose the scorer used when extracting results. How can we do that here?

process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

Is there a Gitter or Discord channel where such questions can be asked?

burdoto commented 3 years ago

FuzzyWuzzy provides the ratios as standalone classes, along with a set of algorithm classes.

I think https://github.com/xdrop/fuzzywuzzy/blob/master/src/me/xdrop/fuzzywuzzy/algorithms/TokenSort.java is the one that you are looking for.

Zaky7 commented 3 years ago

@burdoto I am in a situation where the fuzzy match is not giving me acceptable results:

    val query = "Berry"
    val searchKeys = listOf("B", "ARRY", "AGEN", "Abercrombie & Fitch Company", "BlackBerry Limited")
    val result = FuzzySearch.extractTop(query, searchKeys, 5)
    result.forEach { extractedResult: ExtractedResult ->
        println(extractedResult)
    }

Output

(string: B, score: 90, index: 0)
(string: BlackBerry Limited, score: 90, index: 4)
(string: Abercrombie & Fitch Company, score: 72, index: 3)
(string: ARRY, score: 67, index: 1)
(string: AGEN, score: 22, index: 2)

I don't know why B has a score of 90. I need to dig deeper. If you have any suggestions, they are welcome.

Currently I am thinking of tweaking the scores to get better results.

burdoto commented 3 years ago

Honestly, B scoring as much as BlackBerry for the string Berry seems very reasonable to me.
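
To see where the 90 comes from, here is a minimal, self-contained sketch of how a fuzzywuzzy-style weighted ratio treats strings of very different lengths. It assumes the three-argument FuzzySearch.extractTop falls back to a weighted ratio by default, and it is not the library's code: the names (WeightedRatioSketch, lcs, ratio, partialRatio, weighted) are made up for illustration, the plain ratio is approximated with a longest-common-subsequence count, the partial ratio is a simple sliding window, and the token-based variants of the real weighted ratio are left out. When one string is at least 1.5 times as long as the other, the best substring match is scaled by 0.9 (0.6 above a length ratio of 8), so any exact substring hit lands on 90:

```java
import java.util.List;
import java.util.Locale;

/** Simplified sketch of a weighted ratio; NOT the library's actual code. */
final class WeightedRatioSketch {

    /** Length of the longest common subsequence of a and b. */
    static int lcs(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Plain similarity in [0, 100]: 2 * matched characters / total length. */
    static double ratio(String a, String b) {
        int total = a.length() + b.length();
        return total == 0 ? 100.0 : 200.0 * lcs(a, b) / total;
    }

    /** Best plain ratio of the shorter string against any same-length window of the longer one. */
    static double partialRatio(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        double best = 0.0;
        for (int i = 0; i + shorter.length() <= longer.length(); i++) {
            best = Math.max(best, ratio(shorter, longer.substring(i, i + shorter.length())));
        }
        return best;
    }

    /** Weighted score: plain ratio, or the scaled partial ratio if that is higher. */
    static int weighted(String s1, String s2) {
        String a = s1.toLowerCase(Locale.ROOT).trim();
        String b = s2.toLowerCase(Locale.ROOT).trim();
        if (a.isEmpty() || b.isEmpty()) {
            return 0;
        }
        double base = ratio(a, b);
        double lenRatio = (double) Math.max(a.length(), b.length()) / Math.min(a.length(), b.length());
        if (lenRatio < 1.5) {
            return (int) Math.round(base); // similar lengths: no partial matching
        }
        double partialScale = lenRatio > 8 ? 0.6 : 0.9;
        return (int) Math.round(Math.max(base, partialRatio(a, b) * partialScale));
    }

    public static void main(String[] args) {
        List<String> choices = List.of(
                "B", "ARRY", "AGEN", "Abercrombie & Fitch Company", "BlackBerry Limited");
        for (String choice : choices) {
            System.out.println(choice + " -> " + weighted("Berry", choice));
        }
        // prints 90, 67, 22, 72, 90 (one per line, after each choice)
    }
}
```

Under these assumptions, "b" is an exact substring of "berry" and "berry" is an exact substring of "blackberry limited", so both get 100 × 0.9 = 90. ARRY and AGEN are close enough in length to Berry that only the plain ratio applies.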

Each Ratio and Algorithm also implements Applicable, so you should try using the Extractor class directly and passing an instance of the TokenSort algorithm, like this:

// values is the collection of candidate strings to search
var extractor = new Extractor();
Object bestMatch = extractor.extractOne("Berry", values, new TokenSort());
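
If it helps to see what "passing a scorer" amounts to, here is a standalone sketch that does not depend on the library at all. Everything in it is hypothetical (ScorerSketch, ratio, sortTokens, tokenSortRatio and this extractOne are illustration names, and the plain ratio is only approximated via a longest common subsequence); it just shows the idea of a token sort scorer plugged into an extract function, roughly like scorer=fuzz.token_sort_ratio in Python:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.ToIntBiFunction;
import java.util.stream.Collectors;

/** Standalone sketch of a pluggable scorer; names are hypothetical, not the library's API. */
final class ScorerSketch {

    /** Plain similarity in [0, 100], approximated as 2 * LCS length / total length. */
    static int ratio(String a, String b) {
        int total = a.length() + b.length();
        if (total == 0) {
            return 100;
        }
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return (int) Math.round(200.0 * dp[a.length()][b.length()] / total);
    }

    /** Lowercases, splits on non-alphanumeric characters, sorts the tokens and rejoins them. */
    static String sortTokens(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .sorted()
                .collect(Collectors.joining(" "));
    }

    /** Token sort ratio: compare the two strings after sorting their tokens. */
    static int tokenSortRatio(String a, String b) {
        return ratio(sortTokens(a), sortTokens(b));
    }

    /** Returns the first choice with the highest score under the given scorer, or null if there are no choices. */
    static String extractOne(String query, List<String> choices,
                             ToIntBiFunction<String, String> scorer) {
        String best = null;
        int bestScore = -1;
        for (String choice : choices) {
            int score = scorer.applyAsInt(query, choice);
            if (score > bestScore) {
                bestScore = score;
                best = choice;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> choices = List.of("fuzzy was a hare", "fuzzy wuzzy was a bear");
        // Word order differs, but the sorted tokens are identical, so the score is 100.
        System.out.println(extractOne("wuzzy fuzzy bear a was", choices, ScorerSketch::tokenSortRatio));
        // prints "fuzzy wuzzy was a bear"
    }
}
```

Swapping the scorer is then just a matter of passing a different function, for example ScorerSketch::ratio instead of ScorerSketch::tokenSortRatio.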

Zaky7 commented 3 years ago

Thanks, man :)

Zaky7 commented 3 years ago

@burdoto can you comment on how similar this library is to Python's fuzzywuzzy? I use a lot of scorers on the fuzzywuzzy side.

burdoto commented 3 years ago

As far as I know, this is an as-perfect-as-it-gets source port of the library. It even features bugs that are inside the original! 🥳 I might be wrong on this though.