Results differ from python library

daniel17903 commented 5 years ago

Hi, while porting some Python code to Java I discovered that the token sort and token set ratios calculated by this library often do not match the ones calculated by the Python fuzzywuzzy library.

Here is an example. Python code:

from fuzzywuzzy import fuzz
print(fuzz.token_sort_ratio("efwe fwef", "wef wefwef"))
print(fuzz.token_set_ratio("efwe fwef", "wef wefwef"))

Output:

53
53

Java Code:

import me.xdrop.fuzzywuzzy.FuzzySearch;

public class Main {
    public static void main(String[] args) {
        System.out.println(FuzzySearch.tokenSortRatio("efwe fwef","wef wefwef"));
        System.out.println(FuzzySearch.tokenSetRatio("efwe fwef","wef wefwef"));
    }
}

Output:

84
84

Where is this difference coming from? Shouldn't these two outputs be equal?

xdrop commented 5 years ago

We only ported the python-levenshtein module (for speed), not the built-in Python difflib. Are you using the Python library with the python-levenshtein module installed?

i.e., instead of

pip install fuzzywuzzy

use

pip install fuzzywuzzy[speedup]
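
To illustrate why the two backends can disagree, here is a minimal standard-library sketch. It assumes that the python-levenshtein ratio behaves like an insert/delete-only similarity, 2 * LCS / (len(a) + len(b)), and that the difflib path uses SequenceMatcher.ratio(), which counts only the matching blocks difflib happens to find. The helper names sort_tokens, difflib_ratio and indel_ratio are made up for this example and are not part of either library.

```python
from difflib import SequenceMatcher


def sort_tokens(s):
    # Hypothetical helper: sort whitespace-separated tokens, as a token sort ratio would.
    return " ".join(sorted(s.split()))


def difflib_ratio(a, b):
    # Similarity based on the matching blocks difflib finds (greedy longest block first).
    return round(100 * SequenceMatcher(None, a, b).ratio())


def indel_ratio(a, b):
    # Assumed Levenshtein-style ratio: based on the longest common subsequence (LCS),
    # which is equivalent to an edit distance where a substitution costs 2.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, 1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    total = len(a) + len(b)
    return round(100 * 2 * prev[-1] / total) if total else 100


a = sort_tokens("efwe fwef")
b = sort_tokens("wef wefwef")
print(difflib_ratio(a, b))  # 53
print(indel_ratio(a, b))    # 84
```

In this example difflib first locks onto the block "efwe" and then has only one more character left to match, while the LCS-based measure can align "efwefwef" across the space, which is why the second number is higher.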

daniel17903 commented 5 years ago

Thanks. After installing fuzzywuzzy[speedup], the results match. I wasn't aware that using a different underlying library changes the output.

xdrop / fuzzywuzzy

Results differ from python library #74