rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics
https://rapidfuzz.github.io/RapidFuzz/
MIT License

Different results on Windows and Linux? Is Linux not supported? #379

Closed. OAE69 closed this issue 4 months ago

OAE69 commented 4 months ago

I run the same code in PyCharm and on Linux, but I get different results:

from rapidfuzz import fuzz

score = fuzz.token_set_ratio("It is an apple", "It is an apple juice")
print(score)

In PyCharm I get 100; on Linux I get 97. The Python and rapidfuzz versions are the same on both.

maxbachmann commented 4 months ago

I can't reproduce this on my machine. For me this gives 100 both on Windows and Linux.

So to fix this I would need your help running some tests on your machine:

1) I assume the result is reproducible for you.

2) Can you try:

git clone --recursive https://github.com/rapidfuzz/rapidfuzz.git
cd rapidfuzz
pip install . -v

and then try again. This is simply to validate whether a locally built version shows the same problems.

3) If 2) still shows the problem, I can create a patched version of the library that includes debug prints to get to the bottom of the issue. If it doesn't occur in 2), I will have to think about what else we could do.
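For reference, 100 is the expected score for this input, because every token of the first string also occurs in the second. The following is a minimal sketch of that subset rule. It is not RapidFuzz's actual code: the function name `token_set_ratio_sketch` is made up for illustration, and `difflib.SequenceMatcher` stands in for the library's own similarity measure, so scores for non-subset inputs will not match the library.

```python
from difflib import SequenceMatcher


def token_set_ratio_sketch(s1: str, s2: str) -> float:
    """Illustrative token-set comparison; NOT RapidFuzz's implementation."""
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    if not tokens1 or not tokens2:
        return 0.0

    common = tokens1 & tokens2
    only1, only2 = tokens1 - tokens2, tokens2 - tokens1

    # Subset shortcut: the strings share tokens and one side has nothing
    # left over, so the token sets count as a perfect match.
    if common and (not only1 or not only2):
        return 100.0

    # Otherwise compare the rebuilt strings. difflib is only a stand-in
    # (assumption), so these numbers differ from the library's.
    base = " ".join(sorted(common))
    joined1 = " ".join(filter(None, [base, " ".join(sorted(only1))]))
    joined2 = " ".join(filter(None, [base, " ".join(sorted(only2))]))
    candidates = [SequenceMatcher(None, joined1, joined2).ratio()]
    if base:
        candidates.append(SequenceMatcher(None, base, joined1).ratio())
        candidates.append(SequenceMatcher(None, base, joined2).ratio())
    return 100.0 * max(candidates)


print(token_set_ratio_sketch("It is an apple", "It is an apple juice"))  # 100.0
```

Under this rule, a result of 97 for the input above means the subset case was not handled as a perfect match.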

OAE69 commented 4 months ago

Since my company cannot download packages from the internet, here are the versions instead: thefuzz 0.20.0 and rapidfuzz 3.4.0. The versions are the same on Windows and Linux, but I still get different results. The PyCharm encoding is utf-8 and the Linux encoding is en_us.utf-8.

maxbachmann commented 4 months ago

Ah, that explains your issue. There are two problems on your side:

1) You are using the pure Python fallback version, probably because the package was installed from source without a C++ compiler present. You can see what is going wrong by increasing the verbosity of the build. The pure Python fallback works, but is quite a bit slower.

2) There was a bug in the Python fallback implementation of fuzz.token_set_ratio that was fixed in version 3.6.0.
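To check whether an environment already has the fix, something like the following could be used. This is only a sketch using the standard library's `importlib.metadata`. The helper names (`parse_release`, `has_token_set_ratio_fix`, `installed_rapidfuzz_version`) are hypothetical, and the version parsing is deliberately simple: it ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (3, 6, 0)  # release that fixed the fallback bug, per this thread


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '3.4.0' -> (3, 4, 0)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '3.6' so tuple comparison behaves.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def has_token_set_ratio_fix(version_string: str) -> bool:
    """True if the given rapidfuzz version is 3.6.0 or newer."""
    return parse_release(version_string) >= FIXED_IN


def installed_rapidfuzz_version() -> Optional[str]:
    """Return the installed rapidfuzz version, or None if it is missing."""
    try:
        return version("rapidfuzz")
    except PackageNotFoundError:
        return None


installed = installed_rapidfuzz_version()
if installed is None:
    print("rapidfuzz is not installed")
else:
    print(installed, "includes the fix:", has_token_set_ratio_fix(installed))
```

With the versions reported above, `has_token_set_ratio_fix("3.4.0")` is false, so the fallback bug is still present there.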

OAE69 commented 4 months ago

Thank you very much! rapidfuzz 3.6.0 fixed this problem.