rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics
https://rapidfuzz.github.io/RapidFuzz/
MIT License

Critical issue when passing workers != 1 #403

Closed: silenceOfTheLambda closed this issue 1 week ago

silenceOfTheLambda commented 2 weeks ago

When using the latest version, 3.9.6, I get a critical error when passing workers != 1, i.e. I cannot run the string comparisons in parallel to speed them up; see the screenshot below. The error does not occur in rapidfuzz 3.5.2, however.

[screenshot of the error]

What is causing the issue? Apart from downgrading, is there a fix?

(Note: I have also posted this issue on StackOverflow.)

maxbachmann commented 2 weeks ago

Just FYI, cdist works on lists of strings, so in your examples it would run a similarity comparison on the individual characters.
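To illustrate the point about lists versus bare strings, here is a minimal pure-Python sketch of a pairwise scoring matrix. It is not rapidfuzz's implementation: `pairwise_scores` is a hypothetical helper, and `difflib.SequenceMatcher` stands in for the rapidfuzz scorer so that the example runs without any third-party package. The real call would be `rapidfuzz.process.cdist(queries, choices, workers=...)`.

```python
from difflib import SequenceMatcher


def pairwise_scores(queries, choices):
    """Score every item of `queries` against every item of `choices`.

    Hypothetical stand-in for a cdist-style call; scores are 0-100.
    """
    return [
        [100.0 * SequenceMatcher(None, q, c).ratio() for c in choices]
        for q in queries
    ]


# Lists of strings: one row per query, one column per choice (2 x 3 here).
as_lists = pairwise_scores(["apple", "banana"], ["apple", "bandana", "cherry"])

# Bare strings: iterating a str yields its characters, so "abc" behaves
# like ["a", "b", "c"] and the result is a 3 x 3 character-level matrix.
as_chars = pairwise_scores("abc", "abd")

print(len(as_lists), len(as_lists[0]))
print(len(as_chars), len(as_chars[0]))
```

Wrapping a single string in a list, e.g. `["abc"]`, gives the presumably intended one-row result.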

On which platform are you testing and with which Python version? I can't reproduce this on my regular machine.

maxbachmann commented 2 weeks ago

In addition, you could try the versions between 3.5.2 and 3.9.6 to see when this issue was introduced.

silenceOfTheLambda commented 2 weeks ago

Hi @maxbachmann,

thanks for your quick reply :) I'm aware that cdist runs on lists and iterates over individual characters when single strings are passed. I just had not noticed it when uploading the screenshot. But it is in any case unrelated to the error.

I tested several rapidfuzz versions from 3.5.2 onwards. The bug occurs from 3.9.4 onwards, i.e. only in newer versions of rapidfuzz.

[image]

Side note: The original motivation to upgrade from 3.5.2 to the latest version (3.9.6) was a bug that I noticed in the 3.5.2 version which has apparently been fixed somewhere in between:

https://stackoverflow.com/q/78923595/5269892

Regarding the platform: I'm using Python 3.11.6 in an IPython 8.25.0 console inside PyCharm 2024.1.1 Community Edition on a Windows 10 Enterprise laptop. Are there any specific platform characteristics you would like to know?

maxbachmann commented 2 weeks ago

Thanks for adding more details.

I am able to reproduce your report when switching to Windows. From the changes in this release I assume this is triggered by the upgraded cibuildwheel version, which might use a newer compiler. So this may have been a bug for much longer and only became visible now. I will look into this.

maxbachmann commented 1 week ago

This should be caused by an upgrade of Visual Studio, which made an incompatible change to std::mutex (https://github.com/microsoft/STL/wiki/Changelog#vs-2022-1710). I should be able to make a build with _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR defined, which should solve this issue.

maxbachmann commented 1 week ago

@silenceOfTheLambda can you check whether version 3.9.7 fixes the issue for you?

silenceOfTheLambda commented 1 week ago

> @silenceOfTheLambda can you check whether version 3.9.7 fixes the issue for you?

Yes, in 3.9.7 the bug does not occur. Thanks for the quick fix, great work :)
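For anyone who wants to guard against the affected releases, the version range reported in this thread can be turned into a small check. This is only a sketch under stated assumptions: the helper names are hypothetical, the range (broken from 3.9.4, fixed in 3.9.7) and the Windows-only condition are taken from the comments above rather than from any official advisory, and the parser only handles plain X.Y.Z version strings.

```python
import sys
from importlib import metadata

# Range as reported in this thread; treat it as an assumption.
FIRST_AFFECTED = (3, 9, 4)
FIRST_FIXED = (3, 9, 7)


def parse_version(text):
    """Turn '3.9.6' into (3, 9, 6); assumes a plain X.Y.Z release string."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_affected(version_text, platform=sys.platform):
    """True if this thread reports workers != 1 as failing for that setup."""
    on_windows = platform.startswith("win")
    in_range = FIRST_AFFECTED <= parse_version(version_text) < FIRST_FIXED
    return on_windows and in_range


def installed_rapidfuzz_version():
    """Return the installed rapidfuzz version string, or None if absent."""
    try:
        return metadata.version("rapidfuzz")
    except metadata.PackageNotFoundError:
        return None


version = installed_rapidfuzz_version()
if version is not None and is_affected(version):
    print(f"rapidfuzz {version}: consider upgrading to 3.9.7 or using workers=1")
```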