Open aweiser opened 2 years ago
Hi @aweiser,
think you're right. Only lowercase content is processed as haystack in the following regexps. Needles are case-sensitive and disabled with flags in search_kwargs, when cases should be ignored. So when case sensitve, uppercase words will not be found.
In fuzzy matching algorithm, additional lowercase-conversions are done for needle and haystack. So it looks it should work when removing lowercase-converting as you suggest, but I'm not a Python-pro yet.
Describe the bug I wanted to perform automatic tag assignments on a case-sensitive match, in that particular case for the word "Rechnung". I wanted explicitly to use case-sensitive matching in order to avoid conflicts with similar words like "Abrechnung", which would be matched if caseinsensitive is selected.
However, I get no matches when I run document_retagger.
After a brief analysis I found that IMHO some lines in matching.py cause this bug, see below. Not sure if this is done on purpose, could you pls. check?
===== def matches(matching_model, document): search_kwargs = {}