Closed b2m closed 4 years ago
The ISRI Arabic Stemmer (src.whoosh.lang.isry.py) does not work on Python >= 3.6.
Exception: re.error: bad escape \u at position 0. Reason: changed behavior of re.sub.
Changed in version 3.6: Unknown escapes in pattern consisting of '\' and an ASCII letter now are errors.
(quoted from https://docs.python.org/3/library/re.html)
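A minimal sketch of this class of failure, not the actual whoosh code: on current Python 3 releases, a raw-string `\u` escape passed to `re.sub` as the replacement is rejected with the same "bad escape" message, while passing the character itself works. The pattern, the sample word, and the function names are illustrative assumptions.

```python
import re

# Hypothetical pattern (not taken from whoosh): an initial alef carrying
# a madda or hamza, written with real characters via a normal string literal.
ALEF_VARIANTS = re.compile("^[\u0622\u0623\u0625]")


def normalize_broken(word):
    # The raw string passes the two characters '\' and 'u' to the
    # replacement-template parser, which has no \u escape and raises re.error.
    return ALEF_VARIANTS.sub(r"\u0627", word)


def normalize_fixed(word):
    # In a normal string literal Python itself converts \u0627 into the
    # single character ALEF before re.sub ever sees it.
    return ALEF_VARIANTS.sub("\u0627", word)


word = "\u0623\u0643\u0644"  # begins with ALEF WITH HAMZA ABOVE

try:
    normalize_broken(word)
    error_message = None
except re.error as exc:
    error_message = str(exc)

fixed = normalize_fixed(word)
print(error_message)
print(fixed == "\u0627\u0643\u0644")
```

If the stemmer's escapes follow this shape, dropping the `r` prefix on the affected literals (or writing the characters directly) would be one way to restore the old behavior; whether the linked PR does exactly that is not stated in this thread.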
Code snippet to reproduce:
from whoosh.analysis import LanguageAnalyzer
analyzer = LanguageAnalyzer(lang='ar')
[(token.text, token.stopped) for token in analyzer("This is a test")]
Code samples with bad escape sequences:
Thanks Benjamin, good catch. I have submitted a PR to fix this.
Should be fixed by https://github.com/whoosh-community/whoosh/pull/557