I am seeing the exact same behavior with code that worked just 2 hours ago. This is on macOS. I tested with python 3.8.2, 3.8.5, and 3.10.2
Same here. Python 3.7.12, macOS.
Same here, Python 3.9-slim and 3.10-slim docker images, sample code:
from dateparser import parse
parse("7 days ago")
Output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/dateparser/conf.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/dateparser/__init__.py", line 61, in parse
    data = parser.get_date_data(date_string, date_formats)
  File "/usr/local/lib/python3.10/site-packages/dateparser/date.py", line 428, in get_date_data
    parsed_date = _DateLocaleParser.parse(
  File "/usr/local/lib/python3.10/site-packages/dateparser/date.py", line 178, in parse
    return instance._parse()
  File "/usr/local/lib/python3.10/site-packages/dateparser/date.py", line 182, in _parse
    date_data = self._parsers[parser_name]()
  File "/usr/local/lib/python3.10/site-packages/dateparser/date.py", line 196, in _try_freshness_parser
    return freshness_date_parser.get_date_data(self._get_translated_date(), self._settings)
  File "/usr/local/lib/python3.10/site-packages/dateparser/date.py", line 234, in _get_translated_date
    self._translated_date = self.locale.translate(
  File "/usr/local/lib/python3.10/site-packages/dateparser/languages/locale.py", line 131, in translate
    relative_translations = self._get_relative_translations(settings=settings)
  File "/usr/local/lib/python3.10/site-packages/dateparser/languages/locale.py", line 158, in _get_relative_translations
    self._generate_relative_translations(normalize=True))
  File "/usr/local/lib/python3.10/site-packages/dateparser/languages/locale.py", line 172, in _generate_relative_translations
    pattern = DIGIT_GROUP_PATTERN.sub(r'?P<n>\d+', pattern)
  File "/usr/local/lib/python3.10/site-packages/regex/regex.py", line 700, in _compile_replacement_helper
    is_group, items = _compile_replacement(source, pattern, is_unicode)
  File "/usr/local/lib/python3.10/site-packages/regex/_regex_core.py", line 1736, in _compile_replacement
    raise error("bad escape \\%s" % ch, source.string, source.pos)
regex._regex_core.error: bad escape \d at position 7
We were using dateparser==1.0.0; upgrading to dateparser==1.1.0 didn't solve the issue.
The dependency regex==2022.3.15 probably caused this; rolling back to regex==2022.1.18 may help.
Update: this commit looks responsible: https://github.com/mrabarnett/mrab-regex/commit/138970bafb3d6fbe0987632ee149c04e8b5acf95
I can confirm that deploying regex==2022.1.18 instead (through conda in my case) makes the bug disappear.
Caused by a behaviour change introduced in mrabarnett/mrab-regex@138970bafb3d6fbe0987632ee149c04e8b5acf95 (released as regex v2022.3.15). Installing any version before this (e.g. v2022.3.2) should fix it.
The change makes regex raise on invalid ASCII escape characters in pattern compilation and substitution. Not sure whether this is a bug in dateparser or in regex.
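The behaviour change described above can be reproduced without dateparser. The following is a minimal sketch that uses the standard-library `re` module as a stand-in for `regex`, since current Python 3 versions apply the same rule to replacement templates. `DIGIT_GROUP_PATTERN` here is an assumed equivalent of dateparser's constant, and the input string is made up for illustration.

```python
import re

# Assumed stand-in for dateparser's DIGIT_GROUP_PATTERN:
# it matches the literal four-character text "\d+".
DIGIT_GROUP_PATTERN = re.compile(r'\\d\+')


def reproduce():
    """Return the error message raised by the unescaped replacement template."""
    try:
        # "\d" is not a valid escape inside a replacement template,
        # so the template is rejected before any substitution happens.
        DIGIT_GROUP_PATTERN.sub(r'?P<n>\d+', r'(\d+) days ago')
    except re.error as exc:
        return str(exc)
    return None


message = reproduce()
print(message)
```

The reported position in the message may differ between `re` and `regex`, so it is better not to rely on it.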
This will be a problem on all supported platforms and environments (Linux, MacOS, Windows; Python 3.6 to 3.10)
This breaks CI/CD when installing the latest version. Please update the PyPI package too, thanks a lot.
Hi. I also ran into the same problem (and thought it was a Mac M1 problem with the regex lib).
It turns out to be related to regex dropping support for Python versions older than 3.6:
"Since Python 3.6, the re module has been rejecting unknown escape sequences such as \q in patterns and escape sequences including \d in replacement templates. As the regex module no longer supports versions of Python <3.6, I've brought the regex module into line with re."
Your code should now read:
pattern = DIGIT_GROUP_PATTERN.sub(r'?P<n>\\d+', pattern)
More info in mrabarnett/mrab-regex/issues/459
Here is a problematic pattern but there may be more?
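To see why doubling the backslash works, here is a small sketch using the standard-library `re` module. `DIGIT_GROUP_PATTERN` is an assumed stand-in for dateparser's constant, and `raw` is a hypothetical relative-date pattern, not one copied from the locale data.

```python
import re

# Assumed stand-in for dateparser's DIGIT_GROUP_PATTERN:
# it matches the literal text "\d+".
DIGIT_GROUP_PATTERN = re.compile(r'\\d\+')

# Hypothetical relative-date pattern, as it might appear in locale data.
raw = r'in (\d+) days'

# In a replacement template "\\" emits one literal backslash,
# so the result contains the text "\d+" inside a named group.
named = DIGIT_GROUP_PATTERN.sub(r'?P<n>\\d+', raw)

# Compile the rewritten pattern the same way the traceback's caller does.
compiled = re.compile(r'^(?:{})$'.format(named), re.UNICODE | re.IGNORECASE)
match = compiled.match('in 7 days')

print(named)
print(match.group('n'))
```

The rewritten pattern captures the number in the named group `n`, which is what the original template intended.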
I can confirm that this issue is NOT specific to MacOS - our CI/CD uses Linux machines and was affected by this. My local machine, running Ubuntu, was also affected.
Explicitly pinning regex==2022.1.18 as suggested by @xiaopc fixed it for us.
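For anyone who wants to check whether an environment has the stricter regex release, a rough sketch follows. The threshold comes from the 2022.3.15 release mentioned in this thread. The helper names are hypothetical, the version parsing assumes purely numeric dotted versions, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata

# First regex release reported in this thread to reject "\d" in templates.
FIRST_STRICT_RELEASE = (2022, 3, 15)


def parse_version(version):
    """Turn a dotted numeric version such as '2022.3.15' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def regex_is_strict(version):
    """Return True if this regex version includes the stricter escape handling."""
    return parse_version(version) >= FIRST_STRICT_RELEASE


def installed_regex_version():
    """Return the installed regex version string, or None if it is not installed."""
    try:
        return metadata.version('regex')
    except metadata.PackageNotFoundError:
        return None


installed = installed_regex_version()
if installed is None:
    print('regex is not installed')
else:
    print(installed, 'strict' if regex_is_strict(installed) else 'not strict')
```

This only reports the situation; the pin itself still belongs in the requirements or environment file.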
Thanks for the fix and for writing the library in the first place. This seems to me to be one of the best date parsing libraries; we use it for a lot of data imports. Hoping for a pip release soon as well. Keep up the good work :+1:
Many thanks for the thorough investigation!
For now I'll make a quick fix by pinning the regex version, but in the long run we should follow @tducret's suggestion (https://github.com/scrapinghub/dateparser/issues/1045#issuecomment-1069484011) and reform the regexes.
If anyone's up for a PR with the fix, please go ahead!
Is it possible to push the version 1.1.1 to pypi please?
Thank you for raising that, @rerowep. It seems like the PyPI publish action got stuck. It's published now :+1:
Currently the issue is quick-fixed by pinning regex to an older version, which is not applicable in certain environments, e.g., with modules installed via RPMs.
Wouldn't something like this fix the issue:
--- a/dateparser/languages/locale.py
+++ b/dateparser/languages/locale.py
@@ -169,7 +169,7 @@ class Locale:
             if normalize:
                 value = list(map(normalize_unicode, value))
             pattern = '|'.join(sorted(value, key=len, reverse=True))
-            pattern = DIGIT_GROUP_PATTERN.sub(r'?P<n>\d+', pattern)
+            pattern = pattern.replace(r'\d+', r'?P<n>\d+')
             pattern = re.compile(r'^(?:{})$'.format(pattern), re.UNICODE | re.IGNORECASE)
             relative_dictionary[pattern] = key
         return relative_dictionary
Based on this comment. Note that I'm not sure this is correct or complete, but judging by a run of the test suite with regex-2022.3.15, it seems to work (apart from some IMHO unrelated failures, which also occur with regex-2022.3.2).
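As a quick sanity check on the proposed change, the sketch below compares three ways of doing the rewrite on a few made-up patterns: plain `str.replace`, the escaped template, and a callable replacement (whose return value is inserted verbatim, with no escape processing). `DIGIT_GROUP_PATTERN` is an assumed stand-in for dateparser's constant, and the sample strings are hypothetical, so this does not replace running the real test suite.

```python
import re

# Assumed stand-in for dateparser's DIGIT_GROUP_PATTERN:
# it matches the literal text "\d+".
DIGIT_GROUP_PATTERN = re.compile(r'\\d\+')

# Hypothetical patterns; the last one has no digit group at all.
samples = [r'in (\d+) days', r'(\d+) weeks ago', r'yesterday']


def via_replace(pattern):
    # Plain string replacement: no template parsing is involved.
    return pattern.replace(r'\d+', r'?P<n>\d+')


def via_escaped_template(pattern):
    # Regex substitution with the backslash doubled in the template.
    return DIGIT_GROUP_PATTERN.sub(r'?P<n>\\d+', pattern)


def via_callable(pattern):
    # A callable's return value is used as-is, so "\d" needs no escaping.
    return DIGIT_GROUP_PATTERN.sub(lambda m: r'?P<n>\d+', pattern)


results = [
    (via_replace(p), via_escaped_template(p), via_callable(p))
    for p in samples
]
for row in results:
    print(row)
```

On these inputs all three produce the same string, which suggests `str.replace` is the simplest option when the search text is a fixed literal.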
Reopening until we fix it properly.
I independently arrived at the same solution as the PR; explanation for the bug here
Fine. Now looking forward to a new release on PyPI!
Hello everyone,
Tried parsing under python 3.7.5 and 3.9
dateparser.parse('12/12/12')
It also gives the same output for any "valid" input shown in the doc:
Here's the error:
How to reproduce: Env: windows 10