Closed johntmyers closed 4 years ago
Hi @johntmyers!
What about using the PARSERS setting?
Example:
>>> search_dates("hello 4000", settings={'PARSERS': ['timestamp']})
>>> search_dates("hello 1592498315", settings={'PARSERS': ['timestamp']})
[('1592498315', datetime.datetime(2020, 6, 18, 18, 38, 35))]
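For reference, the digits in the second call read as seconds since the Unix epoch. The standard-library sketch below reproduces the conversion. It assumes the output above was rendered on a machine whose local timezone was UTC+2, which the thread does not state, and it is not dateparser's own code.

```python
from datetime import datetime, timedelta, timezone

ts = 1592498315

# Interpret the integer as seconds since 1970-01-01 00:00:00 UTC.
utc_value = datetime.fromtimestamp(ts, tz=timezone.utc)
print(utc_value)  # 2020-06-18 16:38:35+00:00

# Assumption: a UTC+2 local zone, which would explain the 18:38:35 shown above.
local_guess = utc_value.astimezone(timezone(timedelta(hours=2))).replace(tzinfo=None)
print(local_guess)  # 2020-06-18 18:38:35
```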
Let me know if this works for you. :slightly_smiling_face:
Yes, but unfortunately I lose other matching that way. I am testing with ["timestamp", "absolute-time", "base-formats"], and it appears that absolute-time is the parser that matches on bare numbers. It also handles other inputs that I still want parsed, so I can't simply drop it.
Hi @johntmyers!
I spent some time looking into this, and it is a bug.
This:
>>> dateparser.parse("4000", settings={'PARSERS': ['absolute-time'], 'STRICT_PARSING': True})
datetime.datetime(1900, 1, 1, 4, 0)
shouldn't return anything, but it does.
The reason is that the absolute-time parser tries several approaches and, if none of them work, falls back to parsing the input as a date without spaces. That fallback does not check the STRICT_PARSING setting, so it returns a result based on this format: '%H%M%S'.
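The '%H%M%S' match can be reproduced with the standard library alone. This is a minimal sketch, assuming the fallback behaves like a plain strptime call with that format; dateparser's actual code path may differ.

```python
from datetime import datetime

# Assumption: this mirrors the format named above, not dateparser's internals.
# "4000" is consumed as hour "4", minute "00", second "0"; the missing date
# fields default to 1900-01-01.
parsed = datetime.strptime("4000", "%H%M%S")
print(parsed)  # 1900-01-01 04:00:00
```

The result matches the datetime.datetime(1900, 1, 1, 4, 0) shown above, which is why a bare number can come back as a time of day.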
I have just opened a new draft PR (https://github.com/scrapinghub/dateparser/pull/715) addressing this, and the fix will be released in the next version. One of our goals for the upcoming version is to check that all the settings work properly, as there seem to be some edge cases where they are not applied.
I'm sorry, but I don't know any workaround for you to fix it temporarily.
Thank you for your feedback.
Glad you found it! Thanks for addressing it so quickly!
Hi @johntmyers, we have split the old absolute-time parser into the absolute-time and no-spaces-time parsers and deactivated the second one by default, so in the next version you will be able to parse this correctly by using the STRICT_PARSING=True setting.
Example:
>>> search_dates('hallo 4000')
[('4000', datetime.datetime(4000, 9, 21, 0, 0))]
>>> search_dates('hallo 4000', settings={'STRICT_PARSING': True})
Currently just about any number will be parsed. Is there a way to ignore this? The strict setting seems to have no effect.
Example: