scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License
2.55k stars 465 forks

Proposal: Have an api to retrieve the relativedelta information #1035

Open oliviercwa opened 2 years ago

oliviercwa commented 2 years ago

dateparser is a fantastic tool to calculate a date given a literal expression like "3 years and 6 months".

However, it does not seem possible to leverage the parser to retrieve a relativedelta (https://dateutil.readthedocs.io/en/stable/relativedelta.html) structure filled with the result of the parsing. For example, if the text says:

3 years and 6 months

It would be great to obtain a structure filled with

relativedelta(years=3, months=6)

At least, I could not find a way to do so in the API. The only way seems to be to calculate the resulting date with

dateparser.parse("3 years 6 months")

then get the current date (datetime.now()) and derive the number of months. The drawback is that you lose resolution. In the example above, you can calculate 42 months, but you cannot keep the meaning of the original sentence: 3 years and 6 months.
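To illustrate the loss of resolution, here is a minimal sketch using only the standard library. The helper name `months_between` is hypothetical, and both dates are hard-coded assumptions standing in for `datetime.now()` and for the date the parser would return.

```python
from datetime import datetime


def months_between(earlier: datetime, later: datetime) -> int:
    """Return the difference in calendar months, ignoring day and time (hypothetical helper)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


# Assumed reference date standing in for datetime.now().
now = datetime(2024, 7, 1)
# Assumed result of parsing "3 years and 6 months" relative to `now` (a date in the past).
parsed = datetime(2021, 1, 1)

total_months = months_between(parsed, now)
print(total_months)  # 42

# divmod gives a normalized (years, months) split, but "42 months" and
# "3 years and 6 months" both lead to this same value, so the wording of
# the original expression cannot be told apart from the dates alone.
print(divmod(total_months, 12))  # (3, 6)
```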

PROrock commented 2 years ago

I support this request too. An even better workaround is to use the RELATIVE_BASE setting, because then exactly the same datetime is used on both sides of the comparison. It also returns the exact time delta (down to milliseconds). Example code is below:

from datetime import datetime

from dateparser.date import DateDataParser
from dateutil.relativedelta import relativedelta

BASE = datetime(2000, 1, 1)  # or: BASE = datetime.now()
DPP = DateDataParser(languages=['en'], settings={"RELATIVE_BASE": BASE})

def get_delta(text: str) -> relativedelta:
    parsed_datetime = DPP.get_date_data(text).date_obj
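    # or (less efficient for a larger number of calls), after `import dateparser`:
    # parsed_datetime = dateparser.parse(text, settings={"RELATIVE_BASE": BASE})
    # relativedelta(dt1, dt2) is dt1 - dt2, so a parsed date before BASE gives positive values
    return relativedelta(BASE, parsed_datetime)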

print(get_delta("3 years and 6 months"))
# outputs: relativedelta(years=+3, months=+6)

PROrock commented 2 years ago

Also, I just found that there is already an open issue for this, with even better code: https://github.com/scrapinghub/dateparser/issues/213

mooskagh commented 1 year ago

The snippet doesn't seem to work completely correctly:

In [17]: get_delta("-5wk")
Out[17]: relativedelta(months=+1, days=+5)   # I'd expect negative months and days.

In [18]: get_delta("5wk")
Out[18]: relativedelta(months=+1, days=+5) 

In [20]: get_delta("1 may 1990")
Out[20]: relativedelta(years=+9, months=+8)   # I understand why that returns a result, but I'd prefer an error when I expect a duration.
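
For comparison, here is a rough sketch of the behaviour described above: a signed result for "-5wk" and an error for input that is not a duration. It does not use dateparser at all. It is a small regex-based stand-in, and the names `parse_duration`, `_UNITS`, and `_TERM` are hypothetical. It only understands the few English unit spellings listed in the table.

```python
import re

# Unit spellings this sketch understands, mapped to relativedelta-style keyword names.
_UNITS = {
    "y": "years", "yr": "years", "year": "years",
    "mo": "months", "month": "months",
    "w": "weeks", "wk": "weeks", "week": "weeks",
    "d": "days", "day": "days",
}
# One "<number> <unit>" term, e.g. "3 years" or "5wk".
_TERM = re.compile(r"(\d+)\s*([a-z]+)")


def parse_duration(text: str) -> dict:
    """Parse text such as '3 years and 6 months' or '-5wk' into signed unit counts.

    Hypothetical helper, not part of dateparser. Raises ValueError when the
    text is not made up solely of '<number> <unit>' terms.
    """
    cleaned = text.strip().lower()
    # A leading minus sign negates every term in the expression.
    sign = -1 if cleaned.startswith("-") else 1
    cleaned = cleaned.lstrip("+-").strip()
    result = {}

    def consume(match):
        amount, unit = match.groups()
        # Accept plural spellings by also trying the unit without trailing "s".
        key = _UNITS.get(unit) or _UNITS.get(unit.rstrip("s"))
        if key is None:
            raise ValueError(f"not a duration unit: {unit!r}")
        result[key] = result.get(key, 0) + sign * int(amount)
        return " "

    # Strip every recognized term; only "and", commas, and spaces may remain.
    leftover = _TERM.sub(consume, cleaned)
    if not result or re.sub(r"\band\b|[,\s]", "", leftover):
        raise ValueError(f"not a duration: {text!r}")
    return result


print(parse_duration("3 years and 6 months"))  # {'years': 3, 'months': 6}
print(parse_duration("-5wk"))                  # {'weeks': -5}
```

With this sketch, `parse_duration("1 may 1990")` raises `ValueError` because "may" is not a known unit. Since the keys match relativedelta's relative keyword arguments, the result could be passed on as `relativedelta(**parse_duration(text))`.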