scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

Q: How to get timedelta from a relative time? #213

Open rmax opened 8 years ago

rmax commented 8 years ago

Hi!

I couldn't find a way to get a timedelta, rather than a datetime, from a string like "3 hours ago".

The use case is: I have a column "when" with values like "3 hours ago" and a column "timestamp" with datetime values, so I want to do something like timestamp - dateparser.parse_timedelta("3 hours ago").

kmike commented 8 years ago

It sounds like passing timestamp as a RELATIVE_BASE value can do the trick.
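A minimal sketch of that suggestion, assuming the parser accepts a settings dict with a RELATIVE_BASE key the way dateparser.parse does. The helper names resolve_when and elapsed are hypothetical, and the parser is passed in as an argument so the arithmetic can be checked with a stub:

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def resolve_when(
    when: str,
    timestamp: datetime,
    parse: Callable[..., Optional[datetime]],
) -> Optional[datetime]:
    """Resolve a relative phrase such as "3 hours ago" against `timestamp`.

    Assumption: `parse` behaves like dateparser.parse, i.e. it takes the
    text plus a `settings` dict and returns a datetime, or None on failure.
    """
    return parse(when, settings={"RELATIVE_BASE": timestamp})


def elapsed(
    when: str,
    timestamp: datetime,
    parse: Callable[..., Optional[datetime]],
) -> Optional[timedelta]:
    """Return the distance between `timestamp` and the resolved moment."""
    resolved = resolve_when(when, timestamp, parse)
    return None if resolved is None else timestamp - resolved
```

With dateparser installed, the call would look like resolve_when("3 hours ago", row_timestamp, dateparser.parse), which gives the shifted datetime directly, so no separate timedelta is needed for the original use case.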

Gallaecio commented 5 years ago

@rmax Since you can use something as simple as timestamp - dateparser.parse_timedelta("3 hours ago"), as you suggest, does it make sense to change the dateparser API for this?

noviluni commented 4 years ago

I needed something similar, and I realized that it might not be a bad idea to add a method that returns a timedelta instead of a datetime object.

Because dateparser needs some time to parse a date, we lose precision when doing datetime.now()-parse('yesterday'). I just opened a draft PR (https://github.com/scrapinghub/dateparser/pull/623) to illustrate it.

Examples (using the code in the PR):

In: from datetime import datetime

In: from dateparser import parse, parse_timedelta

In: str(datetime.now()-parse('13 min ago'))                                                                                                                                                                
Out: '0:12:59.993999'

In: str(parse_timedelta('13 min ago'))                                                                                                                                                                     
Out: '0:13:00.000139'

In: str(datetime.now()-parse('yesterday'))                                                                                                                                                                 
Out: '23:59:59.997510'

In: str(parse_timedelta('yesterday'))                                                                                                                                                                      
Out: '1 day, 0:00:00.000182'

In: str(datetime.now() - parse('1時間13分')) 
Out: '1:12:59.997616'

In: str(parse_timedelta('1時間13分'))
Out: '1:13:00.000305'
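The drift in the subtraction examples comes from reading the clock twice, once in the caller and once inside the parser. A standard-library sketch of the effect, where fake_parse_ago is a hypothetical stand-in for a parser (it is not dateparser code) and the sleep only simulates parsing time:

```python
import time
from datetime import datetime, timedelta
from typing import Optional


def fake_parse_ago(delta: timedelta, base: Optional[datetime] = None) -> datetime:
    """Hypothetical stand-in for a parser resolving "<delta> ago"."""
    time.sleep(0.02)  # pretend that parsing work takes some time
    if base is None:
        base = datetime.now()  # the parser reads the clock on its own
    return base - delta


delta = timedelta(minutes=13)

# Two independent clock reads: the left-hand now() happens before the
# parser's own now(), so the result comes out shorter than 13 minutes.
drifted = datetime.now() - fake_parse_ago(delta)

# One shared base: the parsing time cancels out and the delta is exact.
base = datetime.now()
exact = base - fake_parse_ago(delta, base=base)

print(drifted, exact)
```

Sharing one base between the caller and the parser removes the error without any after-the-fact correction.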

Of course, we could add some corrections (on the order of microseconds) and warn about that in the docs, but the point here is that this library aims to help with handling dates, and this could be really useful for some people.

What do you think?

Bobronium commented 2 years ago

This should do:

import dateparser
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

_base = datetime(1, 2, 1)  # any fixed date will do

def _parse_relative_time(text: str) -> datetime | None:
    # Only accept relative expressions and resolve them against the fixed base.
    return dateparser.parse(text, settings={"RELATIVE_BASE": _base, "PARSERS": ["relative-time"]})

def get_timedelta(text: str) -> timedelta | None:
    if parsed_date := _parse_relative_time(text):
        return parsed_date - _base
    return None  # text was not a relative time expression

def get_relativedelta(text: str) -> relativedelta | None:
    if parsed_date := _parse_relative_time(text):
        return relativedelta(parsed_date, _base)
    return None  # text was not a relative time expression

>>> get_timedelta("in 3 days")
datetime.timedelta(days=3)
>>> get_relativedelta("in 3 month")
relativedelta(months=+3)

However, it is prone to errors, such as:

>>> get_relativedelta("in 90 days")
relativedelta(months=+3)

In fact, it should return relativedelta(days=+90), because 90 days is not the same as 3 months.
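The mismatch can be seen with the standard library alone: whether "90 days" lands on the same date as "3 months" depends on the base date. In this sketch the helper name compare is hypothetical, and the month arithmetic is a shortcut that only works for the bases used here:

```python
from datetime import date, timedelta
from typing import Tuple


def compare(base: date) -> Tuple[date, date]:
    """Return (base + 90 days, base + 3 calendar months).

    Shortcut assumption: `base` is the 1st of a month no later than
    September, so replace() can stand in for real month arithmetic.
    """
    plus_90_days = base + timedelta(days=90)
    plus_3_months = base.replace(month=base.month + 3)
    return plus_90_days, plus_3_months


for base in (date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)):
    days_result, months_result = compare(base)
    print(base, days_result, months_result, days_result == months_result)
```

Only the January base gives the same date both ways. For the other two, a delta recovered from the parsed date reports something other than what the text said, which is why the original unit has to come from the parser itself.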

So a function like parse_timedelta that returns the actual relativedelta from the text would be really welcome.