scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

rounding up and down #919

Open salva opened 3 years ago

salva commented 3 years ago

Hi, I am working on a program where the user can select files using --newer-than and --older-than predicates, for instance:

foo --older-than=yesterday --newer-than="5 days ago"

The issue I have is that dateparser uses the current time when that part is missing. For instance, the --older-than=yesterday above becomes something like 2021-05-02 14:42:36.990992, but what I need is 2021-05-02 00:00:00.

On the other hand, I would like --newer-than=yesterday to become 2021-05-02 23:59:59.999999. In a similar fashion, I would like --newer-than=2020 to become 2020-12-31 23:59:59.999999.

The idea is very similar to that of PREFER_DAY_OF_MONTH, but applied to every date component, not just the day of the month.

noviluni commented 3 years ago

Hi @salva

There's currently a proposal for adding PREFER_TIME_OF_DAY to the settings. If you are interested and/or have ideas, you can write a comment in that issue (https://github.com/scrapinghub/dateparser/issues/802).

On the other hand, what you want to achieve can easily be done with the replace() method of the datetime object:

import dateparser

older_than = dateparser.parse(args.older_than)  # args: the parsed command-line arguments
if older_than:
    # Truncate the time part to midnight
    older_than = older_than.replace(hour=0, minute=0, second=0, microsecond=0)
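For the --newer-than side, the same replace() call can round up instead of down. A minimal stdlib sketch; the helper names `start_of_day` and `end_of_day` are hypothetical and not part of dateparser:

```python
from datetime import datetime


def start_of_day(dt: datetime) -> datetime:
    """Round down to 00:00:00 of the same day (tzinfo is kept)."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def end_of_day(dt: datetime) -> datetime:
    """Round up to 23:59:59.999999, the last microsecond of the same day."""
    return dt.replace(hour=23, minute=59, second=59, microsecond=999999)
```

Note that both helpers overwrite the time unconditionally, so a time the user did type (e.g. "yesterday 13:30") is lost as well.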

Let me know if this fixes your issue :slightly_smiling_face:

salva commented 3 years ago

> On the other hand, what you want to achieve can be easily performed by using the replace() method of the datetime object: ... Let me know if this fixes your issue

No, not really, because I want the datetime rounded only when the corresponding part of the date is not explicitly given.

For instance, rounding to the past:

yesterday       ==> 2021-05-02 00:00:00
yesterday 13:30 ==> 2021-05-02 13:30:00  # Time is given, so it is not rounded
January         ==> 2021-01-01 00:00:00
2021            ==> 2021-01-01 00:00:00

Rounding to the future:

yesterday       ==> 2021-05-02 23:59:59.999999
yesterday 13:30 ==> 2021-05-02 13:30:59.999999  # Seconds are not given, so they are rounded up
January         ==> 2021-01-31 23:59:59.999999
2021            ==> 2021-12-31 23:59:59.999999  # The last month, day, hour, minute and second are picked
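The two tables above can be sketched in plain Python, assuming the caller already knows the finest component the user actually supplied (dateparser.parse() itself only returns a datetime, so that information has to come from elsewhere). The names `floor_to`, `ceil_to` and the granularity strings are hypothetical, chosen for illustration:

```python
import calendar
from datetime import datetime

# Hypothetical granularity names, coarsest to finest. A granularity of
# "minute" means year..minute were given explicitly; finer fields were not.
_FIELDS = ["year", "month", "day", "hour", "minute", "second", "microsecond"]
_MINIMA = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}
_MAXIMA = {"month": 12, "hour": 23, "minute": 59, "second": 59, "microsecond": 999999}


def _finer_than(granularity: str) -> list:
    """Return the fields below `granularity` (raises ValueError if unknown)."""
    return _FIELDS[_FIELDS.index(granularity) + 1:]


def floor_to(dt: datetime, granularity: str) -> datetime:
    """Round to the past: set every unspecified field to its minimum."""
    return dt.replace(**{f: _MINIMA[f] for f in _finer_than(granularity)})


def ceil_to(dt: datetime, granularity: str) -> datetime:
    """Round to the future: set every unspecified field to its maximum."""
    finer = _finer_than(granularity)
    values = {f: _MAXIMA[f] for f in finer if f != "day"}
    if "day" in finer:
        # The last day depends on the year and the (possibly rounded-up) month.
        month = values.get("month", dt.month)
        values["day"] = calendar.monthrange(dt.year, month)[1]
    return dt.replace(**values)
```

For example, ceil_to(datetime(2021, 1, 3, 14, 42), "month") gives 2021-01-31 23:59:59.999999, matching the January row, and a leap year is handled through calendar.monthrange().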

In the end, what I am asking for is a new setting equivalent to PREFER_MONTH_OF_YEAR+PREFER_DAY_OF_MONTH+PREFER_HOUR_OF_DAY+PREFER_MINUTE_OF_HOUR+PREFER_SECOND_OF_MINUTE (assuming the PREFER_DAY_OF_MONTH concept were extended to all those date components).

noviluni commented 3 years ago

Hi @salva

Ok, I see.

In that case, you can get the desired result for the first query by using DateDataParser.get_date_data(), which is similar to dateparser.parse() but provides more information:

>>> from dateparser.date import DateDataParser
>>> ddp = DateDataParser(settings={'RETURN_TIME_AS_PERIOD': True})

>>> ddp.get_date_data('yesterday 13:30')
DateData(date_obj=datetime.datetime(2021, 5, 2, 13, 30), period='time', locale='en')

>>> ddp.get_date_data('yesterday')
DateData(date_obj=datetime.datetime(2021, 5, 2, 16, 17, 14, 313943), period='day', locale='en')

So your code would be something like:

from dateparser.date import DateDataParser

ddp = DateDataParser(settings={'RETURN_TIME_AS_PERIOD': True})
older_than_data = ddp.get_date_data(args.older_than)
older_than = older_than_data.date_obj  # None if the string could not be parsed

if older_than is not None and older_than_data.period != 'time':
    # No time was specified, so truncate it to midnight
    older_than = older_than.replace(hour=0, minute=0, second=0, microsecond=0)

You can use the same approach to get the period and perform whatever operations you need, but note that you only get the finest period that was parsed (for example, ddp.get_date_data('January') has 'month' as its period, so you can't tell whether the year was missing or not).
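The round-up case for --newer-than could branch on that period string in the same way. A stdlib-only sketch, assuming period strings such as 'time', 'day', 'month' and 'year'; the helper name `round_up_by_period` is hypothetical, and any other period (e.g. 'week') is returned unchanged because the thread does not define how it should round:

```python
import calendar
from datetime import datetime

_END_OF_DAY = {"hour": 23, "minute": 59, "second": 59, "microsecond": 999999}


def round_up_by_period(dt: datetime, period: str) -> datetime:
    """Round `dt` up to the end of the period reported by the parser.

    'time' means the user gave a time, so `dt` is returned as-is.
    Unhandled periods are also returned unchanged.
    """
    if period == "day":
        return dt.replace(**_END_OF_DAY)
    if period == "month":
        last_day = calendar.monthrange(dt.year, dt.month)[1]
        return dt.replace(day=last_day, **_END_OF_DAY)
    if period == "year":
        return dt.replace(month=12, day=31, **_END_OF_DAY)
    return dt
```

It would be called as round_up_by_period(data.date_obj, data.period), where data is a hypothetical get_date_data() result for --newer-than. It shares the limitation described above: 'yesterday 13:30' comes back as 'time', so its seconds are not rounded up as the requested table asks.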

You can also check the RELATIVE_BASE setting to see if it works for you.

If none of these solutions works for you, we could give your proposal a chance and see how it could be implemented :)