scrapinghub / dateparser

python parser for human readable dates
BSD 3-Clause "New" or "Revised" License

How to parse when month or day is missing? #424

Open pengyu opened 6 years ago

pengyu commented 6 years ago

The following input only specifies the year 2017, but dateparser fills in the month and day with today's values. I don't think this is the most appropriate behavior.

>>> from dateparser import parse
>>> print(parse('2017'))
2017-06-18 00:00:00

Is there a way to just print the most specific information like the following?

kamilnematli commented 5 years ago

It is actually not today's date but the reference date. By default, the reference date is the current date. One workaround is to set the reference date to the 1st of January. That way, if the day is missing it is set to the 1st day of the month, and if both the day and the month are missing the result is the 1st of January. But please be careful, because some other date phrases, like "yesterday", are also parsed relative to the reference date.

noviluni commented 4 years ago

Hi @pengyu, the parse function returns a datetime object (not a str), and you can format the result with strftime.

In your examples:

See https://strftime.org/ for how to use it.
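As a rough illustration of the strftime approach, here is a sketch that uses a hand-built datetime as a stand-in for the value `parse('2017')` returned in the original report. The variable names are only for illustration:

```python
from datetime import datetime

# Stand-in for the datetime that parse('2017') returned in the report above.
parsed = datetime(2017, 6, 18)

# Pick the format that matches the level of detail you want to show.
year_only = parsed.strftime('%Y')        # '2017'
year_month = parsed.strftime('%Y-%m')    # '2017-06'
full_date = parsed.strftime('%Y-%m-%d')  # '2017-06-18'

print(year_only, year_month, full_date)
```

The caller still has to decide which format applies, because the datetime itself does not record which parts came from the input.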

Since there is no such thing as a datetime object without a day or a month, it's not technically possible to return less information, so the gaps are automatically filled from the reference date (as @kamilnematli explained).

If you want to format the dates automatically (programmatically) according to the parsed content, that is not possible right now, and I don't think it will be possible in the future, as the result of parse will always be a datetime. However, it could be possible to add another function that reports what was actually parsed (like the information we use when setting STRICT_PARSING=True). This only makes sense in certain circumstances: it does when parsing "2017 Jan", but not when parsing strings like "yesterday".
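To make the idea of "information about what has been parsed" concrete, here is a standard-library sketch that does not use dateparser at all. The helper names `parse_with_precision` and `most_specific`, and the list of candidate formats, are hypothetical and only cover a few fixed layouts:

```python
from datetime import datetime

# Hypothetical (format, precision) pairs, tried from most to least specific.
# This list is an illustration, not what dateparser uses internally.
CANDIDATES = [
    ('%Y-%m-%d', 'day'),
    ('%Y %b', 'month'),
    ('%Y-%m', 'month'),
    ('%Y', 'year'),
]

# Output format to use for each precision level.
OUTPUT_FORMATS = {'day': '%Y-%m-%d', 'month': '%Y-%m', 'year': '%Y'}


def parse_with_precision(text):
    """Return (datetime, precision), or (None, None) if no format matches."""
    for fmt, precision in CANDIDATES:
        try:
            return datetime.strptime(text, fmt), precision
        except ValueError:
            continue
    return None, None


def most_specific(text):
    """Format the text back using only the components it actually contained."""
    parsed, precision = parse_with_precision(text)
    if parsed is None:
        return None
    return parsed.strftime(OUTPUT_FORMATS[precision])


print(most_specific('2017'))
print(most_specific('2017 Jan'))
print(most_specific('yesterday'))
```

Relative phrases such as "yesterday" match none of the fixed formats, so the sketch returns `None` for them, which mirrors the point that a precision report only makes sense for some inputs.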

noviluni commented 4 years ago

I just opened a PR that tries to find a solution for this: https://github.com/scrapinghub/dateparser/pull/729

If you have any feedback, please let me know.