gweis / isodate

ISO 8601 date/time parser
BSD 3-Clause "New" or "Revised" License
148 stars, 58 forks

BCE years trigger ISO8601Error #79

Status: Open. jonasengelmann opened this issue 1 year ago.

jonasengelmann commented 1 year ago

I work with historical data, including BCE dates; however, these currently result in an error, e.g.: Unrecognised ISO 8601 date format: '-0663-01-01'

Yet according to ISO 8601, BCE years should be denoted with a leading minus sign; -0002-04-12 is even listed as an example.
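When reading such strings as historical dates, keep in mind that ISO 8601 uses astronomical year numbering: year 0000 is 1 BCE, so -0002 is 3 BCE and -0663 is 664 BCE. A minimal sketch of that conversion follows; `iso_year_to_era` is a hypothetical helper, not part of isodate.

```python
def iso_year_to_era(iso_year: int) -> str:
    """Convert an ISO 8601 (astronomical) year number to a CE/BCE label.

    Astronomical numbering has a year 0, which corresponds to 1 BCE,
    so non-positive years are offset by one from the BCE count.
    """
    if iso_year >= 1:
        return f"{iso_year} CE"
    # 0 -> 1 BCE, -1 -> 2 BCE, -663 -> 664 BCE
    return f"{1 - iso_year} BCE"


if __name__ == "__main__":
    for y in (2024, 1, 0, -2, -663):
        # Print the signed four-digit ISO form next to its era label.
        print(f"{y:+05d} -> {iso_year_to_era(y)}")
```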

dymil commented 1 year ago

I'm also affected, but note that this is known behaviour: https://github.com/gweis/isodate/blob/8856fdf0e46c7bca00229faa1aae6b7e8ad6e76c/src/isodate/isodates.py#L6 The fix would probably be to replace Python's built-in datetime module with another implementation, e.g. NumPy, but that would likely be a breaking change.

micahcochran commented 1 year ago

@dymil hit the nail on the head. I wrote the following to understand the problem a little better: Python's datetime.datetime doesn't support negative years.

>>> from datetime import datetime
>>> datetime(63, 1, 1)
datetime.datetime(63, 1, 1, 0, 0)
>>> datetime(-63, 1, 1)
ValueError: year -63 is out of range
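The limit is `datetime.MINYEAR`, which is 1, so year 0 is rejected as well as negative years. This standard-library sketch probes a few years; `is_representable` is a hypothetical helper name.

```python
from datetime import MAXYEAR, MINYEAR, date


def is_representable(year: int) -> bool:
    """Return True if datetime.date can hold January 1 of the given year."""
    try:
        date(year, 1, 1)
    except ValueError:
        # Raised for any year outside MINYEAR..MAXYEAR.
        return False
    return True


if __name__ == "__main__":
    print(f"datetime supports years {MINYEAR}..{MAXYEAR}")
    for year in (63, 1, 0, -63):
        print(year, is_representable(year))
```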

Python date libraries like Pendulum (pendulum.datetime) and Arrow (arrow.Arrow) have the same problem because these libraries use datetime.datetime for their internal representation.

The documentation for parse_date's expanded parameter suggests that negative years are supported (source code). The regular expressions do accept negative years; the only missing piece is a datetime.datetime constructor (or one taking similar parameters) that supports them.
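Separating the two stages shows where the failure happens. In this sketch the pattern is a simplified stand-in for an expanded-date regex (an assumption, not isodate's actual pattern), and `split_expanded_date` is a hypothetical name: the sign survives parsing, and only the final `datetime.date` call raises.

```python
import re
from datetime import date

# Simplified stand-in for an expanded-date pattern (assumption):
# mandatory sign, four year digits, two-digit month and day.
EXPANDED_DATE = re.compile(
    r"(?P<sign>[+-])(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
)


def split_expanded_date(text: str) -> tuple[int, int, int]:
    """Stage 1: regex parsing, which handles the sign without trouble."""
    m = EXPANDED_DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"Unrecognised ISO 8601 date format: {text!r}")
    sign = -1 if m.group("sign") == "-" else 1
    return sign * int(m.group("year")), int(m.group("month")), int(m.group("day"))


if __name__ == "__main__":
    parts = split_expanded_date("-0663-01-01")
    print(parts)  # (-663, 1, 1)
    try:
        date(*parts)  # Stage 2: the constructor rejects the year.
    except ValueError as exc:
        print("date() failed:", exc)
```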

I agree with @dymil: NumPy is a great alternative that does the ISO parsing itself and supports negative years.

If there were another library that took datetime.datetime-like parameters and supported negative years, I think it would be a good move to add a datetime_func= parameter to parse_date(). This approach would add no new imports to isodate. Is there a Python date library that supports negative/BCE dates?
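The datetime_func= idea amounts to dependency injection: parse the components, then pass them to whatever constructor the caller supplies, defaulting to `datetime.date`. Below is a minimal sketch of that pattern under simplifying assumptions; `parse_date_with`, `date_factory`, and `AstronomicalDate` are hypothetical names for illustration, not isodate's API, and the regex is simplified.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable

# Simplified, hypothetical pattern for a (optionally signed) complete date.
_DATE = re.compile(r"([+-]?)(\d{4,})-(\d{2})-(\d{2})")


def parse_date_with(
    text: str, date_factory: Callable[[int, int, int], Any] = date
) -> Any:
    """Parse a complete calendar date and build it with date_factory(y, m, d).

    Parsing is kept separate from construction, so the caller can pick
    a type that is able to hold BCE years.
    """
    m = _DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"Unrecognised ISO 8601 date format: {text!r}")
    sign, year, month, day = m.groups()
    signed_year = -int(year) if sign == "-" else int(year)
    return date_factory(signed_year, int(month), int(day))


@dataclass(frozen=True)
class AstronomicalDate:
    """Toy date type accepting any integer year; it does no calendar validation."""

    year: int
    month: int
    day: int

    def isoformat(self) -> str:
        sign = "-" if self.year < 0 else ""
        return f"{sign}{abs(self.year):04d}-{self.month:02d}-{self.day:02d}"


if __name__ == "__main__":
    print(parse_date_with("2024-01-02"))  # 2024-01-02 (a datetime.date)
    print(parse_date_with("-0663-01-01", AstronomicalDate).isoformat())  # -0663-01-01
```

Any BCE-capable type would plug in the same way, provided its constructor accepts `(year, month, day)` positionally; whether a given third-party class does is something to check against that library.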

jonasengelmann commented 1 year ago

Thanks for the clarifications! I have worked with flexidate before, so maybe it is worth a try. In any case, I think the current error message for BCE years is misleading; perhaps it could be updated in the meantime?
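Until the message changes, a caller-side wrapper can at least state what actually went wrong. This sketch assumes `YYYY-MM-DD` input with an optional sign, pre-checks the year, and otherwise defers to `date.fromisoformat`; `parse_date_clearly` is a hypothetical name.

```python
import re
from datetime import MINYEAR, date

# Optional sign followed by four or more year digits and a hyphen.
_YEAR_PREFIX = re.compile(r"([+-]?)(\d{4,})-")


def parse_date_clearly(text: str) -> date:
    """Parse an ISO date, raising an explicit error for years datetime cannot hold."""
    m = _YEAR_PREFIX.match(text)
    if m is not None:
        year = int(m.group(2))
        if m.group(1) == "-":
            year = -year
        if year < MINYEAR:
            raise ValueError(
                f"{text!r} has year {year}, a BCE year in ISO 8601 numbering; "
                f"datetime.date only supports years >= {MINYEAR}"
            )
    # Everything else, including ordinary CE dates, goes to the stdlib parser.
    return date.fromisoformat(text)
```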

micahcochran commented 1 year ago

> In any case, I think the current error message when parsing BCE years is misleading, perhaps it could be updated in the meantime?

expanded= parameter

@jonasengelmann, if you want to parse negative years as the current documentation describes, call it like this:

>>> isodate.parse_date('-0663-01-01', expanded=True)
ValueError: year -663 is out of range

I think this ValueError is a reasonably descriptive error message.

If you want, I can put together a PR that allows a FlexiDate class to be passed in place of Python's date constructor. Only if you want me to.

Here's what I'd propose:

>>> import flexidate
>>> isodate.parse_date('-0663-01-01', expanded=True, date_obj=flexidate.FlexiDate)
<class 'flexidate.FlexiDate'> -0663-01-01

This has the benefit of adding no new dependencies to isodate while still making it useful for this edge case.