pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Incorrect parsing of ISO 8601 durations #36204

Closed: lmeyerov closed this issue 4 years ago

lmeyerov commented 4 years ago

Code Sample, a copy-pastable example

import pandas as pd

pd.Timedelta('P1Y')
pd.Timedelta('P1M')
pd.Timedelta('P1W')
pd.Timedelta('P1D')
pd.Timedelta('P1DT1H')
pd.Timedelta('PT1H')
pd.Timedelta('P1H')
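A quick way to see which of these strings a particular pandas version accepts is to try each one and record the outcome. This is a minimal sketch: `check_iso_durations` is a hypothetical helper, and the results depend on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

# The duration strings from the code sample above.
SAMPLES = ["P1Y", "P1M", "P1W", "P1D", "P1DT1H", "PT1H", "P1H"]


def check_iso_durations(samples):
    """Hypothetical helper: map each string to its parsed Timedelta or the error text."""
    results = {}
    for s in samples:
        try:
            results[s] = pd.Timedelta(s)
        except ValueError as exc:
            # pandas raises ValueError for strings it cannot parse.
            results[s] = f"ValueError: {exc}"
    return results


if __name__ == "__main__":
    for text, outcome in check_iso_durations(SAMPLES).items():
        print(f"{text!r:10} -> {outcome!r}")
```

A string that parses without error is not automatically correct: it is worth checking that the returned Timedelta has the value you expect.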

Problem description

Most of the ISO 8601 duration formats I tried fail to parse.

For the examples above:

I ran into this while working out a conversion path neo4j.time.Duration().isoformat() -> pandas -> Arrow -> RAPIDS cuDF timedelta[*]. Neo4j returns values like P1Y3M15DT2H3M and P1Y3M.

jreback commented 4 years ago

Y and M are not valid Timedelta units (they are not fixed-length intervals).
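The point about fixed intervals can be made concrete by measuring how much time "one month later" actually spans from different start dates. This sketch uses only the standard library; `add_months` is a hypothetical helper, and it assumes end-of-month clipping (Jan 31 plus one month lands on the last day of February), the same convention `dateutil.relativedelta` uses.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Hypothetical helper: shift `start` by whole calendar months,
    clipping the day to the last day of the target month."""
    total = start.month - 1 + months
    year, month = start.year + total // 12, total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


if __name__ == "__main__":
    # The same "P1M" step covers a different number of days per start date.
    for start in (date(2020, 1, 31), date(2021, 1, 31), date(2021, 3, 15), date(2021, 4, 15)):
        end = add_months(start, 1)
        print(start, "->", end, (end - start).days, "days")
```

From those four start dates the same one-month step spans 29, 28, 31 and 30 days, and twelve months from 2020-01-01 spans 366 days versus 365 from 2021-01-01, so no single Timedelta can stand in for `P1M` or `P1Y`.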

the others might be a bug if they are in fact valid ISO 8601

pull requests welcome

lmeyerov commented 4 years ago

In that case, the pandas docs should not say they support ISO 8601 durations.
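For comparison, the ISO 8601 form pandas itself produces is the one returned by `Timedelta.isoformat()`, which writes every component including zero ones (for example `P0DT1H0M0S` for one hour). A minimal round-trip check, assuming a pandas version that provides `Timedelta.isoformat` and parses its output back; it covers only this full form, not the shorter strings from the original report.

```python
import pandas as pd

# A few fixed-length durations to format and parse back.
deltas = [
    pd.Timedelta(hours=1),
    pd.Timedelta(days=1, hours=1),
    pd.Timedelta(days=6, minutes=50, seconds=3, milliseconds=10),
]

for td in deltas:
    text = td.isoformat()
    # Parsing the string pandas itself emits should give back the same value.
    print(f"{td!s:>28} -> {text:<24} round-trips: {pd.Timedelta(text) == td}")
```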

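Pandas date time guide:

[image: screenshot of the pandas date time guide]

vs. the Wikipedia summary of ISO 8601 durations:

[image: screenshot of the Wikipedia summary of ISO 8601 durations]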

lmeyerov commented 4 years ago

Linking:

https://github.com/pandas-dev/pandas/issues/29773

https://github.com/pandas-dev/pandas/pull/15136