pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Handle negative sign (-) when parsing ISO 8601 durations #37172

Closed: mgmarino closed this issue 3 years ago

mgmarino commented 4 years ago

Related to #37159, #29773, #36204. This splits out only the handling of the negative sign when parsing ISO 8601 durations.

The current behavior is somewhat counterintuitive:

"P-6DT0H50M3.010010012S" parses as Timedelta(days=-6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12), and the negative sign is only allowed immediately after the P designator. A negative sign in any other position raises an error.

This comment notes that the original ISO 8601 specification doesn't mention negative durations at all, but that some "extensions" (e.g. its usage in Java's Duration) do support them. I have been unable to find the detailed ISO 8601 spec.

As far as I can tell, there are a few possibilities to deal with this here:

_Originally posted by @mgmarino in https://github.com/pandas-dev/pandas/pull/37159#discussion_r506726762_

mgmarino commented 4 years ago

Link to the relevant documentation for the Java Duration class.

The docs there note that negative values are not part of the ISO 8601 standard.

My suspicion, however, is that many users need to parse "ISO 8601-like" strings that include these extensions; this is my case as well. I would therefore propose supporting the negative sign the way Java's Duration does, e.g.:

- "PT-6H3M" -- parses as "-6 hours and +3 minutes"
- "-PT6H3M" -- parses as "-6 hours and -3 minutes"
- "-PT-6H+3M" -- parses as "+6 hours and -3 minutes"
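To make the proposed semantics concrete, here is a minimal, self-contained sketch in plain Python (no pandas) of Java-style sign handling: each component carries its own optional sign, and a sign before "P" negates the whole duration. The function name `parse_signed_iso_duration`, the choice to return integer nanoseconds, and the restriction to D/H/M/S components (mirroring java.time.Duration, so no years, months, or weeks) are assumptions for illustration, not pandas API.

```python
import re

# Hypothetical helper (not part of pandas): parses an "ISO 8601-like" duration
# using the sign rules of java.time.Duration.parse. Assumption: only days and
# hours/minutes/seconds are accepted; years, months and weeks are rejected.
_DURATION_RE = re.compile(
    r"(?P<sign>[-+]?)P"
    r"(?:(?P<days>[-+]?[0-9]+)D)?"
    r"(?:(?P<t>T)"
    r"(?:(?P<hours>[-+]?[0-9]+)H)?"
    r"(?:(?P<minutes>[-+]?[0-9]+)M)?"
    r"(?:(?P<seconds>[-+]?[0-9]+)(?:[.,](?P<frac>[0-9]{0,9}))?S)?"
    r")?",
    re.IGNORECASE,
)

# Nanoseconds per whole unit of each component.
_NS_PER_UNIT = {
    "days": 86_400 * 10**9,
    "hours": 3_600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
}


def parse_signed_iso_duration(text: str) -> int:
    """Return the duration in nanoseconds, applying Java-style sign rules."""
    m = _DURATION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid duration: {text!r}")
    parts = {unit: m.group(unit) for unit in _NS_PER_UNIT}
    # At least one component is required, and "T" must be followed by one.
    if all(v is None for v in parts.values()):
        raise ValueError(f"invalid duration: {text!r}")
    if m.group("t") and all(parts[u] is None for u in ("hours", "minutes", "seconds")):
        raise ValueError(f"invalid duration: {text!r}")

    # Each component keeps its own sign: "PT-6H3M" is -6 hours and +3 minutes.
    total = 0
    for unit, value in parts.items():
        if value is not None:
            total += int(value) * _NS_PER_UNIT[unit]

    # Fractional seconds take the sign written on the seconds field ("PT-0.5S").
    frac = m.group("frac")
    if frac:
        frac_ns = int(frac.ljust(9, "0"))
        total += -frac_ns if parts["seconds"].startswith("-") else frac_ns

    # A sign before "P" negates the whole duration: "-PT-6H+3M" is +6h -3m.
    return -total if m.group("sign") == "-" else total


# Print each example string with its length in whole seconds.
for s in ["PT-6H3M", "-PT6H3M", "-PT-6H+3M"]:
    print(s, parse_signed_iso_duration(s) // 10**9, "seconds")
```

Under these assumptions, "P-6DT0H50M3.010010012S" still yields -6 days plus 50 minutes and 3.010010012 seconds, the same value the current parser produces, because only the days component carries a sign. Returning nanoseconds rather than a datetime.timedelta avoids truncating the nine-digit fraction to microseconds.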

avinashpancham commented 3 years ago

take

cnygardtw commented 1 year ago

While investigating this, I found a thread on the PostgreSQL mailing list discussing the same issues, which references an extension to ISO 8601: https://www.postgresql.org/message-id/9q0ftb37dv7.fsf%40gmx.us