pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Parsing ISO 8601 duration time components with value above 99 throws an error #48122

Open eLVas opened 2 years ago

eLVas commented 2 years ago

Reproducible Example

import pandas as pd

pd.Timedelta("PT99M")   # This is parsed correctly
pd.Timedelta("PT100M")  # This throws an error

pd.Timedelta("PT99H")   # This is parsed correctly
pd.Timedelta("PT100H")  # This throws an error

Issue Description

When parsing an ISO 8601 duration whose time components have more than two digits, I get an error:

ValueError: Invalid ISO 8601 Duration format - PT100M
```
    pd.Timedelta("PT100M")  # This throws an error
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1342, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 755, in pandas._libs.tslibs.timedeltas.parse_iso_format_string
ValueError: Invalid ISO 8601 Duration format - PT100M
```

I checked publicly available descriptions of the ISO 8601 duration format and could not find anything that prohibits using more than two digits for a component. Unfortunately, I do not have access to the full text of the ISO 8601 standard.

I have also checked that other ISO 8601 parser implementations don't have the same issue. (For example, isodate is able to parse this without issues.)

Expected Behavior

ISO 8601 durations whose components have more than two digits are parsed correctly.
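
To make the expectation concrete, here is a minimal standard-library sketch of a parser that accepts any number of digits per component. `parse_iso_duration` and `_ISO_DURATION` are hypothetical names used only for illustration; this is not the pandas implementation, and it assumes a simplified grammar (days, hours, minutes, and seconds only, with no years, months, weeks, or signs).

```python
import re
from datetime import timedelta

# Hypothetical pattern, not taken from pandas: every component allows one or
# more digits (\d+), so "PT100M" is accepted just like "PT99M".
_ISO_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)"                      # "T" must be followed by a time component
    r"(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
    r")?$"
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse a simplified ISO 8601 duration into a datetime.timedelta."""
    match = _ISO_DURATION.match(text)
    # Reject non-matching strings and strings with no components, such as "P".
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"Invalid ISO 8601 Duration format - {text}")
    # Keep only the components that are present and convert them to numbers.
    parts = {name: float(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


print(parse_iso_duration("PT99M"))   # 1:39:00
print(parse_iso_duration("PT100M"))  # 1:40:00
print(parse_iso_duration("PT100H"))  # 4 days, 4:00:00
```

Under these assumptions, minute and hour values above 99 are simply carried over into larger units by `timedelta`, which is the result I would expect from `pd.Timedelta` as well.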

Installed Versions

INSTALLED VERSIONS
------------------
commit : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-124-generic
Version : #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.3
numpy : 1.23.2
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 60.2.0
pip : 21.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Process finished with exit code 1

phofl commented 2 years ago

Hi, thanks for your report. It looks like the error was added deliberately. cc @WillAyd I think you implemented this initially?

WillAyd commented 2 years ago

At least according to Wikipedia, there doesn't seem to be an absolute standard on whether components are limited to two digits. If you'd like to provide a patch to support this, I think we would take it.