pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Period.strftime crashes for invalid format strings #53562

Open JozsefKutas opened 1 year ago

JozsefKutas commented 1 year ago

Reproducible Example

import pandas as pd
period = pd.Period("2023-Q2")
period.strftime("%Y-Q%Q")  # %Q is invalid, should be %q
# Process finished with exit code -1073740791 (0xC0000409)

Issue Description

Calling Period.strftime with an invalid format string crashes the Python session.
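
Because the crash takes the whole interpreter down with it, it can help to run the reproducer in a child process and look at the exit code from a session that survives. A minimal sketch, assuming nothing beyond the standard library; run_isolated, as_hex_status and REPRODUCER are hypothetical names introduced only for illustration:

```python
import subprocess
import sys

# The reproducer from this report, kept as a string so that it only ever
# runs inside a child interpreter.
REPRODUCER = (
    "import pandas as pd\n"
    "period = pd.Period('2023-Q2')\n"
    "period.strftime('%Y-Q%Q')\n"
)


def run_isolated(code: str, timeout: float = 60.0) -> int:
    """Run code in a fresh interpreter and return that process's exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep the child's output out of this session
        timeout=timeout,
    )
    return completed.returncode


def as_hex_status(code: int) -> str:
    """Format an exit code as an unsigned 32-bit hex value."""
    return f"0x{code & 0xFFFFFFFF:08X}"
```

Calling run_isolated(REPRODUCER) on an affected install should return a non-zero code without killing the calling session. as_hex_status(-1073740791) gives 0xC0000409, the value quoted in the report above.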

Expected Behavior

Ideally, an exception would be raised instead. For example, with Timestamp:

import pandas as pd
ts = pd.Timestamp("2023-06-08")
ts.strftime("%Y-Q%Q")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas\_libs\tslibs\timestamps.pyx", line 1493, in pandas._libs.tslibs.timestamps.Timestamp.strftime
ValueError: Invalid format string
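
Until Period.strftime raises on its own, one possible workaround is to validate the format string in Python before passing it on. The following is only a sketch: check_period_format is a hypothetical helper, not part of pandas, and the set of accepted directive letters is an assumption transcribed from the directive table in the Period.strftime documentation, so it may be incomplete or differ between versions.

```python
# Assumption: these directive letters are transcribed from the directive table
# in the Period.strftime documentation; the set may be incomplete or may
# differ between pandas versions.
PERIOD_DIRECTIVES = frozenset("aAbBcdfFHIjlmMnpqSuUwWxXyYZ%")


def check_period_format(fmt: str) -> str:
    """Return fmt unchanged, or raise ValueError for an unknown directive."""
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            i += 1
            continue
        if i + 1 == len(fmt):
            raise ValueError("Invalid format string: trailing '%'")
        directive = fmt[i + 1]
        if directive not in PERIOD_DIRECTIVES:
            raise ValueError(
                f"Invalid format string: unknown directive '%{directive}'"
            )
        i += 2  # skip the '%' and its directive letter
    return fmt
```

With pandas imported as pd, period.strftime(check_period_format("%Y-Q%Q")) would then fail with a ValueError raised in Python, before the format string reaches the C code.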

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 965ceca9fd796940050d6fc817707bba1c4f9bff
python           : 3.9.7.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19044
machine          : AMD64
processor        : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United Kingdom.1252

pandas           : 2.0.2
numpy            : 1.24.3
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 57.4.0
pip              : 21.2.3
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None

tpackard1 commented 1 year ago

Hi all,

When I tried the reproducible example on the main branch, I did not get an error for either of the following. Both calls returned the string with %Q left as-is:

import pandas as pd
period = pd.Period("2023-Q2")
period.strftime("%Y-Q%Q")  # '2023-Q%Q'

import pandas as pd
ts = pd.Timestamp("2023-06-08")
ts.strftime("%Y-Q%Q")  # '2023-Q%Q'

INSTALLED VERSIONS
------------------
commit           : 3ef98e65d1ab5a7c0eb77f9bbbb1d95ecabfeabb
python           : 3.10.10.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.10.102.1-microsoft-standard-WSL2
Version          : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.1.0.dev0+943.g3ef98e65d1
numpy            : 1.23.5
pytz             : 2022.6
dateutil         : 2.8.2
setuptools       : 65.5.0
pip              : 22.3.1
Cython           : 0.29.35
pytest           : 7.2.0
hypothesis       : 6.61.0
sphinx           : 4.5.0
blosc            : 1.11.0
feather          : None
xlsxwriter       : 3.0.3
lxml.etree       : 4.9.2
html5lib         : 1.1
pymysql          : 1.0.2
psycopg2         : 2.9.5
jinja2           : 3.1.2
IPython          : 8.7.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : 1.3.5
brotli           :
fastparquet      : 2022.12.0
fsspec           : 2021.11.0
gcsfs            : 2021.11.0
matplotlib       : 3.6.2
numba            : 0.56.4
numexpr          : 2.8.4
odfpy            : None
openpyxl         : 3.0.10
pandas_gbq       : None
pyarrow          : 9.0.0
pyreadstat       : 1.2.0
pyxlsb           : 1.0.10
s3fs             : 2021.11.0
scipy            : 1.9.3
snappy           :
sqlalchemy       : 1.4.45
tables           : 3.7.0
tabulate         : 0.9.0
xarray           : 2022.12.0
xlrd             : 2.0.1
zstandard        : 0.19.0
tzdata           : 2022.7
qtpy             : None
pyqt5            : None

I did, however, get the same behavior that @JozsefKutas describes when I tried the reproducible example with my latest version of pandas, so I think this is a valid bug. I might open a separate issue about the behavior I got on main, and I will update this comment when I do.

Aloqeely commented 5 months ago

Still crashes for me on main. cc @MarcoGorelli

The crash seems to happen in these lines (execution enters strftime and then the process exits): https://github.com/pandas-dev/pandas/blob/c46fb76afaf98153b9eef97fc9bbe9077229e7cd/pandas/_libs/tslibs/period.pyx#L681-L685

I think result is empty at this point, but the check "result is NULL" oddly evaluates to False. I don't really understand C.

MarcoGorelli commented 5 months ago

thanks for the ping! i spent quite some time last year on the reverse operation (strptime), but haven't looked at this code yet