pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Insert then delete column into MultiIndex with timestamps leads to RecursionError #56853

Open traubms opened 9 months ago

traubms commented 9 months ago

Reproducible Example

import pandas as pd

# multiindex with the second level being a Timestamp
df = pd.DataFrame({('A', pd.Timestamp('2024-01-01')): [0]})

# insert using only the top level
df.insert(1, 'B', [1])

print(df.to_string())
#            A   B
#   2024-01-01 NaT
# 0          0   1

# raises RecursionError
del df['B']

Issue Description

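Inserting a column by its top-level key only into a DataFrame whose MultiIndex columns have a Timestamp second level, and then deleting that column, raises a RecursionError instead of removing it.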

This is a contrived example, but it was observed in the wild when joining two DataFrames that had MultiIndex columns (str and Timestamp levels) and a named index, say 'Index'. The join ends up adding a column ('Index', pd.NaT) and then deleting it in order to set it as the index.

Expected Behavior

It should just delete the column.

del df['B']
print(df.to_string())
#            A
#   2024-01-01
# 0          0
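
In the meantime, one way to get the expected frame might be to select the columns to keep rather than deleting with `del`. This is only a sketch, not a fix for the bug itself. It builds the post-insert frame directly and assumes the inserted column ends up keyed `('B', NaT)`, as the printed output above suggests. The names `keep` and `result` are illustrative.

```python
import pandas as pd

ts = pd.Timestamp("2024-01-01")

# Assumption: after df.insert(1, 'B', [1]) the new column is keyed ('B', NaT),
# so the same columns are constructed explicitly here.
cols = pd.MultiIndex.from_tuples([("A", ts), ("B", pd.NaT)])
df = pd.DataFrame([[0, 1]], columns=cols)

# Boolean mask over the top level: True for every column not under 'B'.
keep = df.columns.get_level_values(0) != "B"

# Select the remaining columns instead of calling `del df['B']`.
result = df.loc[:, keep]
print(result.to_string())
```

Because this goes through boolean selection on the first level, it never looks up the `('B', NaT)` key, which appears to be the step that misbehaves in the report.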

Installed Versions

INSTALLED VERSIONS
------------------
commit : a671b5a8bf5dd13fb19f0e88edc679bc9e15c673
python : 3.11.7.final.0
python-bits : 64
OS : Darwin
OS-release : 22.5.0
Version : Darwin Kernel Version 22.5.0: Mon Apr 24 20:51:50 PDT 2023; root:xnu-8796.121.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.4
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.20.0
pandas_datareader : None
bs4 : None
bottleneck : 1.3.5
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.8.7
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

rhshadrach commented 9 months ago

Thanks for the report - confirmed on main. Further investigations and PRs to fix are welcome!