pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: ValueError when executing a DataFrame with another DataFrame in its attrs #60455

Open kaba439 opened 1 day ago

kaba439 commented 1 day ago

Reproducible Example

import pandas as pd
import numpy as np

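n = 50  # the error appears once a is too wide to display in full

a = pd.DataFrame(np.random.randint(0, 10, size=(n, n)))
b = pd.DataFrame(np.random.randint(0, 10, size=(5, 5)))

a.attrs['b'] = b

a  # displaying a (its repr) raises the ValueError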

Issue Description

Dear pandas Team,

Adding a DataFrame b to the attrs of another DataFrame a raises a ValueError when a is displayed, if the dimensions of a are greater than 20. Possibly related to #51280, #60357, #60351.

Error log:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/frame.py in ?(self)
   1210             self.info(buf=buf)
   1211             return buf.getvalue()
   1212
   1213         repr_params = fmt.get_dataframe_repr_params()
-> 1214         return self.to_string(**repr_params)

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/util/_decorators.py in ?(*args, **kwargs)
    329                     msg.format(arguments=_format_argument_list(allow_args)),
    330                     FutureWarning,
    331                     stacklevel=find_stack_level(),
    332                 )
--> 333             return func(*args, **kwargs)

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/frame.py in ?(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, line_width, min_rows, max_colwidth, encoding)
   1372         """
   1373         from pandas import option_context
   1374
   1375         with option_context("display.max_colwidth", max_colwidth):
-> 1376             formatter = fmt.DataFrameFormatter(
   1377                 self,
   1378                 columns=columns,
   1379                 col_space=col_space,

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/io/formats/format.py in ?(self, frame, columns, col_space, header, index, na_rep, formatters, justify, float_format, sparsify, index_names, max_rows, min_rows, max_cols, show_dimensions, decimal, bold_rows, escape)
    465         self.max_cols_fitted = self._calc_max_cols_fitted()
    466         self.max_rows_fitted = self._calc_max_rows_fitted()
    467
    468         self.tr_frame = self.frame
--> 469         self.truncate()
    470         self.adj = printing.get_adjustment()

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/io/formats/format.py in ?(self)
    651         """
    652         Check whether the frame should be truncated. If so, slice the frame up.
    653         """
    654         if self.is_truncated_horizontally:
--> 655             self._truncate_horizontally()
    656
    657         if self.is_truncated_vertically:
    658             self._truncate_vertically()

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/io/formats/format.py in ?(self)
    669         col_num = self.max_cols_fitted // 2
    670         if col_num >= 1:
    671             left = self.tr_frame.iloc[:, :col_num]
    672             right = self.tr_frame.iloc[:, -col_num:]
--> 673             self.tr_frame = concat((left, right), axis=1)
    674
    675             # truncate formatter
    676             if isinstance(self.formatters, (list, tuple)):

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/reshape/concat.py in ?(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    391         copy=copy,
    392         sort=sort,
    393     )
    394
--> 395     return op.get_result()

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/reshape/concat.py in ?(self)
    687             if not self.copy and not using_copy_on_write():
    688                 new_data._consolidate_inplace()
    689
    690             out = sample._constructor_from_mgr(new_data, axes=new_data.axes)
--> 691             return out.__finalize__(self, method="concat")

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/generic.py in ?(self, other, method, **kwargs)
   6269             # propagate attrs only if all concat arguments have the same attrs
   6270             if all(bool(obj.attrs) for obj in other.objs):
   6271                 # all concatenate arguments have non-empty attrs
   6272                 attrs = other.objs[0].attrs
-> 6273                 have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])
   6274                 if have_same_attrs:
   6275                     self.attrs = deepcopy(attrs)
   6276

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/generic.py in ?(.0)
-> 6273                 have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/generic.py in ?(self)
   1575     @final
   1576     def __nonzero__(self) -> NoReturn:
-> 1577         raise ValueError(
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/frame.py in ?(self)
   1232             min_rows = get_option("display.min_rows")
   1233             max_cols = get_option("display.max_columns")
   1234             show_dimensions = get_option("display.show_dimensions")
   1235
-> 1236             formatter = fmt.DataFrameFormatter(
   1237                 self,
   1238                 columns=None,
   1239                 col_space=None,

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/io/formats/format.py in ?(self, frame, columns, col_space, header, index, na_rep, formatters, justify, float_format, sparsify, index_names, max_rows, min_rows, max_cols, show_dimensions, decimal, bold_rows, escape)
    465         self.max_cols_fitted = self._calc_max_cols_fitted()
    466         self.max_rows_fitted = self._calc_max_rows_fitted()
    467
    468         self.tr_frame = self.frame
--> 469         self.truncate()
    470         self.adj = printing.get_adjustment()

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/io/formats/format.py in ?(self)
    651         """
    652         Check whether the frame should be truncated. If so, slice the frame up.
    653         """
    654         if self.is_truncated_horizontally:
--> 655             self._truncate_horizontally()
    656
    657         if self.is_truncated_vertically:
    658             self._truncate_vertically()

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/io/formats/format.py in ?(self)
    669         col_num = self.max_cols_fitted // 2
    670         if col_num >= 1:
    671             left = self.tr_frame.iloc[:, :col_num]
    672             right = self.tr_frame.iloc[:, -col_num:]
--> 673             self.tr_frame = concat((left, right), axis=1)
    674
    675             # truncate formatter
    676             if isinstance(self.formatters, (list, tuple)):

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/reshape/concat.py in ?(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    391         copy=copy,
    392         sort=sort,
    393     )
    394
--> 395     return op.get_result()

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/reshape/concat.py in ?(self)
    687             if not self.copy and not using_copy_on_write():
    688                 new_data._consolidate_inplace()
    689
    690             out = sample._constructor_from_mgr(new_data, axes=new_data.axes)
--> 691             return out.__finalize__(self, method="concat")

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/generic.py in ?(self, other, method, **kwargs)
   6269             # propagate attrs only if all concat arguments have the same attrs
   6270             if all(bool(obj.attrs) for obj in other.objs):
   6271                 # all concatenate arguments have non-empty attrs
   6272                 attrs = other.objs[0].attrs
-> 6273                 have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])
   6274                 if have_same_attrs:
   6275                     self.attrs = deepcopy(attrs)
   6276

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/generic.py in ?(.0)
-> 6273                 have_same_attrs = all(obj.attrs == attrs for obj in other.objs[1:])

~/py/envs/jupyter/lib/python3.13/site-packages/pandas/core/generic.py in ?(self)
   1575     @final
   1576     def __nonzero__(self) -> NoReturn:
-> 1577         raise ValueError(
   1578             f"The truth value of a {type(self).__name__} is ambiguous. "
   1579             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1580         )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
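
The last frames of the traceback point at the likely mechanism: __finalize__ compares the attrs dicts of the two concatenated halves with ==, and dict equality needs a plain bool from comparing the stored values. A minimal sketch of that mechanism outside the pandas internals, assuming only that the two dicts hold distinct DataFrame objects (left_attrs and right_attrs are illustrative names, not pandas attributes):

```python
import pandas as pd

b = pd.DataFrame({"x": [1, 2, 3]})

# Illustrative stand-ins for the attrs of the left and right halves.
# Each dict holds its own copy of b: equal values, but different objects.
left_attrs = {"b": b.copy()}
right_attrs = {"b": b.copy()}

message = None
try:
    # dict equality compares the stored values with == and then needs a bool.
    # DataFrame == DataFrame yields a DataFrame of booleans, and taking its
    # truth value raises ValueError.
    _ = left_attrs == right_attrs
except ValueError as exc:
    message = str(exc)

print(message)

# When both dicts reference the very same object, Python's identity shortcut
# skips the element-wise comparison, so no error is raised.
same_object_equal = {"b": b} == {"b": b}
print(same_object_equal)
```

The second comparison suggests why the error only shows up once the frame is sliced for display: the halves then carry separate copies of the stored DataFrame rather than the same object.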

Expected Behavior

Well, no ValueError :-) Displaying a should work regardless of what is stored in a.attrs.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.13.0
python-bits : 64
OS : Linux
OS-release : 6.10.13-3-MANJARO
Version : #1 SMP PREEMPT_DYNAMIC Tue Oct 8 03:24:49 UTC 2024
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 2.2.3
numpy : 2.1.3
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.30.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : None
matplotlib : 3.9.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
tzdata : 2024.2
qtpy : None
pyqt5 : None

yuanx749 commented 23 hours ago

Formatting in _truncate_horizontally uses concat. I think we can use numpy instead as in _truncate_vertically to avoid this error.
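
A rough sketch of what that suggestion could look like, not the actual pandas code: truncate_columns and its max_cols parameter are hypothetical names, and the positional selection built with np.hstack is assumed to mirror what _truncate_vertically does for rows. Since the columns are picked with a single iloc call, no concat (and so no attrs comparison) is involved.

```python
import numpy as np
import pandas as pd


def truncate_columns(frame: pd.DataFrame, max_cols: int) -> pd.DataFrame:
    """Keep the first and last max_cols // 2 columns (hypothetical helper)."""
    col_num = max_cols // 2
    n_cols = frame.shape[1]
    if col_num < 1 or n_cols <= max_cols:
        # Nothing to truncate.
        return frame
    # Positions of the leading and trailing columns, selected in one step
    # instead of concatenating a left and a right slice.
    keep = np.hstack([np.arange(col_num), np.arange(n_cols - col_num, n_cols)])
    return frame.iloc[:, keep]


a = pd.DataFrame(np.zeros((3, 50), dtype=int))
a.attrs["b"] = pd.DataFrame(np.ones((5, 5), dtype=int))

truncated = truncate_columns(a, max_cols=20)
print(truncated.shape)
```

In the sketch, a frame with 50 columns and a DataFrame stored in attrs is reduced to its first and last 10 columns without going through concat.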