pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: #59048

Closed evyasonov closed 1 week ago

evyasonov commented 2 weeks ago

Reproducible Example

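import pandas

# One row holding a string, an integer, a float, and a column that is left untouched.
df = pandas.DataFrame([['s', 1, 2.3, 0]], columns=['str', 'int', 'flt', 'to ignore'])
# Convert the numeric columns to the nullable extension dtypes.
df['int'] = df['int'].astype('Int64')
df['flt'] = df['flt'].astype('Float64')

columns_to_process = ['str', 'int', 'flt']

# Assign the string-converted values back through .loc (the failing line on pandas 2.1.4).
df.loc[:, columns_to_process] = df.loc[:, columns_to_process].astype(str)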

Issue Description

df.loc[:, columns_to_process] = df.loc[:, columns_to_process].astype(str)

throws the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: len() of unsized object
Exception ignored in: 'pandas._libs.lib.is_string_array'
Traceback (most recent call last):
  File "C:\Users\evgeniy\AppData\Local\anaconda3\Lib\site-packages\pandas\core\arrays\numeric.py", line 162, in _coerce_to_data_and_mask
    inferred_type = lib.infer_dtype(values, skipna=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: len() of unsized object
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: len() of unsized object
Exception ignored in: 'pandas._libs.lib.is_string_array'
Traceback (most recent call last):
  File "C:\Users\evgeniy\AppData\Local\anaconda3\Lib\site-packages\pandas\core\arrays\numeric.py", line 162, in _coerce_to_data_and_mask
    inferred_type = lib.infer_dtype(values, skipna=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: len() of unsized object

Expected Behavior

No error

Installed Versions

INSTALLED VERSIONS
------------------
commit : a671b5a8bf5dd13fb19f0e88edc679bc9e15c673
python : 3.11.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Russian_Russia.1251
pandas : 2.1.4
numpy : 1.26.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.20.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.10.0
gcsfs : None
matplotlib : 3.8.0
numba : 0.59.0
numexpr : 2.8.7
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
pyxlsb : None
s3fs : 2023.10.0
scipy : 1.11.4
sqlalchemy : 2.0.25
tables : 3.9.2
tabulate : 0.9.0
xarray : 2023.6.0
xlrd : None
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.4.1
pyqt5 : None

Aloqeely commented 2 weeks ago

Thanks for the report! This is fixed in the latest release of pandas (v2.2). Could you update pandas and check whether that resolves the problem for you?
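For anyone who wants to confirm which pandas is installed, or who cannot update right away, here is a minimal sketch. It was not proposed in this thread. `major_minor` is a hypothetical helper, not a pandas function, and the whole-column assignment is an assumed alternative to the `.loc` assignment from the report: it replaces the selected columns instead of writing into the existing `Int64` / `Float64` arrays.

```python
import pandas


def major_minor(version: str) -> tuple[int, int]:
    """Hypothetical helper (not part of pandas): '2.1.4' -> (2, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Report the installed version and whether it is at least the 2.2 release
# mentioned above.
installed = major_minor(pandas.__version__)
print("pandas", pandas.__version__, "is at least 2.2:", installed >= (2, 2))

# Same setup as the reproducible example in the report.
df = pandas.DataFrame([['s', 1, 2.3, 0]], columns=['str', 'int', 'flt', 'to ignore'])
df['int'] = df['int'].astype('Int64')
df['flt'] = df['flt'].astype('Float64')

columns_to_process = ['str', 'int', 'flt']

# Assumed alternative: plain column assignment swaps in new columns rather
# than setting values into the existing nullable arrays via .loc.
df[columns_to_process] = df[columns_to_process].astype(str)

print(df.dtypes)
```

With this form the converted columns no longer keep their nullable numeric dtypes, and the `to ignore` column is left unchanged.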

Aloqeely commented 1 week ago

Going to close - feel free to re-open if the issue isn't fixed after updating.