pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: TypeError: object of type 'int' has no len() when saving DataFrame with object dtype column #34645

Closed Honzys closed 4 years ago

Honzys commented 4 years ago

Code Sample, a copy-pastable example

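import pandas as pd

# Column "a" starts out with object dtype because it holds only None.
df = pd.DataFrame({"a": [None, None]})
df.loc[0, "a"] = float(1)
df.loc[1, "a"] = float(2)

# HDFStore takes `mode`, not `write_mode`.
hdf = pd.HDFStore("test.h5", mode="w")
hdf.put("table", df, format="table")  # raises the TypeError below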

This causes the following error:

  ...
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 1042, in put
    errors=errors,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 1709, in _write_to_group
    data_columns=data_columns,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 4143, in write
    data_columns=data_columns,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 3813, in _create_axes
    errors=self.errors,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 4800, in _maybe_convert_for_string_atom
    for i in range(len(block.shape[0])):
TypeError: object of type 'int' has no len()
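
The last frame of the traceback calls len() on block.shape[0], which is a plain integer. A minimal sketch of that pattern outside pandas, assuming a NumPy array as a stand-in for the block's values (the name block_values and the data are hypothetical, not taken from the pandas source):

```python
import numpy as np

# Hypothetical stand-in for the values of one DataFrame block.
block_values = np.array([[1.0, 2.0]], dtype=object)

n_rows = block_values.shape[0]  # a plain int, not a sequence

try:
    len(n_rows)  # same pattern as the failing line in the traceback
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# Iterating over the row count directly avoids the error.
row_indices = list(range(n_rows))
print(row_indices)
```

This only illustrates why the message mentions 'int'; whether pandas resolved it by dropping the len() call is not confirmed in this thread.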

Problem description

Right after the DataFrame is created, column a has object dtype. After assigning floats to column a, I would expect its dtype to change to float64, but it stays object. When the DataFrame is saved, the column still has object dtype even though its values (for example df.loc[0, "a"]) are floats, and this triggers the error pasted above.
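
A possible workaround, sketched under the assumption that converting the column to float64 before saving avoids the object-dtype code path (this thread does not confirm it). The object column is built explicitly here so that the starting dtype does not depend on the pandas version:

```python
import pandas as pd

# An object-dtype column that holds only floats.
df = pd.DataFrame({"a": pd.Series([1.0, 2.0], dtype=object)})
dtype_before = df["a"].dtype

# Option 1: let pandas infer a better dtype for object columns.
inferred = df.infer_objects()

# Option 2: cast the column explicitly.
casted = df.astype({"a": "float64"})

print(dtype_before, inferred["a"].dtype, casted["a"].dtype)
```

Either converted frame could then be passed to hdf.put in place of the original df.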

Expected Output

I would expect one of the following:

There's a pretty big chance that I am wrong and this is expected behaviour. If that's the case, could you please explain why, or point me to something I can read about it?

Maybe it's related to issue #34274.

Output of pd.show_versions()

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-147.5.1.el8_1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.4
numpy : 1.16.4
pytz : 2018.7
dateutil : 2.8.1
pip : 19.3.1
setuptools : 46.4.0
Cython : 0.29.2
pytest : 5.1.2
hypothesis : None
sphinx : 1.8.4
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 2.0.0
numexpr : 2.6.8
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.1.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.2
numba : None

TomAugspurger commented 4 years ago

I think this is the same as #34274.

mainguyenanhvu commented 2 weeks ago

I hit the same issue when assigning a pybel object to a cell of a pandas DataFrame. It raises:

TypeError: object of type 'Molecule' has no len()

Setting dtype = object, as suggested in this link, does not help.