BUG: TypeError: object of type 'int' has no len() when saving DataFrame with object dtype column

Honzys commented 4 years ago

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({"a": [None, None]})
df.loc[0, "a"] = float(1)
df.loc[1, "a"] = float(2)

# HDFStore takes the file mode through the `mode` argument
hdf = pd.HDFStore("test.h5", mode="w")
hdf.put("table", df, format="table")

This causes the following error:

  ...
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 1042, in put
    errors=errors,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 1709, in _write_to_group
    data_columns=data_columns,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 4143, in write
    data_columns=data_columns,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 3813, in _create_axes
    errors=self.errors,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 4800, in _maybe_convert_for_string_atom
    for i in range(len(block.shape[0])):
TypeError: object of type 'int' has no len()
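The last frame of the traceback applies len() to block.shape[0], which is already an integer. Here is a minimal sketch of that failure in plain Python, using a hypothetical tuple in place of the real block shape. The range(shape[0]) form at the end is only an assumption about what the loop was meant to do, not a confirmed fix:

```python
# Hypothetical stand-in for block.shape; the real value comes from pandas internals.
shape = (1, 2)

try:
    # Same pattern as the failing line: len() applied to an int.
    for i in range(len(shape[0])):
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Assumed intent: iterate over the first dimension directly.
indices = list(range(shape[0]))
print(indices)
```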

Problem description

After the DataFrame is created, column a has object dtype. After assigning floats to column a, I would expect its dtype to change to float64, but it remains object. When the DataFrame is saved, df.loc[0, "a"] is a float stored in an object-dtype column, which triggers the error pasted above.
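A small sketch of the dtype behaviour described here, using only the DataFrame from the code sample. It checks the column dtype after the assignments and shows that DataFrame.infer_objects() gives a float64 column. The variable names are illustrative, and this covers only the dtype side of the report, not the HDF5 write itself:

```python
import pandas as pd

df = pd.DataFrame({"a": [None, None]})
df.loc[0, "a"] = float(1)
df.loc[1, "a"] = float(2)

# The column keeps object dtype even though every value is now a float.
dtype_before = df["a"].dtype

# infer_dtype inspects the values rather than the declared dtype.
inferred = pd.api.types.infer_dtype(df["a"])

# infer_objects() returns a copy with object columns soft-converted where possible.
converted = df.infer_objects()
dtype_after = converted["a"].dtype

print(dtype_before, inferred, dtype_after)
```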

Expected Output

I would expect one of the following:

Implicit conversion of the column to float dtype
Conversion during hdf.put()
A proper exception saying that I am saving a mixed-type column
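As a user-side approximation of these options, something like the following could run before hdf.put(). This is a sketch, not pandas behaviour: prepare_for_hdf is a hypothetical helper name, and treating any non-string object column as an error is an assumption made for illustration.

```python
import pandas as pd


def prepare_for_hdf(df):
    """Hypothetical helper: convert object columns where possible, else fail clearly."""
    # Soft-convert object columns that hold a single scalar type (e.g. all floats).
    out = df.infer_objects()
    for name, dtype in out.dtypes.items():
        if dtype == object:
            kind = pd.api.types.infer_dtype(out[name], skipna=True)
            # Assumption: only string-valued object columns are acceptable here.
            if kind != "string":
                raise TypeError(
                    f"column {name!r} has object dtype holding {kind} values; "
                    "convert it explicitly before saving"
                )
    return out


floats_as_objects = pd.DataFrame({"a": pd.Series([1.0, 2.0], dtype=object)})
print(prepare_for_hdf(floats_as_objects).dtypes)
```

With a helper like this, the call in the code sample would become hdf.put("table", prepare_for_hdf(df), format="table").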

There's a pretty big chance that I am wrong and this is expected behaviour. If that's the case, could you please explain why, or point me to somewhere I can read about it?

Maybe it's related to issue #34274.

Output of `pd.show_versions()`

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-147.5.1.el8_1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.4
numpy : 1.16.4
pytz : 2018.7
dateutil : 2.8.1
pip : 19.3.1
setuptools : 46.4.0
Cython : 0.29.2
pytest : 5.1.2
hypothesis : None
sphinx : 1.8.4
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 2.0.0
numexpr : 2.6.8
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.1.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.2
numba : None

TomAugspurger commented 4 years ago

I think this is the same as #34274.

mainguyenanhvu commented 2 weeks ago

I hit the same issue when assigning a pybel object to a cell of a pandas DataFrame. It raised:

TypeError: object of type 'Molecule' has no len()

Although I set dtype = object following this link, it still does not work.
