modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

BUG: Cannot insert lists into individual cells with `at` or `loc`. Works in pandas. #7406

Status: Open. Opened by bdalal 1 month ago.


Modin version checks

Reproducible Example

```python
import pandas as pd
import modin.pandas as md

pandas_df = pd.DataFrame(list(range(5)), columns=['x'])
modin_df = md.DataFrame(list(range(5)), columns=['x'])

pandas_df.at[1, 'x'] = 'a'        # converts column 'x' to object dtype
pandas_df.at[1, 'x'] = [1, 2, 3]  # works: the cell now holds the list

modin_df.at[1, 'x'] = 'a'         # converts column 'x' to object dtype
modin_df.at[1, 'x'] = [1, 2, 3]   # DOESN'T work: raises ValueError (see Error Logs)
```

Issue Description

This is related to https://github.com/modin-project/modin/issues/4111, which appears to have been closed incorrectly: the bug still exists in the latest version, 0.32.0.

Expected Behavior

Modin should allow assigning a list to a single cell with `at` or `loc`, as pandas does, storing the whole list as one object-dtype value.
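For reference, here is a minimal pandas-only sketch of the behavior this issue asks Modin to match. It is an illustration, not part of the original report: it starts from an `object`-dtype column instead of relying on the implicit upcast triggered by assigning `'a'` (which pandas 2.1 and later flag with a `FutureWarning`), and it checks that `at` stores the whole list in one cell without touching the other rows. Nothing here uses Modin.

```python
import pandas as pd

# Start with an object-dtype column so each cell can hold an arbitrary Python object.
df = pd.DataFrame({"x": list(range(5))}, dtype=object)

# Label-based scalar setter: the list becomes the value of a single cell.
df.at[1, "x"] = [1, 2, 3]

print(df.at[1, "x"])  # → [1, 2, 3]
print(df["x"].dtype)  # → object
print(len(df))        # → 5 (no rows added, nothing broadcast)
```

Running the same three lines against a `modin.pandas.DataFrame` is what currently fails.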

Error Logs

```python-traceback
Traceback (most recent call last):
  File "/root/venv/lib/python3.10/site-packages/modin/pandas/utils.py", line 303, in broadcast_item
    return np.broadcast_to(item, to_shape), dtypes
  File "/root/venv/lib/python3.10/site-packages/numpy/lib/stride_tricks.py", line 413, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "/root/venv/lib/python3.10/site-packages/numpy/lib/stride_tricks.py", line 349, in _broadcast_to
    it = np.nditer(
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,) and requested shape (1,1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/root/venv/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/root/venv/lib/python3.10/site-packages/modin/pandas/indexing.py", line 813, in __setitem__
    self._set_item_existing_loc(row_loc, col_loc, item)
  File "/root/venv/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/root/venv/lib/python3.10/site-packages/modin/pandas/indexing.py", line 881, in _set_item_existing_loc
    self._setitem_positional(
  File "/root/venv/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/root/venv/lib/python3.10/site-packages/modin/pandas/indexing.py", line 448, in _setitem_positional
    new_qc = self.qc.write_items(row_lookup, col_lookup, item)
  File "/root/venv/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/root/venv/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler_caster.py", line 157, in cast_args
    return obj(*args, **kwargs)
  File "/root/venv/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler.py", line 4639, in write_items
    broadcasted_item, broadcasted_dtypes = broadcast_item(
  File "/root/venv/lib/python3.10/site-packages/modin/pandas/utils.py", line 306, in broadcast_item
    raise ValueError(
ValueError: could not broadcast input array from shape (3,) into shape (1, 1)
```
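The traceback suggests the failure happens before any data is written: `broadcast_item` in `modin/pandas/utils.py` calls `np.broadcast_to(item, to_shape)` with the list as `item` and the single-cell target shape `(1, 1)`, and NumPy reads the list as a shape-`(3,)` array. The sketch below reproduces that step with NumPy alone, then shows one possible way a fix could treat a list destined for exactly one cell as a single object element. The helper `wrap_single_cell` is hypothetical, not Modin API, and this is not Modin's actual fix.

```python
import numpy as np

item = [1, 2, 3]
to_shape = (1, 1)  # target shape for a single .at/.loc cell, as in the traceback

# NumPy interprets the list as a shape-(3,) array, so broadcasting to (1, 1) fails,
# mirroring the first ValueError in the traceback.
try:
    np.broadcast_to(item, to_shape)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True


def wrap_single_cell(value, shape):
    """Hypothetical helper (not Modin API): store `value` as one object element.

    Only meaningful when `shape` describes exactly one cell, e.g. (1, 1).
    """
    if int(np.prod(shape)) != 1:
        raise ValueError("shape must describe exactly one cell")
    out = np.empty(shape, dtype=object)
    # A full scalar index on an object array stores the list itself, not its elements.
    out[(0,) * len(shape)] = value
    return out


cell = wrap_single_cell(item, to_shape)
print(broadcast_failed)        # → True
print(cell.shape, cell[0, 0])  # → (1, 1) [1, 2, 3]
```

Because the wrapped array already has the target shape, broadcasting it is a no-op and the list survives intact as the cell's value.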

Installed Versions

```
INSTALLED VERSIONS
------------------
commit : 3e951a63084a9cbfd5e73f6f36653ee12d2a2bfa
python : 3.10.12
python-bits : 64
OS : Linux
OS-release : 5.15.0-1019-aws
Version : #23~20.04.1-Ubuntu SMP Thu Aug 18 03:20:14 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies
------------------
modin : 0.32.0
ray : 2.37.0
dask : 2024.9.0
distributed : 2024.9.0

pandas dependencies
-------------------
pandas : 2.2.3
numpy : 1.26.4
pytz : 2020.1
dateutil : 2.8.2
pip : 22.3.1
Cython : 0.29.36
sphinx : 2.4.3
IPython : 7.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.5.0
fsspec : 2023.10.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.2
lxml.etree : 4.9.3
matplotlib : 3.8.2
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : 2.9.9
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : 7.4.3
python-calamine : None
pyxlsb : None
s3fs : 2023.10.0
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.22.0
tzdata : 2024.2
qtpy : None
pyqt5 : None
```