pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.71k stars 17.92k forks

BUG: ValueError: Incompatible indexer with DataFrame #47229

Open Ynjxsjmh opened 2 years ago

Ynjxsjmh commented 2 years ago

Reproducible Example

The original issue was raised in the Stack Overflow question "Incompatible indexer with DataFrame". The following is a reproducible example:

import numpy as np
import pandas as pd

d = {'HomePlanet': {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0, 4: np.nan},
     'RoomService': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
     'test': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan}}

df = pd.DataFrame(d)
print(df)

   HomePlanet  RoomService  test
0         1.0          NaN   NaN
1         0.0          NaN   NaN
2         1.0          NaN   NaN
3         0.0          NaN   NaN
4         NaN          NaN   NaN

Issue Description

Here the OP wants to fill the NaN values in the RoomService column based on a condition, so they try

df.loc[df.HomePlanet == 1, 'RoomService'] = df.fillna(135)

which raises the following error:

Traceback (most recent call last):
  File "/home/winy/sourcecode/test/so/72499714.py", line 26, in <module>
    df.loc[df.HomePlanet == 1, 'RoomService'] = df.fillna(135)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 716, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1690, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1932, in _setitem_single_block
    value = self._align_frame(indexer, value)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 2200, in _align_frame
    raise ValueError("Incompatible indexer with DataFrame")
ValueError: Incompatible indexer with DataFrame

Expected Behavior

The expected output is:

   HomePlanet  RoomService  test
0         1.0        135.0   NaN
1         0.0          NaN   NaN
2         1.0        135.0   NaN
3         0.0          NaN   NaN
4         NaN          NaN   NaN

The ValueError does not appear if the HomePlanet column contains no NaN and holds integers:

import numpy as np
import pandas as pd

d = {'HomePlanet': {0: 1, 1: 0, 2: 1, 3: 0},
     'RoomService': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan},
     'test': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan}}

df = pd.DataFrame(d)

# df['HomePlanet'] = df['HomePlanet'].astype('float') # Adding this wouldn't cause error
df.loc[df.HomePlanet == 1, 'RoomService'] = df.fillna(135)
print(pd.DataFrame(d))

   HomePlanet  RoomService  test
0           1          NaN   NaN
1           0          NaN   NaN
2           1          NaN   NaN
3           0          NaN   NaN

print(df)

   HomePlanet  RoomService  test
0           1        135.0   NaN
1           0          NaN   NaN
2           1        135.0   NaN
3           0          NaN   NaN

With further experimentation, I found that the error can be reproduced even without any NaN in the HomePlanet column, as long as its values are floats:

import numpy as np
import pandas as pd

d = {'HomePlanet': {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0, },
     'RoomService': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, },
     'test': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, }}

df = pd.DataFrame(d)

#df['HomePlanet'] = df['HomePlanet'].astype(int) # Adding this wouldn't cause error
df.loc[df.HomePlanet == 1, 'RoomService'] = df.fillna(135)
print(pd.DataFrame(d))

   HomePlanet  RoomService  test
0         1.0          NaN   NaN
1         0.0          NaN   NaN
2         1.0          NaN   NaN
3         0.0          NaN   NaN
Traceback (most recent call last):
  File "/home/winy/sourcecode/test/so/72499714.py", line 23, in <module>
    df.loc[df.HomePlanet == 1, 'RoomService'] = df.fillna(135)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 716, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1690, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1932, in _setitem_single_block
    value = self._align_frame(indexer, value)
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 2200, in _align_frame
    raise ValueError("Incompatible indexer with DataFrame")
ValueError: Incompatible indexer with DataFrame
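The two experiments above differ only in the dtype of HomePlanet. The following is a minimal sketch for comparing them side by side. The helper `try_assign` is hypothetical (it is not part of pandas), and the outcome of the assignment may differ between pandas versions, so no expected output is shown. Both tracebacks pass through `_setitem_single_block`, which suggests that the internal storage layout of the frame plays a role; that is an assumption drawn from the traceback, not a confirmed diagnosis.

```python
import numpy as np
import pandas as pd

# Same values as the second experiment: integer HomePlanet.
int_frame = pd.DataFrame({
    "HomePlanet": [1, 0, 1, 0],
    "RoomService": [np.nan] * 4,
    "test": [np.nan] * 4,
})

# Same values as the third experiment: float HomePlanet.
float_frame = pd.DataFrame({
    "HomePlanet": [1.0, 0.0, 1.0, 0.0],
    "RoomService": [np.nan] * 4,
    "test": [np.nan] * 4,
})


def try_assign(frame):
    """Hypothetical helper: run the assignment from the report on a copy
    and describe what happened instead of letting the exception escape."""
    frame = frame.copy()
    try:
        frame.loc[frame.HomePlanet == 1, "RoomService"] = frame.fillna(135)
    except Exception as exc:  # the report shows a ValueError here
        return f"{type(exc).__name__}: {exc}"
    return "assignment succeeded"


print(int_frame.dtypes)
print(try_assign(int_frame))
print(float_frame.dtypes)
print(try_assign(float_frame))
```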

Installed Versions

`pd.show_versions()` throws error on both my local `Arch Linux 5.17.5-arch1-1` machine and remote `Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-48-generic x86_64)` machine.

Pandas version:
- Local machine: 1.4.2
- Remote machine: 1.3.5

```
Traceback (most recent call last):
  File "/home/winy/sourcecode/test/so/72499714.py", line 35, in
    print(pd.show_versions())
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/home/winy/.local/lib/python3.10/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/home/winy/.local/lib/python3.10/site-packages/setuptools/__init__.py", line 8, in
    import _distutils_hack.override # noqa: F401
  File "/home/winy/.local/lib/python3.10/site-packages/_distutils_hack/override.py", line 1, in
    __import__('_distutils_hack').do_override()
  File "/home/winy/.local/lib/python3.10/site-packages/_distutils_hack/__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "/home/winy/.local/lib/python3.10/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /usr/lib/python3.10/distutils/core.py
```

dhruvsamdani commented 2 years ago

take

simonjayhawkins commented 2 years ago

Thanks @Ynjxsjmh for the report and investigation.

This does appear buggy.

As a workaround, changing the LHS to be a DataFrame (indexing with a list of column labels rather than a single label) seems to align with the RHS without issue.

df.loc[df.HomePlanet == 1, ["RoomService"]] = df.fillna(135)

Contributions and PRs welcome.
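For anyone landing here from the Stack Overflow question, here is a minimal sketch of the workaround suggested above, applied to the data from the original report. The second variant, which puts a Series on the RHS, is an assumption about what the OP intended and is not something proposed in this thread. The variable names (`mask`, `workaround`, `series_rhs`) are illustrative only.

```python
import numpy as np
import pandas as pd

# Data from the original report.
df = pd.DataFrame({
    "HomePlanet": [1.0, 0.0, 1.0, 0.0, np.nan],
    "RoomService": [np.nan] * 5,
    "test": [np.nan] * 5,
})
mask = df["HomePlanet"] == 1  # NaN compares as False

# Workaround from the thread: a list of labels on the LHS, so the
# selection is a DataFrame and aligns with the DataFrame on the RHS.
workaround = df.copy()
workaround.loc[mask, ["RoomService"]] = workaround.fillna(135)

# Alternative (assumed intent): a Series on the RHS, which is aligned
# with the selected rows by index.
series_rhs = df.copy()
series_rhs.loc[mask, "RoomService"] = series_rhs["RoomService"].fillna(135)

print(workaround)
print(series_rhs)
```

Both variants fill RoomService only in the rows where HomePlanet equals 1 and leave the other columns untouched.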