pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Reassigning an index in a Series with a 1 dimensional numpy array of length 1 loses dimension. Regression from v 2.0.0 #53565

Open lendle opened 1 year ago

lendle commented 1 year ago

Reproducible Example

import pandas as pd
import numpy as np

s = pd.Series()

a = np.array([1])
a2 = np.array([0,1])

assert a.shape == (1,)  # array is 1-dimensional with 1 element

# First assignment: the label does not exist yet, so the shape is kept
s['whatever'] = a
assert s['whatever'].shape == a.shape  # passes, as expected

# Reassignment to the existing label: the dimension is lost
s['whatever'] = a
print("shape after reassignment:", s['whatever'].shape)  # prints '()'

assert s['whatever'].shape == a.shape  # FAILS HERE

# Arrays with more than one element keep their shape on reassignment
s['whatever'] = a2
assert a2.shape == (2,)
assert s['whatever'].shape == a2.shape  # passes, as expected

Issue Description

When reassigning a value to an existing index label in a Series, if the value is a numpy array with a single element, the dimension of the array is lost: the stored value comes back with shape () instead of (1,). When assigning to an index label that does not exist yet, the array retains its shape (1,).

Expected Behavior

The array's shape shouldn't depend on whether the index label already exists in the Series or not. In pandas 2.0.0 the shape was preserved in both cases.
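To make the expectation easy to check on any installed version, here is a minimal sketch that compares the shape after the first assignment with the shape after reassignment. The helper name `shapes_on_assign` is hypothetical and not part of pandas; `dtype=object` is spelled out for the empty Series as an assumption (the example above uses plain `pd.Series()`), and `np.shape` is used so the check also works if a scalar comes back.

```python
import numpy as np
import pandas as pd


def shapes_on_assign(value, label="whatever"):
    """Return (shape after first assignment, shape after reassignment).

    Hypothetical helper for this report, not a pandas API.
    """
    # Assumption: explicit object dtype for the empty Series.
    s = pd.Series(dtype=object)

    s[label] = value  # label does not exist yet
    shape_new = np.shape(s[label])

    s[label] = value  # label already exists
    shape_existing = np.shape(s[label])

    return shape_new, shape_existing


if __name__ == "__main__":
    print("pandas", pd.__version__)
    for arr in (np.array([1]), np.array([0, 1])):
        new, existing = shapes_on_assign(arr)
        status = "consistent" if new == existing else "MISMATCH"
        print(arr.shape, "->", new, existing, status)
```

If the expected behavior holds, both shapes in each pair should be equal; on the affected versions reported here, the length-1 array would be expected to show a mismatch.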

Installed Versions

```
INSTALLED VERSIONS
------------------
commit           : 965ceca9fd796940050d6fc817707bba1c4f9bff
python           : 3.10.10.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.96-0-virt
Version          : #1-Alpine SMP Sun, 26 Feb 2023 15:14:12 +0000
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 2.0.2
numpy            : 1.24.3
pytz             : 2023.3
dateutil         : 2.8.2
setuptools       : 67.6.1
pip              : 23.1
Cython           : 0.29.35
pytest           : 7.3.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : 2023.4.0
gcsfs            : 2023.4.0
matplotlib       : 3.7.1
numba            : 0.57.0
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.10.1
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
zstandard        : None
tzdata           : 2023.3
qtpy             : None
pyqt5            : None
```

lendle commented 1 year ago

I think the change is due to this PR: https://github.com/pandas-dev/pandas/pull/52906. I'm not sure what the expected behavior is, but it probably shouldn't depend on whether a value already exists at that index.

adideshpande commented 1 year ago

I am seeing this issue starting with pandas 2.0.2 (both 2.0.2 and 2.0.3). I verified that the behavior was as expected in pandas 2.0.0 and 2.0.1.

torokati44 commented 11 months ago

Pandas 2.1.1 is still affected. ☹️

jensravesloot commented 5 months ago

Pandas 2.2.2 is still affected. ☹️