pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Inconsistent behaviour when assigning to series? #25548

Open mwiebusch78 opened 5 years ago

mwiebusch78 commented 5 years ago

I noticed that .loc and __setitem__ behave very differently when assigning one series to a sub-range of another series:

>>> s = pd.Series(0.0, index=list('abcd'))
>>> s1 = pd.Series(1.0, index=list('ab'))
>>> s2 = pd.Series(2.0, index=list('xy'))
>>> s[['a', 'b']] = s2
>>> s  # names of s2 are ignored as expected
a    2.0
b    2.0
c    0.0
d    0.0
dtype: float64
>>> s.loc[['a', 'b']] = s2
>>> s  # not expected!!
a    NaN
b    NaN
c    0.0
d    0.0
dtype: float64
>>> s.loc[['a', 'b']] = s1
>>> s  # everything's fine if the indices match
a    1.0
b    1.0
c    0.0
d    0.0
dtype: float64

I'm not sure if this is intended behaviour but it seems odd.

I'm on pandas v. 0.24.1

WillAyd commented 5 years ago

Not sure I agree on expectation but this is rather nuanced. I think this should be raising a SettingWithCopyWarning for the first sample - @TomAugspurger maybe thoughts on your end?

TomAugspurger commented 5 years ago

I'm not sure what the rules are for setitem. It seems like labels are ignored when the lengths are the same?

In [48]: s3 = pd.Series([1, 2], index=['a', 'b'])

In [49]: target = s.copy()

In [50]: target[['a', 'b']] = s3; target
Out[50]:
a    1.0
b    2.0
c    0.0
d    0.0
dtype: float64

In [51]: target = s.copy()

In [52]: target[['a', 'b']] = s3[['b', 'a']]; target
Out[52]:
a    2.0
b    1.0
c    0.0
d    0.0
dtype: float64

But differing lengths trigger an alignment (outputs 2 and 3, though 3 is already aligned)?

I wouldn't expect a SettingWithCopyWarning on the first one. The target isn't a (maybe) copy of another object. This is all in a single call to __setitem__ so it's fine (as opposed to x = s[['a', 'b']]; x = s2)
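The NaN result from `.loc` can be reproduced by reindexing the right-hand side to the target labels before writing it. This is only a sketch of the observed behaviour, not a description of pandas internals; `aligned` is a name introduced here for illustration.

```python
import pandas as pd

s = pd.Series(0.0, index=list('abcd'))
s2 = pd.Series(2.0, index=list('xy'))
s3 = pd.Series([1, 2], index=['a', 'b'])

# 'a' and 'b' do not exist in s2, so reindexing to them yields NaN for both.
aligned = s2.reindex(['a', 'b'])
print(aligned)

# .loc matches values by label, so reordering the right-hand side
# does not change which value lands on which label.
target = s.copy()
target.loc[['a', 'b']] = s3[['b', 'a']]
print(target)
```

With label alignment, the order of the rows on the right-hand side does not matter, only their labels.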

phofl commented 4 years ago

This seems to be consistent now and returns

a    NaN
b    NaN
c    0.0
d    0.0
dtype: float64
a    NaN
b    NaN
c    0.0
d    0.0
dtype: float64
a    1.0
b    1.0
c    0.0
d    0.0
dtype: float64

Is this the expected output now?

phofl commented 1 year ago

This is expected
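If the label-ignoring result is what is wanted, one option is to drop the index from the right-hand side so there is nothing to align on. A minimal sketch, reusing the names from the original report and assuming a pandas version that provides `Series.to_numpy()`:

```python
import pandas as pd

s = pd.Series(0.0, index=list('abcd'))
s2 = pd.Series(2.0, index=list('xy'))

# A plain NumPy array carries no labels, so the values are written by position.
s.loc[['a', 'b']] = s2.to_numpy()
print(s)
```

The array length has to match the number of selected labels, since there is no index left to decide which value goes where.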

srkds commented 1 year ago

I tried executing the same example and got the same result. pandas version == 2.0.0

>>> s.loc[['a', 'b']] = s2
>>> s  # Should this be the expected output, or is the NaN output the intended behaviour?
a    2.0
b    2.0
c    0.0
d    0.0