pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: behavior of DataFrame.assign for dictionary inputs #49817

Open hisakatha opened 1 year ago

hisakatha commented 1 year ago

Pandas version checks

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.assign.html

Documentation problem

pandas.DataFrame.assign appears to accept dict (dictionary) objects as column values: the new column is built by looking up each label of the DataFrame's index in the dictionary, and labels with no matching key get NaN. However, I cannot find this behavior described in the documentation.

   c1   c2
0   0  100
1   1  102
2   3  104
3   5  106
df2 = pd.DataFrame({"c3": ["a", "b", "a", "x"], "c4": ["b", "a", "c", "d"]}).set_index("c3")
dict2 = {"a": "AAA", "b": "BBB", "c": "CCC", "y": "YYY"}
df2.assign(c5 = dict2)
   c4   c5
c3
a   b  AAA
b   a  BBB
a   c  AAA
x   d  NaN

Suggested fix for documentation

I would like to rely only on stable, documented features. Therefore, if this result is expected, I would like the developers to document the behavior. If it is not expected, I would like the function to emit a warning (or raise an error).
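Until the dictionary behavior is documented, the same result can probably be obtained with an explicit lookup. The following is a minimal sketch, assuming a recent pandas version; it reuses df2 and dict2 from the report, and the variable names c5 and explicit are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"c3": ["a", "b", "a", "x"], "c4": ["b", "a", "c", "d"]}
).set_index("c3")
dict2 = {"a": "AAA", "b": "BBB", "c": "CCC", "y": "YYY"}

# Turn the dictionary into a Series keyed by its keys, then align it to
# df2's index. Index labels missing from the dictionary ("x") become NaN;
# dictionary keys that never appear in the index ("c", "y") are dropped.
c5 = pd.Series(dict2).reindex(df2.index)

# Pass the already-aligned Series to assign instead of the raw dict.
explicit = df2.assign(c5=c5)
print(explicit)
```

This spells out the alignment step, so the outcome does not depend on how assign treats a raw dict.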

rhshadrach commented 1 year ago

Thanks for the report! My expectation would be that

df = df.assign(x=obj)

is equivalent to

df["x"] = obj

except when obj is callable, in which case assign calls obj on the DataFrame and assigns the result. This is already how assign is implemented. I'm +1 on accepting any values and updating the documentation to say so.
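The equivalence described above can be checked with a short sketch. It assumes a recent pandas version in which column assignment accepts dict values; the names via_assign, via_setitem, and via_callable are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"c4": ["b", "a", "c", "d"]}, index=["a", "b", "a", "x"])
obj = {"a": "AAA", "b": "BBB", "c": "CCC", "y": "YYY"}

# assign returns a new DataFrame and leaves df untouched.
via_assign = df.assign(x=obj)

# Item assignment mutates in place, so work on a copy for comparison.
via_setitem = df.copy()
via_setitem["x"] = obj

# Both routes should produce the same frame, NaN included.
print(via_assign.equals(via_setitem))

# The callable case: assign calls the function with the DataFrame
# and stores whatever it returns as the new column.
via_callable = df.assign(x=lambda d: d["c4"].str.upper())
print(via_callable)
```

If the two frames ever differ on a given pandas version, that would suggest the equivalence is an implementation detail and not a guarantee, which is the gap the documentation request is about.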