pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: Merge/join treats lists/tuples differently than other list-likes #44077

Open rhshadrach opened 3 years ago

rhshadrach commented 3 years ago

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge

Documentation problem

The code for merge, and hence join (and possibly other methods), uses maybe_make_list, which treats lists and tuples differently from other list-likes.

https://github.com/pandas-dev/pandas/blob/c4cce9b75b34179edf000314edf708768486fcbb/pandas/core/reshape/merge.py#L638-L640

The current documentation only mentions a list of strings (join) or a list (merge). The special treatment of lists/tuples should be made more explicit.
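To make the distinction concrete, here is a minimal sketch of the wrapping behavior described above. The function maybe_make_list_sketch is a hypothetical stand-in written for this illustration; it is assumed to approximate the private pandas helper, which may differ in detail.

```python
import numpy as np


def maybe_make_list_sketch(obj):
    """Hypothetical stand-in: wrap anything that is not a list/tuple in a list."""
    if obj is not None and not isinstance(obj, (tuple, list)):
        return [obj]
    return obj


# A list passes through unchanged, so each element is later read as a label.
labels = maybe_make_list_sketch(["a", "b"])

# An ndarray with the same contents is wrapped, so it becomes ONE join key.
wrapped = maybe_make_list_sketch(np.array(["a", "b"]))

print(len(labels), len(wrapped))
```

Under this assumption, the same two strings yield two keys when passed as a list but a single key when passed as an ndarray, which is the asymmetry the documentation does not currently spell out.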

Suggested fix for documentation

merge, left_on argument (right_on is similar):

label, list/tuple of labels, or array-like, optional

A single label or a list or tuple of labels will be treated as column or index level name(s) to join on in the left DataFrame. Any other array-like, or a list of array-likes, with the same length as the left DataFrame is treated as if it were a column.
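A short example of the two readings of left_on, offered as a sketch rather than a definitive statement of behavior across pandas versions. The frames and column names (key, lval, rkey, rval) are made up for illustration.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "lval": ["a", "b", "c"]})
right = pd.DataFrame({"rkey": [1, 2, 3], "rval": ["x", "y", "z"]})

# A list is read as column labels: join on left["key"].
by_label = left.merge(right, left_on=["key"], right_on=["rkey"])

# An ndarray as long as `left` is read as the key values themselves,
# so row i of `left` is matched using key_values[i], not left["key"][i].
key_values = np.array([3, 2, 1])
by_values = left.merge(right, left_on=key_values, right_on="rkey")

print(by_label[["lval", "rval"]])
print(by_values[["lval", "rval"]])
```

In the second call the existing key column in left is ignored for matching; the rows are paired according to the array that was passed in.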

join, on argument:

label, list/tuple of labels, or array-like, optional

A single label or a list or tuple of labels will be treated as column or index level name(s) in the caller to join on the index in other. If multiple values are given, the other DataFrame must have a MultiIndex. An array-like other than a list or tuple will be used as the join key if it is not already contained in the calling DataFrame. When not specified, joins index-on-index. Like an Excel VLOOKUP operation.
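The same contrast for join, again as a hedged sketch with made-up frames and column names; the exact set of columns in the array-keyed result may vary between pandas versions, so only the lval/rval pairing is shown.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "lval": ["a", "b", "c"]})
other = pd.DataFrame({"rval": ["x", "y", "z"]}, index=[1, 2, 3])

# A label names a column in the caller; its values are looked up in other.index.
by_label = left.join(other, on="key")

# An ndarray is used directly as the join key, independent of left["key"].
by_values = left.join(other, on=np.array([3, 2, 1]))

print(by_label[["lval", "rval"]])
print(by_values[["lval", "rval"]])
```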

jbrockmendel commented 1 year ago

Is there a path forward where we deprecate the current behavior and become stricter in what we accept?