pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

is_list_like should return false for tuples #24702

Open toobaz opened 5 years ago

toobaz commented 5 years ago

Code Sample, a copy-pastable example if possible

In [2]: pd.core.dtypes.common.is_list_like((2,3))
Out[2]: True

Problem description

We discussed several times the fact that tuples in pandas should not be considered collections of things, but rather

(simple way to discriminate: if you could easily add an element, it is a collection; if instead the number of elements is somewhat hardcoded, it is not).

Changing the behavior of is_list_like, which is used in many places, would be perfectly natural, and it would resolve problems and remove hacks such as

https://github.com/pandas-dev/pandas/commit/32ee9732b823448b87848f6bcaefdc762868999c#diff-1e79abbbdd150d4771b91ea60a4e1cc7R2701

https://github.com/pandas-dev/pandas/pull/24697#issuecomment-453078627

... and many others.

See #23061 for a similar fix (although the similarity breaks down in one respect: sets are intrinsically different from lists, while for tuples it is a design decision).

I do expect some tests to break, and I also expect that in some cases, we'll want to preserve backwards compatibility... but at least let's set a sane default.
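To illustrate the kind of ambiguity behind those hacks, here is a minimal pure-Python sketch. It does not use pandas: `iterable_non_string`, `no_tuples` and `lookup` are hypothetical helpers invented for this example, and the first one is only a rough stand-in for what is_list_like checks.

```python
from collections.abc import Iterable


def iterable_non_string(obj):
    # Rough stand-in for a "list-like" check (assumption, not pandas' code):
    # anything iterable except str/bytes counts, so tuples count too.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def no_tuples(obj):
    # Same check, but tuples are treated as single composite values.
    return iterable_non_string(obj) and not isinstance(obj, tuple)


def lookup(mapping, key, list_like=iterable_non_string):
    # A list-like key means "several keys"; anything else means "one key".
    if list_like(key):
        return [mapping[k] for k in key]
    return mapping[key]


prices = {("apple", 2019): 1.5, "apple": 10.0, 2019: 20.0}

# The tuple is meant as ONE composite label, but it gets unpacked into two keys.
unpacked = lookup(prices, ("apple", 2019))

# With tuples excluded, the same call finds the composite label.
single = lookup(prices, ("apple", 2019), list_like=no_tuples)

# A real list of keys still behaves as a collection in both cases.
several = lookup(prices, ["apple", 2019], list_like=no_tuples)
```

With the tuple-inclusive check the caller has to special-case tuples before calling `lookup`; with the stricter default that special case disappears.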

Expected Output

False

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 040f06f731a09fc6e0663cada6697f6602b36f1d
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-8-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.24.0.dev0+1282.g040f06f73
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.2.2.post1634.dev0+ge8120cf6d
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml.etree: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
gcsfs: None

toobaz commented 5 years ago

By the way: clearly documenting, once and for all, what a "list-like" is in pandas is an essential part of the fix.

rajibmitra commented 5 years ago

Can I work on the documentation part, or is it totally internal?

h-vetinari commented 5 years ago

Loosely related xref: #24688

It's maybe worth noting that my original plan for #23065 was to introduce a strict kwarg for is_list_like, but that was voted down in favor of a more specific keyword (see e.g. here).
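To make the keyword-based design concrete, here is a minimal sketch in plain Python. It is not the pandas implementation: `is_list_like_sketch` is a hypothetical name, `allow_sets` mirrors the specific keyword that pandas' is_list_like accepts, and `allow_tuples` is purely an assumption showing how the same pattern could extend to tuples.

```python
from collections.abc import Iterable, Set


def is_list_like_sketch(obj, allow_sets=True, allow_tuples=True):
    # Strings and bytes are iterable but are never treated as list-like here.
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        return False
    # Specific keyword in the spirit of the one chosen in #23065.
    if not allow_sets and isinstance(obj, Set):
        return False
    # Hypothetical extension: let callers opt out of tuples as well.
    if not allow_tuples and isinstance(obj, tuple):
        return False
    return True


default_tuple = is_list_like_sketch((2, 3))                      # True
strict_tuple = is_list_like_sketch((2, 3), allow_tuples=False)   # False
strict_list = is_list_like_sketch([2, 3], allow_tuples=False)    # True
strict_set = is_list_like_sketch({2, 3}, allow_sets=False)       # False
```

A single strict flag would bundle both exclusions into one switch, whereas specific keywords let each call site pick exactly the exclusions it needs.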

It may be worth revisiting that decision if people want to resolve the issue brought up in the OP. The question "what is list-like" could then have several answers (which may or may not be desirable). Examples:

Tagging participants of the discussion in #23065: @TomAugspurger @jorisvandenbossche @jschendel @jreback