pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: GroupBy.__iter__ doesn't work for dict initialization #42980

Open truhanen opened 3 years ago

truhanen commented 3 years ago

Code Sample, a copy-pastable example

import pandas as pd
frame = pd.DataFrame([[1, 2], [3, 4]])
d = dict(frame.groupby(level=0))

Raised error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: attribute of type 'NoneType' is not callable

Problem description

Based on the documentation of Python's dict and of GroupBy.__iter__, the two should work together. The code above should behave identically to this working alternative:

d = {}
for k, v in frame.groupby(level=0):
    d[k] = v

Expected Output

The code sample shouldn't raise an error, and the result should be:

>>> d[0]
   0  1
0  1  2
>>> d[1]
   0  1
1  3  4

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.8.11.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.8-1-MANJARO
Version : #1 SMP PREEMPT Thu Aug 5 09:47:52 UTC 2021
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C
LOCALE : None.None
pandas : 1.3.1
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.3
setuptools : 57.0.0
Cython : 0.29.23
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.3
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.1
IPython : 7.23.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

gurukiran07 commented 3 years ago

You would have to call iter explicitly on the GroupBy object.

import pandas as pd
frame = pd.DataFrame([[1, 2], [3, 4]])
d = dict(iter(frame.groupby(level=0)))

d[0]
   0  1
0  1  2

d[1]
   0  1
1  3  4

truhanen commented 3 years ago

With iter you create an iterator from an iterable. GroupBy is an iterable because it implements the __iter__ method, and should therefore be directly usable for dict initialization, as stated in the documentation I linked.

Of course there's a possibility that I have understood the documentation incorrectly.
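
To illustrate the point about iterables with plain Python and no pandas involved: the hypothetical class PairSource below implements only __iter__, yielding key-value pairs, and dict accepts it directly. This is a minimal sketch of the documented dict behaviour, not of GroupBy itself.

```python
class PairSource:
    """Hypothetical iterable that defines only __iter__ (it is not an iterator)."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __iter__(self):
        # Each call returns a fresh iterator over (key, value) tuples.
        return iter(self._pairs)


source = PairSource([(0, "first group"), (1, "second group")])

# dict() accepts any iterable of key-value pairs; no explicit iter() is needed.
direct = dict(source)
via_iter = dict(iter(source))
print(direct == via_iter)
```

For an ordinary iterable like this one, wrapping it in iter() makes no difference to the result.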

jamesholcombe commented 3 years ago

take

rhshadrach commented 1 year ago

Python seems to be inferring that the groupby object is a Mapping, but I'm not sure why this is. I tried removing __getitem__ and .keys from the DataFrameGroupBy class and it still seems to be treating it as a Mapping.

When I use frame.groupby("a"), the error is TypeError: 'str' object is not callable. When I delete keys, the error becomes TypeError: 'NoneType' object is not callable.
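
One possible explanation, offered as an assumption rather than a confirmed diagnosis of the pandas code: CPython's dict constructor takes the mapping path for any argument that exposes a keys attribute, and then tries to call it, without first checking for __getitem__. A non-callable keys attribute (for example None, or a column name string) would then produce errors of the kind shown above. The class LooksLikeGroupBy below is hypothetical and only mimics that shape; the exact error message depends on the Python version.

```python
class LooksLikeGroupBy:
    """Hypothetical stand-in: an iterable of pairs that also stores `keys`."""

    def __init__(self, pairs, keys=None):
        self._pairs = list(pairs)
        # Assumption: `keys` is a plain data attribute, not a method.
        self.keys = keys

    def __iter__(self):
        return iter(self._pairs)


def try_dict(obj):
    """Return the dict built from obj, or the TypeError that dict() raised."""
    try:
        return dict(obj)
    except TypeError as exc:
        return exc


pairs = [(0, "first group"), (1, "second group")]

# The mere presence of `keys` sends dict() down the mapping path, where it
# tries to call obj.keys() and fails because the attribute is not callable.
result_none = try_dict(LooksLikeGroupBy(pairs, keys=None))
result_str = try_dict(LooksLikeGroupBy(pairs, keys="a"))
print(type(result_none).__name__, type(result_str).__name__)

# iter() returns a plain iterator without a `keys` attribute, so dict()
# falls back to consuming key-value pairs.
result_iter = try_dict(iter(LooksLikeGroupBy(pairs, keys="a")))
print(result_iter)
```

If this is what happens with GroupBy, it would also account for why the explicit iter() workaround succeeds: the iterator it returns carries no keys attribute.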