has2k1 / plydata

A grammar for data manipulation in Python
https://plydata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

group_by broken #23

Closed jotwin closed 3 years ago

jotwin commented 4 years ago

I haven't been able to use anything with group_by in it since upgrading pandas to 1.1.0+.

pd.DataFrame({'a':range(10)}) >> count('a==0')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-8092b63f45c6> in <module>
----> 1 pd.DataFrame({'a':range(10)}) >> count('a==0')

/usr/local/lib/python3.7/site-packages/plydata/operators.py in __rrshift__(self, other)
    122         self.data = other
    123         func = get_verb_function(self.data, self.__class__.__name__)
--> 124         return func(self)
    125 
    126     def __call__(self, data):

/usr/local/lib/python3.7/site-packages/plydata/dataframe/helpers.py in count(verb)
     63     verb.add_ = True
     64     verb.data = group_by(verb)
---> 65     data = tally(verb)
     66 
     67     # Restore original groups

/usr/local/lib/python3.7/site-packages/plydata/dataframe/helpers.py in tally(verb)
     49 
     50     verb.expressions = [Expression(stmt, 'n')]
---> 51     data = summarize(verb)
     52     if verb.sort:
     53         data = data.sort_values(by='n', ascending=False)

/usr/local/lib/python3.7/site-packages/plydata/dataframe/one_table.py in summarize(verb)
    169             verb,
    170             keep_index=False,
--> 171             keep_groups=False).process()
    172     return data
    173 

/usr/local/lib/python3.7/site-packages/plydata/dataframe/common.py in process(self)
    217         gdfs = self._get_group_dataframes()
    218         egdfs = self._evaluate_expressions(gdfs)
--> 219         edata = self._concat(egdfs)
    220         return edata
    221 

/usr/local/lib/python3.7/site-packages/plydata/dataframe/common.py in _concat(self, egdfs)
    307             Evaluated data
    308         """
--> 309         egdfs = list(egdfs)
    310         edata = pd.concat(egdfs, axis=0, ignore_index=False, copy=False)
    311 

/usr/local/lib/python3.7/site-packages/plydata/dataframe/common.py in <genexpr>(.0)
    264             Result dataframes for each group
    265         """
--> 266         return (self._evaluate_group_dataframe(gdf) for gdf in gdfs)
    267 
    268     def _evaluate_group_dataframe(self, gdf):

/usr/local/lib/python3.7/site-packages/plydata/dataframe/common.py in _evaluate_group_dataframe(self, gdf)
    290             else:
    291                 _create_column(data, expr.column, value)
--> 292         data = _add_group_columns(data, gdf)
    293         return data
    294 

/usr/local/lib/python3.7/site-packages/plydata/dataframe/common.py in _add_group_columns(data, gdf)
     57     n = len(data)
     58     if isinstance(gdf, GroupedDataFrame):
---> 59         for i, col in enumerate(gdf.plydata_groups):
     60             if col not in data:
     61                 group_values = [gdf[col].iloc[0]] * n

TypeError: 'NoneType' object is not iterable

has2k1 commented 4 years ago

Yes, pandas v1.1.0 broke grouping in plydata. There is a PR at https://github.com/pandas-dev/pandas/pull/35688 to fix the issue, but it has not been merged yet.

has2k1 commented 4 years ago

Fix will ship in Pandas 1.1.4.

antonio-yu commented 4 years ago

> Fix will ship in Pandas 1.1.4.

I appreciate your work on plydata and plotnine. Does plydata 0.4.2 only support pandas below 1.1.0? My pandas is 1.1.3. When I tried to install plydata, it collected pandas 1.0.5.

has2k1 commented 4 years ago

> > Fix will ship in Pandas 1.1.4.
>
> I appreciate your work on plydata and plotnine. Does plydata 0.4.2 only support pandas below 1.1.0?

Yes, but see below.

> My pandas is 1.1.3. When I tried to install plydata, it collected pandas 1.0.5.

I just noticed that pandas 1.1.4 shipped 4 days ago. I will also make a release.

antonio-yu commented 4 years ago

Thanks for your reply. My friends and I will wait for your new release. We are all glad to see an update after 3 years.

has2k1 commented 4 years ago

I spoke too soon; the fix in pandas 1.1.4 was incomplete. I will have to wait for https://github.com/pandas-dev/pandas/pull/37461 to ship. If accepted, it is marked for inclusion in v1.1.5.
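
For anyone who needs to detect the problem in the meantime, here is a minimal sketch of a version guard. It assumes the affected range is pandas 1.1.0 through 1.1.4, which is only what this thread suggests, and the helper names (`parse_version`, `grouping_may_be_broken`) are hypothetical, not part of plydata or pandas.

```python
import re

# Assumption taken from this thread: grouping broke in pandas 1.1.0 and the
# complete fix is expected in 1.1.5. Adjust if the upstream fix lands elsewhere.
FIRST_BROKEN = (1, 1, 0)
FIRST_FIXED = (1, 1, 5)


def parse_version(version):
    """Return the leading (major, minor, patch) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. "1.2") is treated as 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def grouping_may_be_broken(pandas_version):
    """True if the version falls in the range this thread reports as affected."""
    return FIRST_BROKEN <= parse_version(pandas_version) < FIRST_FIXED
```

It could be called as `grouping_may_be_broken(pd.__version__)` before using `group_by`, `count`, or `summarize`, to warn or fall back instead of hitting the `TypeError` above.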

has2k1 commented 4 years ago

/remind me on 30th November 2020. Pandas v1.1.5 will ship or Pandas v1.2.0 will be out.

antonio-yu commented 3 years ago

Hi, pandas v1.1.5 has shipped. By the way, are there any plans to add a dplyr-style filter function so that filtering rows by regex is more convenient?

has2k1 commented 3 years ago

plydata v0.4.3 is out.