machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License

pandas grouped summarize fails if group_keys is set to false #457

Closed machow closed 1 year ago

machow commented 1 year ago

siuba puts the grouping columns back on a grouped summarize result by resetting the index. However, when the groupby is created with group_keys=False, the grouping keys never reach the index, so resetting the index fails.

from siuba.data import mtcars
from siuba import _, summarize

mtcars.groupby("cyl", group_keys=False).apply(lambda df: summarize(df, res=_.mpg.mean()))
# note no cyl on index
         res
0  26.663636
0  19.742857
0  15.100000
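
The same index behavior can be reproduced with pandas alone. The sketch below is only an illustration under assumptions: the toy data and the `summarize_group` helper are hypothetical stand-ins, not siuba's internals. It shows that the grouping column can be recovered from the index only when group_keys=True.

```python
import pandas as pd

# Toy stand-in for mtcars (hypothetical values, not the real dataset)
df = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "mpg": [30.0, 26.0, 21.0, 19.0, 15.0],
})

def summarize_group(g):
    # Hypothetical helper: one-row frame per group, like an ungrouped summarize
    return pd.DataFrame({"res": [g["mpg"].mean()]})

# group_keys=True: the group key becomes the outer level of the result index
with_keys = df.groupby("cyl", group_keys=True)[["mpg"]].apply(summarize_group)

# group_keys=False: the per-group frames are concatenated with no key level
without_keys = df.groupby("cyl", group_keys=False)[["mpg"]].apply(summarize_group)

# Only the keyed result lets us move cyl from the index back into a column
recovered = with_keys.reset_index(level="cyl").reset_index(drop=True)

print(without_keys.index.names)  # no "cyl" level to reset
print(recovered)
```

With group_keys=False there is no "cyl" level left to reset, which matches the output shown above.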

A grouped summarize should simply set group_keys to True itself. We should also check whether any result column overrides a grouping column, and ensure that case doesn't raise an error.

For example, this works in dplyr:

mtcars %>% group_by(cyl) %>% summarize(cyl = mean(mpg))
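
For comparison, a plain pandas sketch of the override case. This is an assumption about one possible approach, not necessarily how siuba handles it; `summarize_group`, `group_cols`, and the toy data are hypothetical. When a result column reuses a grouping column's name, resetting that index level raises, so the sketch drops the colliding level instead of inserting it.

```python
import pandas as pd

# Toy stand-in for mtcars (hypothetical values, not the real dataset)
df = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "mpg": [30.0, 26.0, 21.0, 19.0, 15.0],
})

def summarize_group(g):
    # Hypothetical helper: the result column deliberately reuses the name "cyl"
    return pd.DataFrame({"cyl": [g["mpg"].mean()]})

res = df.groupby("cyl", group_keys=True)[["mpg"]].apply(summarize_group)

# Naively resetting the index collides with the existing "cyl" column
try:
    res.reset_index(level="cyl")
    collided = False
except ValueError:
    collided = True

# Drop index levels whose names were overridden; restore the others as columns
group_cols = ["cyl"]
overridden = [name for name in group_cols if name in res.columns]
kept = [name for name in group_cols if name not in res.columns]

out = res.reset_index(level=overridden, drop=True) if overridden else res
out = out.reset_index(level=kept) if kept else out
out = out.reset_index(drop=True)

print(collided)
print(out)
```

Here the summarized "cyl" column wins over the grouping key, mirroring the dplyr result above.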

machow commented 1 year ago

Fixed in #459