machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License

Pandas FutureWarning on certain groupby/mutate operations #454

Open edasmalchi opened 1 year ago

edasmalchi commented 1 year ago

It looks like using Siuba's group_by and mutate verbs now raises a FutureWarning from Pandas.

For example, running this works as expected but displays the FutureWarning:

from siuba import _, group_by, ungroup, filter, mutate, summarize
from siuba.data import mtcars

small_cars = mtcars[["cyl", "gear", "hp"]]

small_cars >> group_by(_.cyl, _.gear) >> mutate(sum_hp = _.hp.sum())
/opt/conda/lib/python3.10/site-packages/siuba/dply/verbs.py:296: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use

    >>> .groupby(..., group_keys=False)

To adopt the future behavior and silence this warning, use 

    >>> .groupby(..., group_keys=True)
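As a stopgap while the warning is still being emitted, it can be filtered with the standard-library warnings module. This is only a minimal sketch and not a fix for the underlying call: emit_group_keys_warning is a hypothetical stand-in for the siuba pipeline above, and the message pattern is copied from the start of the warning text.

```python
import warnings


def emit_group_keys_warning():
    # Hypothetical stand-in for the group_by/mutate call that triggers the warning.
    warnings.warn(
        "Not prepending group keys to the result index of transform-like apply.",
        FutureWarning,
    )
    return "result"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # `message` is a regex matched against the start of the warning text,
    # so only this specific FutureWarning is ignored.
    warnings.filterwarnings(
        "ignore",
        message="Not prepending group keys",
        category=FutureWarning,
    )
    value = emit_group_keys_warning()
    # An unrelated FutureWarning is still recorded.
    warnings.warn("unrelated deprecation", FutureWarning)

remaining = [str(w.message) for w in caught]
```

Matching on the message keeps other FutureWarnings visible, which seems safer than ignoring the whole category.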

machow commented 1 year ago

Thanks for reporting! Do you know if that is with the latest version of siuba (siuba.__version__ should be 0.4.0)? I think it may be resolved in the latest release, since this pandas deprecation is pretty recent.

edasmalchi commented 1 year ago

Looks like we've got 1.0.0a2 on the Cal-ITP JupyterHub -- tried bumping it with a quick pip install siuba==0.4.0 but that seemed to break the tbl function from calitp-py.

Not sure where our version comes from but I can make a ticket to look into it on our side.

machow commented 1 year ago

Ah shoot--siuba 0.4.0 introduces the tbl() function from dplyr. I should have chosen a better name for tbl in calitp 😬.

I think if you import from calitp after importing from siuba, it should work, but that's definitely not ideal!
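To illustrate why import order matters here: when two imports bind the same name, the later one wins. A minimal sketch using stand-in modules (siuba_like and calitp_like are hypothetical and built in memory, not the real packages):

```python
import sys
import types

# Hypothetical stand-ins for two packages that both export a `tbl` function.
siuba_like = types.ModuleType("siuba_like")
siuba_like.tbl = lambda: "tbl from siuba_like"

calitp_like = types.ModuleType("calitp_like")
calitp_like.tbl = lambda: "tbl from calitp_like"

# Register them so the import statement can find them.
sys.modules["siuba_like"] = siuba_like
sys.modules["calitp_like"] = calitp_like

from siuba_like import tbl
from calitp_like import tbl  # rebinds `tbl`; the earlier binding is replaced

winner = tbl()
```

Here winner is the calitp_like result because that import ran last. Swapping the two import lines would flip it.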

Do you have any thoughts on what might work here? It might be possible to warn if an import would override a variable (like tbl from calitp). Let me think about this a bit more...
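One rough way the warn-on-override idea could look, purely as a sketch: bind_with_warning is a hypothetical helper (not something siuba or calitp provides), and the module and namespace below are stand-ins.

```python
import types
import warnings


def bind_with_warning(namespace, module, names):
    """Copy `names` from `module` into `namespace`, warning on overrides."""
    for name in names:
        new_value = getattr(module, name)
        # Warn only when the name already points at a different object.
        if name in namespace and namespace[name] is not new_value:
            warnings.warn(
                f"{name!r} from {module.__name__!r} overrides an existing variable",
                UserWarning,
                stacklevel=2,
            )
        namespace[name] = new_value


# Hypothetical stand-in module exposing its own tbl.
siuba_like = types.ModuleType("siuba_like")
siuba_like.tbl = lambda: "tbl from siuba_like"

# A namespace that already holds a tbl, as if it had been imported earlier.
user_namespace = {"tbl": lambda: "tbl from calitp_like"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bind_with_warning(user_namespace, siuba_like, ["tbl"])

override_messages = [str(w.message) for w in caught]
```

This only covers names bound through the helper. A plain import statement would bypass it, so it is closer to a starting point than a solution.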

edasmalchi commented 1 year ago

Renaming tbl in calitp to something else might not be the worst idea, even if it means a bit of rework…