Open 1112114641 opened 1 year ago
Can I pick this up and get two weeks to dig into the codebase? Thanks
@szczekulskij Just wanted to kindly remind you that commenting on 'take' will automatically assign you to this issue. No pressure though!
As an interesting aside, I discovered a new method of defining aggregations, and this code also works:
pd.DataFrame(
{"a":np.repeat([0,1,2,3,4], 10),"b":range(50,100)},
index=pd.date_range(start=pd.Timestamp.now(),freq="1min",periods=50)
).groupby("a",as_index=False).resample("2min").agg(b=("b","min"))
take
To keep this thread updated: I've identified that the problem is in pandas/core/groupby/generic.py.
Line 421: if not self.as_index and not_indexed_same:
- if this condition doesn't evaluate to True, everything works as expected
I'll now work on solving this.
Hey, could I ask for a review of the attached PR? It definitely solves the issue, but I'm open to feedback, as I'm still learning more about the repo.
The last example in the OP doesn't raise on 1.5.x, marking as a regression.
It's not clear to me what the expected behavior of using as_index=False should be. I'm not finding any tests with groupby(...).resample(...) and as_index=False. In the docs, we say as_index has no impact on filtrations or transformations, but a resample doesn't fall into these categories.
cc @jbrockmendel @mroeschke @topper-123
Taking a closer look at the behavior, it appears to me that as_index has no impact on the resampler across methods. I think we should stick with that for this issue; if we want to change that, we can take it up in a dedicated issue.
Hey @rhshadrach, sorry for the delay, updated now. Happy to create a separate ticket to implement different functionality depending on whether as_index=False (if I understood correctly, this is what you discussed in the previous point).
Thanks for the feedback and for going through this ticket with me.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Using pandas 2.0.0 / python 3.9.13, running the above code yields different behaviour depending on whether the as_index=False flag is set in the .groupby() method or not.
The resulting error message
ValueError: Length of values (5) does not match length of index (30)
is misleading, uninformative, and does not explain the difference in behaviour between .agg({"b":"min"}) and .agg(min). The same holds true for other standard .groupby aggregations (max, first, ...).
Could you please look into this?
Expected Behavior
.agg({"b":"min"}) and .agg(min) should yield the same result, irrespective of as_index=True/False.
Installed Versions