Open PeterJPRoche opened 5 years ago
Thanks for the report; this is definitely a bug. It looks like Series.mode() is returning a single-entry Series (groupby.generic L1011) while SeriesGroupBy.transform expects either a scalar or an array of the same length as the calling Series.
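For anyone hitting the same mismatch, here is a minimal sketch of a user-side workaround. The data is hypothetical (the example from the report is not reproduced here), and taking `.iloc[0]` is an assumption about which mode you want when there are ties, not something pandas prescribes.

```python
import pandas as pd

# Hypothetical data; not the example from the original report.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "value": [110, 110, 200, 200, 201, 300, 301],
})

# Series.mode() returns a Series even when there is only one mode.
single = df.loc[df["ID"] == 1, "value"].mode()
print(type(single).__name__, single.tolist())

# Reduce each group to one scalar so that transform can broadcast it
# back to the group's length. mode() returns its values sorted, so
# .iloc[0] picks the smallest mode when several values tie.
df["mode"] = df.groupby("ID")["value"].transform(lambda x: x.mode().iloc[0])
print(df)
```

Note that ID 3 silently gets 300 here even though 301 is equally frequent, which is the ambiguity discussed in the rest of this thread.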
I'm not as sure about this. mode is not like other reduction operations because it doesn't necessarily reduce down to a scalar. In your example both 300 and 301 would be the mode for ID 3.
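A small sketch of that difference, assuming ID 3 holds the two values 300 and 301 once each (the values are taken from the comment above; the variable names are made up for illustration):

```python
import pandas as pd

# Assumed contents of the ID 3 group: each value occurs exactly once.
values = pd.Series([300, 301])

# A true reduction always yields one scalar.
print(values.max())

# mode() yields one entry per tied value, so the result can be longer than 1.
modes = values.mode()
print(modes.tolist())
```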
What are you expecting to have happen here?
Hi, sorry, just coming back to this now.
In the case of a series that has two or more modes, the distribution is multimodal; it does not have a single mode. It is up to the user to decide how they wish to handle that in their application or problem at hand. I guess it would be neat if it could return the array of modes when there is more than one.
With regard to the issue above, I would just have expected the same behavior and answers that 0.22.0 gave. Why the change of behavior from 0.22.0 to 0.24.2? Incidentally, I happened to notice that this same issue is present in 0.25.0.
I don't think we would want to return an array of modes when there are multiple. That would mean you sometimes return an object dtype and sometimes a numeric dtype, depending on the data.
I would agree with @TomAugspurger about the return types - a mixed bag of return types could be problematic...
I think there are two issues here:
I'm not sure. Are you interested in debugging those?
We'll still need a way to handle multi-modal groups. I'm not sure what's best here. We don't want data-dependent behavior (i.e. only raising when there happens to be multiple modes). But I'm not sure we want a different default way of handling multiple modes.
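Until that is settled, one option on the user side is to make the tie policy explicit. The sketch below is only an illustration: `single_mode` and its `on_ties` parameter are hypothetical names, not part of pandas, and the data is made up.

```python
import numpy as np
import pandas as pd


def single_mode(values: pd.Series, on_ties: str = "first"):
    """Hypothetical helper: reduce a Series to one mode under an explicit tie policy."""
    modes = values.mode()
    if len(modes) == 0:
        # mode() drops NaN by default, so an all-NaN group has no mode.
        return np.nan
    if len(modes) == 1 or on_ties == "first":
        # mode() returns sorted values, so "first" means the smallest mode.
        return modes.iloc[0]
    if on_ties == "nan":
        return np.nan
    # Any other policy value is treated as "raise".
    raise ValueError(f"multiple modes found: {modes.tolist()}")


# Hypothetical data; ID 3 is multimodal.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3],
    "value": [110, 110, 200, 200, 201, 300, 301],
})

grouped = df.groupby("ID")["value"]
first = grouped.transform(single_mode)
masked = grouped.transform(lambda x: single_mode(x, on_ties="nan"))
print(pd.DataFrame({"first": first, "masked": masked}))
```

With the "nan" policy the result becomes floating point, because NaN cannot be stored in an integer column. That is a mild version of the data-dependent dtype concern raised above.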
I am seeing a ValueError when running the following example code
the output is
however the index and data being checked in series.py (line 255) are
There only seems to be one value [110] in the data array, not two [110,110]. I would have expected the transform to operate on a series with the two values.
The code works fine (as I expected) in my previous version of pandas 0.22.0. I noticed the issue only after updating to pandas 0.24.2
I am not sure if this is expected behaviour or a bug, just posting here to get some help! There are features in 0.24.2 I would like to use (e.g. DataFrame.to_numpy()), but I am stuck on this breaking issue!
Thanks