randomgambit opened 7 years ago
@jorisvandenbossche @jreback Same bug with transform:
test.groupby('groups').transform(
{'value1' : 'mean',
'value2' : 'mean'})
Only agg works:
test.groupby('groups').agg(
{'value1' : 'mean',
'value2' : 'mean'})
Is this a nasty bug? Thanks again!
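Until GroupBy.transform accepts a dict, one workaround is to transform each column on its own SeriesGroupBy and reassemble the pieces. This is a minimal sketch; the `test` frame below is hypothetical data reconstructed from the outputs quoted in this thread (group means 2/20 for A and 5/50 for B), and `spec` is an illustrative name.

```python
import pandas as pd

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

# The column -> function mapping that groupby(...).transform rejects
spec = {'value1': 'mean', 'value2': 'mean'}
grouped = test.groupby('groups')

# Transform each column separately; every result is aligned to test.index,
# so building a DataFrame from them keeps the original row order
result = pd.DataFrame({col: grouped[col].transform(func) for col, func in spec.items()})
print(result)
```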
agg is more general than apply.
In [7]: test.groupby('groups').agg(
...: {'value1' : 'mean',
...: 'value2' : 'mean'})
...:
Out[7]:
value1 value2
groups
A 2 20
B 5 50
I guess it should work.
@jreback Yes, thanks, that's correct; this is what I am saying as well: it works with agg.
However, I do not want to aggregate, I want to use a transform. The documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html says we should be able to pass a dict of column-functions.
What do you think? Thanks again!
If you want to submit a PR to fix it, by all means. (Your example did not indicate transform.)
Has this been fixed yet? I think transform after groupby is a very useful feature to have.
Still open. Please let us know if you want to start a PR to fix this.
Is there any reason the documentation says that transform takes a dictionary, when it doesn't?
Transform also doesn't take a list, as the documentation says it does. To use the above example:
test.groupby('groups').value1.transform(['cumsum', 'cummax'])
...returns "TypeError: unhashable type: 'list'"
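For the single-column case above, a list of functions can be emulated by calling transform once per function and concatenating the results; `pd.concat` with a dict uses the dict keys as column labels. A sketch on hypothetical data, not a pandas feature:

```python
import pandas as pd

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

grouped = test.groupby('groups')['value1']
funcs = ['cumsum', 'cummax']

# One transform per function name; keys become the output column labels
result = pd.concat({f: grouped.transform(f) for f in funcs}, axis=1)
print(result)
```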
Would like to see this fixed too, as an aggregate-style variant of transform would be very handy.
I'm also confused by the documentation. Isn't there an easy way to transform just one column of a grouped DataFrame?
In pandas 0.23.4, after grouping a DataFrame, you cannot pass a list of functions to transform, nor rename the columns of the transformed DataFrame using a nested dictionary, but that would be very useful!
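Without nested-dict renaming, one way to get several transformed columns with names of your choosing (and no MultiIndex on the columns) is `DataFrame.assign`, with one keyword per new column. A minimal sketch on hypothetical data; the new column names are illustrative:

```python
import pandas as pd

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

grouped = test.groupby('groups')

# Each keyword name becomes a flat column name; each value is a
# like-indexed Series produced by a per-column transform
result = test.assign(
    value1_mean=grouped['value1'].transform('mean'),
    value1_max=grouped['value1'].transform('max'),
    value2_mean=grouped['value2'].transform('mean'),
)
print(result)
```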
@zeromh The referenced documentation, where transform accepts lists and dictionaries, is for the DataFrame method transform, not its groupby cousin. The docstring for the groupby version correctly states that it accepts a function:
Signature: gb.transform(func, *args, **kwargs)
Docstring:
Call function producing a like-indexed DataFrame on each group and
return a DataFrame having the same indexes as the original object
filled with the transformed values
Parameters
----------
f : function
Function to apply to each subframe
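The difference can be seen side by side: `DataFrame.transform` (no groupby) accepts a dict of column -> function, while the groupby version takes one function that is applied group by group. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

# DataFrame.transform: dict of column -> function is accepted (no grouping)
plain = test[['value1', 'value2']].transform({'value1': 'cumsum', 'value2': lambda s: s * 2})

# GroupBy.transform: a single function, applied within each group
demeaned = test.groupby('groups')[['value1', 'value2']].transform(lambda s: s - s.mean())
print(plain)
print(demeaned)
```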
Can this then be taken as a feature request, so that the same kind of apply/transform usage can be used on both DataFrame and GroupBy objects?
Voting for it! It would be very useful.
@colin1alexander Ah, my bad. Thanks for the clarification.
@jreback @TomAugspurger
I'm interested in tackling this. My understanding is that NDFrameGroupBy.transform() and SeriesGroupBy.transform() would need to be rewritten to accept a dict with column names as keys and functions as values, similar to NDFrameGroupBy.aggregate(). It seems like using SeriesGroupBy._aggregate_multiple_funcs() as a guideline for writing a multiple-function transform method might be a good idea?
Yeah, that sounds about right. @WillAyd may have better thoughts on how to start.
Keep in mind, doing this for .apply may be difficult or impossible because it doesn't place any restrictions on the output shape. With .agg and .transform we at least know what the return shape should be, so we can know ahead of time what the output shape of a dict of functions will be.
Reading through the comments here, I think quite a few things have been discussed, but just so we are on the same page, I assume we are explicitly talking about changing transform to allow a dict where the key is the column name and the value(s) are the functions to be applied.
I'm not opposed to it, though I think it makes more sense to first update transform to accept a sequence, as I don't think users will expect the values of a dict to be limited to just one function. @brianhuey if you want to try your hand at that, it would make sense to open it as a separate PR first, get that one through, and then come back to this.
Guys, as the OP and a lifelong pandas supporter, let me reiterate that it would be very useful to have apply, transform, and agg be able to work like this:
test.groupby('groups').transform(
{'value1' : {'value1_mean' : 'mean', 'value1_max' : 'max'},
'value2' : {'value2_mean' : 'mean'}})
This used to work back in the day with the good old agg. It does not anymore.
This is very unfortunate, because in one go I was able to use multiple functions on a single column (here mean and max on value1) as well as rename them on the fly (so that these variables have the names I have chosen and the DataFrame does not have some weird MultiIndex on its columns).
Do you think that syntax could be used in apply, transform and agg? This syntax was just a great idea.
Thanks!!
We have a separate issue for an alternative to the deprecated dict of dicts in agg. Hoping to have that for 0.24.
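The alternative that later shipped for agg (pandas 0.25) is named aggregation, where each keyword is an output name mapped to a `(column, function)` pair. It covers agg only; a transform-shaped result can then be obtained by joining the per-group table back onto the rows. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

grouped = test.groupby('groups')

# Named aggregation: output_name=(input_column, function); flat column names
summary = grouped.agg(
    value1_mean=('value1', 'mean'),
    value1_max=('value1', 'max'),
    value2_mean=('value2', 'mean'),
)

# Broadcast the per-group values back onto every original row
result = test.join(summary, on='groups')
print(result)
```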
(In reply to Olaf's comment of Saturday, October 13, 2018, 10:43:47 PM, on pandas-dev/pandas issue #17309, "TypeError: unhashable type: 'dict' when using apply/transform?")
@TomAugspurger Thanks, but we're talking about extending that to apply, transform and agg, right?
@WillAyd
Just so I'm clear, you're suggesting something like:
test.groupby('groups').transform({'value1': [np.mean, max], 'value2': max})
which should return something like:
                     value1     value2
                     mean  max  max
2012-02-01 14:00:00  2     3    30
2012-02-01 14:01:00  2     3    30
2012-03-05 14:04:00  2     3    30
2012-03-05 14:01:00  5     6    60
2012-03-10 14:02:00  5     6    60
2012-03-11 14:07:50  5     6    60
My point is that it would make more sense to make sure this works:
test.groupby('groups').transform([np.mean, max])
Before attempting:
test.groupby('groups').transform({'value1': [np.mean, max]})
This is because the mechanisms that check a list of functions is acceptable will probably be reused when it comes time to accept a dictionary value that is itself a list.
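The list-first ordering can be illustrated with two hypothetical helpers: `transform_list` validates and applies a list of functions to one SeriesGroupBy, and `transform_dict` simply normalises each dict value to a list and reuses it. The names are illustrative, not pandas API; the output uses the (column, function) MultiIndex layout sketched in the table above, and string function names are used instead of `np.mean`/`max`.

```python
import pandas as pd

def _func_name(func):
    # Strings keep their name; callables fall back to __name__
    return func if isinstance(func, str) else getattr(func, '__name__', repr(func))

def transform_list(series_grouped, funcs):
    """Hypothetical: apply every function in funcs to one SeriesGroupBy."""
    funcs = list(funcs)
    if not funcs:
        raise ValueError("need at least one function")
    names = [_func_name(f) for f in funcs]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate function names: {names}")
    return pd.concat({n: series_grouped.transform(f) for n, f in zip(names, funcs)}, axis=1)

def transform_dict(frame_grouped, spec):
    """Hypothetical: column -> function or list of functions, reusing transform_list."""
    pieces = {}
    for col, funcs in spec.items():
        if not isinstance(funcs, (list, tuple)):
            funcs = [funcs]  # a single function is treated as a one-element list
        pieces[col] = transform_list(frame_grouped[col], funcs)
    # Outer column level = input column, inner level = function name
    return pd.concat(pieces, axis=1)

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

result = transform_dict(test.groupby('groups'), {'value1': ['mean', 'max'], 'value2': 'max'})
print(result)
```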
Somewhat of a side note, but the hierarchical column structure of the result is going to be entangled with the discussion in https://github.com/pandas-dev/pandas/issues/18366#issuecomment-425212844. I don't believe that should be a blocker, just a consideration point for devs.
Hi everyone, I just stumbled upon the same issue. It would be very important imo to cover this in the documentation. At least I have been very confused by it, since the only entry in the docs regarding transform clearly says that lists and dicts of functions can be passed as an argument. It was not clear to me that the same syntax does not apply to grouped objects.
I just stumbled upon this, and after checking the docs for pandas 0.24.2 DataFrame.transform I see that they still say a dict is supported as the func value. I'm guessing from this discussion that it's because DataFrame.transform does accept it but GroupBy.transform does not. It's very confusing; is there any quick fix for this (documentation issue)?
Also, is there any progress on getting the desired feature into a future release? I've been using pandas for a while now but never actually attempted to contribute. I can try to implement this with a little guidance if someone is willing to help me out.
@elpablete you linked to DataFrame.transform. That would be a different issue. This is about DataFrameGroupBy.transform.
@TomAugspurger I cannot find the docs for "DataFrameGroupBy.transform". I found pandas.core.groupby.GroupBy.transform, which I would think is the same, but its docs are empty, so one would be inclined to think it has the same interface as pandas.DataFrame.transform.
That's my point when I say it's very confusing.
Maybe we could provide a more helpful error message (with a link to the groupby.transform/apply docs) and maybe raise NotImplementedError in the short term.
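A short-term guard of that kind could look like the wrapper below. `checked_transform` is a hypothetical user-side helper, not part of pandas; it only shows failing fast with `NotImplementedError` and a pointer to the workaround instead of the opaque "unhashable type" error.

```python
import pandas as pd

def checked_transform(grouped, func, *args, **kwargs):
    """Hypothetical wrapper: reject dict/list specs with a clearer message."""
    if isinstance(func, (dict, list, tuple)):
        raise NotImplementedError(
            "GroupBy.transform takes a single function or function name, not a "
            f"{type(func).__name__}; transform each column separately instead "
            "(see the GroupBy.transform / apply documentation)"
        )
    return grouped.transform(func, *args, **kwargs)

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

g = test.groupby('groups')
try:
    checked_transform(g, {'value1': 'mean'})
except NotImplementedError as exc:
    print(exc)
```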
Hello!
I am quite puzzled by some inconsistencies when using apply. Consider this simple example. Now, this WORKS
but this FAILS
This worked in prior versions of pandas. What is the new syntax then? A very useful variant of the code above that I used to use was:
to rename the new variables on the fly. Is this still possible now? Is this a bug?
Many thanks!
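The code from this last comment did not survive, so the following is not the poster's example. As a hedged illustration on hypothetical data, one way to name new variables on the fly with groupby apply is to return a `pd.Series` from the function: its index labels become the output column names, with one row per group.

```python
import pandas as pd

# Hypothetical data, reconstructed from the outputs quoted in this thread
test = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value1': [1, 2, 3, 4, 5, 6],
    'value2': [10, 20, 30, 40, 50, 60],
})

grouped = test.groupby('groups')[['value1', 'value2']]

# The Series index labels (chosen freely here) become the result's columns
summary = grouped.apply(lambda d: pd.Series({
    'value1_mean': d['value1'].mean(),
    'value2_max': d['value2'].max(),
}))
print(summary)
```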