pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

TypeError: unhashable type: 'dict' when using apply/transform? #17309

Open randomgambit opened 7 years ago

randomgambit commented 7 years ago

Hello!

I am quite puzzled by some inconsistencies when using apply. Consider this simple example

import pandas as pd

# Six timestamps used as the row index
idx = pd.to_datetime(['2012-02-01 14:00:00',
                      '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00',
                      '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00',
                      '2012-03-11 14:07:50'])

test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

test
Out[22]: 
                    groups  value1  value2
2012-02-01 14:00:00      A       1      10
2012-02-01 14:01:00      A       2      20
2012-03-05 14:04:00      A       3      30
2012-03-05 14:01:00      B       4      40
2012-03-10 14:02:00      B       5      50
2012-03-11 14:07:50      B       6      60

Now, this WORKS

test.groupby('groups').apply(lambda x: x.resample('1 T', label='left', closed='left').apply(
        {'value1' : 'mean',
         'value2' : 'mean'}))

but this FAILS

test.groupby('groups').apply(
        {'value1' : 'mean',
         'value2' : 'mean'})

Traceback (most recent call last):

  File "<ipython-input-24-741304ecf105>", line 3, in <module>
    'value2' : 'mean'})

  File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 696, in apply
    func = self._is_builtin_func(func)

  File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\base.py", line 730, in _is_builtin_func
    return self._builtin_table.get(arg, arg)

TypeError: unhashable type: 'dict'

This worked in prior versions of Pandas. What is the new syntax then? Some very useful variant of the code above I used to use was:

test.groupby('groups').apply(
        {'newname1' : {'value1' : 'mean'},
         'newname2' : {'value2' : 'mean'}})

to rename the new variables on the fly. Is this still possible now? Is this a bug?

Many thanks!
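
The last frame of the traceback points at a dictionary lookup, `self._builtin_table.get(arg, arg)`. A minimal sketch of why that line raises, using plain Python only: `dict.get` has to hash its first argument, so passing a dict fails before the default is ever considered. The table contents and the `lookup` helper below are hypothetical stand-ins, not the actual pandas internals.

```python
# Hypothetical stand-in for the lookup table seen in the traceback:
# it maps a few Python builtins to the name of a replacement.
builtin_table = {sum: "sum", max: "max", min: "min"}


def lookup(func):
    # dict.get hashes `func` to look it up, so an unhashable
    # argument raises TypeError even though a default is supplied.
    return builtin_table.get(func, func)


print(lookup(sum))     # hashable callable, found in the table
print(lookup("mean"))  # hashable string, not found, returned unchanged

try:
    lookup({"value1": "mean"})
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This suggests the error is raised before any grouping logic runs, which matches the traceback stopping inside `_is_builtin_func`.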

randomgambit commented 7 years ago

@jorisvandenbossche @jreback same bug with transform

test.groupby('groups').transform(
        {'value1' : 'mean',
         'value2' : 'mean'})

only agg works

test.groupby('groups').agg(
        {'value1' : 'mean',
         'value2' : 'mean'})

is this a nasty bug? thanks again!

jreback commented 7 years ago

agg is more general than apply

In [7]: test.groupby('groups').agg(
   ...:         {'value1' : 'mean',
   ...:          'value2' : 'mean'})
   ...: 
Out[7]: 
        value1  value2
groups                
A            2      20
B            5      50

i guess it should work

randomgambit commented 7 years ago

@jreback yes, thanks, that's correct. This is what I am saying as well: it works with agg.

However, I do not want to aggregate, I want to use a transform. The documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html says we should be able to feed a dict of column-functions.

What do you think? Thanks again!
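
Until the grouped transform accepts a dict, one possible workaround is to select each column from the grouped object and transform it on its own. The sketch below assumes a default integer index instead of the timestamps above, and the `funcs` mapping is a hypothetical example of the column-to-function dict being asked for.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

# Hypothetical mapping in the shape requested: column -> function name.
funcs = {"value1": "mean", "value2": "max"}

grouped = test.groupby("groups")

# Select one column at a time and transform it. Each result is a Series
# aligned with the original index, so they combine into one DataFrame.
transformed = pd.DataFrame(
    {col: grouped[col].transform(func) for col, func in funcs.items()}
)
print(transformed)
```

The result keeps one row per input row, which is the difference from the agg call shown earlier.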

jreback commented 7 years ago

if you want to submit a PR to fix it, by all means. (your example did not indicate transform)

znwang25 commented 6 years ago

Has this been fixed yet? I think transform after groupby is a very useful feature to have.

TomAugspurger commented 6 years ago

Still open. Please let us know if you want to start a PR to fix this.

zeromh commented 6 years ago

Is there any reason the documentation says that transform takes a dictionary, when it doesn't?

zeromh commented 6 years ago

Transform also doesn't take a list, as the documentation says it does. To use the above example:

test.groupby('groups').value1.transform(['cumsum', 'cummax'])

...returns "TypeError: unhashable type: 'list'"
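
A possible way to get the same effect today, sketched under the assumption that one output column per function name is what is wanted: call transform once per name and concatenate the results. The index is a default integer index here rather than the timestamps from the original example.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

grouped = test.groupby("groups")["value1"]
names = ["cumsum", "cummax"]

# One transform call per function name; the dict keys passed to
# pd.concat become the column labels of the combined result.
result = pd.concat({name: grouped.transform(name) for name in names}, axis=1)
print(result)
```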

xx396 commented 6 years ago

Would like to see this fixed too as an aggregate variant of transform would be very handy

gsmafra commented 6 years ago

I'm also confused by the documentation. Isn't there an easy way to transform just one column of a grouped DataFrame?

Alxe1 commented 6 years ago

In pandas version 0.23.4, after grouping a DataFrame, you cannot pass a list of functions to the transform method, and you cannot rename the columns of the transformed DataFrame using a nested dictionary. Both would be very useful!

colinalexander commented 6 years ago

@zeromh The referenced documentation where transform accepts lists and dictionaries is for the dataframe method of transform, not its groupby cousin version. The doc string for the groupby version correctly states that it accepts a function:

Signature: gb.transform(func, *args, **kwargs)
Docstring:
Call function producing a like-indexed DataFrame on each group and
return a DataFrame having the same indexes as the original object
filled with the transformed values

Parameters
----------
f : function
    Function to apply to each subframe

sainathadapa commented 6 years ago

Can this then be taken as a feature request, so that the same kind of apply/transform usage be used on both DataFrame and GroupBy objects?

Alxe1 commented 6 years ago

Can this then be taken as a feature request, so that the same kind of apply/transform usage be used on both DataFrame and GroupBy objects?

Vote it! It is very useful

zeromh commented 6 years ago

@colin1alexander Ah, my bad. Thanks for the clarification.

brianhuey commented 6 years ago

@jreback @TomAugspurger I'm interested in tackling this. My understanding is that NDFrameGroupBy.transform() and SeriesGroupBy.transform() would need to be rewritten to accept a dict with column names as keys and functions as values, similar to NDFrameGroupBy.aggregate(). It seems like using SeriesGroupBy._aggregate_multiple_funcs() as a guideline for writing a multiple func transform method might be a good idea?

TomAugspurger commented 6 years ago

Yeah, that sounds about right. @WillAyd may have better thoughts on how to start.

Keep in mind, doing this for .apply may be difficult / impossible because it doesn't place any restrictions on the output shape.

With .agg and .transform we at least know what the return shape should be, so we can know ahead of time what the output shape of a dict of functions will be.
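
To make the point about output shapes concrete, here is a small sketch on the example data (with a default integer index, which is an assumption for brevity). The two lambdas passed to apply are arbitrary illustrations, chosen only because they return differently shaped results.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)
grouped = test.groupby("groups")[["value1", "value2"]]

# agg: always one row per group.
aggregated = grouped.agg("mean")

# transform: always one row per input row.
transformed = grouped.transform("mean")

# apply: the shape depends entirely on what the function returns.
applied_rows = grouped.apply(lambda g: g.head(2))              # two rows per group
applied_scalar = grouped.apply(lambda g: g["value1"].sum())    # one scalar per group

print(aggregated.shape, transformed.shape)
print(applied_rows.shape, applied_scalar.shape)
```

With agg and transform the number of rows is known before any function runs; with apply it is only known after calling the function on each group.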

WillAyd commented 6 years ago

Reading through the comments here I think there have been quite a few things talked about, but just so we are on the same page I assume we are explicitly talking about changing transform to allow a dict where the key is the column name and the value(s) are the functions to be applied.

I'm not opposed to it, though I think it makes more sense to update transform to accept a sequence first, as I don't think users will expect the values of a dict to be limited to just one function. @brianhuey if you want to try your hand at that, it would make sense to open it as a separate PR first, get that one through, and then come back to this.

randomgambit commented 6 years ago

guys, as the original OP and lifelong pandas supporter, let me reiterate that it would be very useful to have apply, transform, and agg be able to work like this:

test.groupby('groups').transform(
        {'value1' : {'value1_mean' : 'mean', 'value1_max' : 'max'},
         'value2' : {'value2_mean' : 'mean'}})

This used to work back in the days with the good old agg. It does not anymore.

This is very unfortunate because in one go I was able to use multiple functions on a single column (here mean and max on value1) as well as rename them on the fly (so that these variables have the names I have chosen and the dataframe does not have some weird multicolumn index)

Do you think that syntax could be used in apply, transform and agg? This syntax was just a great idea.

Thanks!!
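
For readers on later releases: pandas 0.25 and newer offer keyword-based "named aggregation" in agg, which covers the rename-on-the-fly part of this request for aggregations. It does not extend to transform. The sketch below shows the agg form and, as one possible substitute for a renamed transform, joins the aggregated columns back onto the original rows. The default integer index is an assumption for brevity.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

# Named aggregation: output_name=(input_column, function).
summary = test.groupby("groups").agg(
    value1_mean=("value1", "mean"),
    value1_max=("value1", "max"),
    value2_mean=("value2", "mean"),
)

# Broadcast the per-group values back to every row, which gives a
# transform-like result with flat, user-chosen column names.
joined = test.join(summary, on="groups")
print(joined)
```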

TomAugspurger commented 6 years ago

We have a separate issue for an alternative to the deprecated dict of dicts in agg. Hoping to have that for 0.24.

randomgambit commented 6 years ago

@TomAugspurger thanks, but we're talking about extending that to apply, transform and agg, right?

brianhuey commented 6 years ago

@WillAyd Just so I'm clear, you're suggesting something like:

test.groupby('groups').transform({'value1': [np.mean, max], 'value2': max})

which should return something like:

                    value1     value2
                      mean max   max
2012-02-01 14:00:00      2   3    30
2012-02-01 14:01:00      2   3    30
2012-03-05 14:04:00      2   3    30
2012-03-05 14:01:00      5   6    60
2012-03-10 14:02:00      5   6    60
2012-03-11 14:07:50      5   6    60

WillAyd commented 6 years ago

My point is that it would make more sense to make sure this works:

test.groupby('groups').transform([np.mean, max])

Before attempting:

test.groupby('groups').transform({'value1': [np.mean, max]})

Because the mechanisms to ensure that the list of functions are acceptable will probably be "reused" when it comes time to accepting a value from a dictionary which is a list

Somewhat of a side note, but the hierarchical column structure of the result is going to be somewhat entangled with the discussion at https://github.com/pandas-dev/pandas/issues/18366#issuecomment-425212844. I don't believe that should be a blocker, just a consideration point for devs.
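
As a rough illustration of the hierarchical result being discussed, the sketch below builds it by hand from per-column transform calls. It is not the proposed implementation: `spec` is a hypothetical input, string function names are used in place of np.mean and max, and the index is a default integer index rather than the timestamps.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

# Hypothetical spec: column -> list of function names.
spec = {"value1": ["mean", "max"], "value2": ["max"]}

grouped = test.groupby("groups")

# Tuple keys (column, function) become a two-level column MultiIndex
# when the dict is passed to the DataFrame constructor.
pieces = {
    (col, func): grouped[col].transform(func)
    for col, funcs in spec.items()
    for func in funcs
}
result = pd.DataFrame(pieces)
print(result)
```

The values match the table sketched earlier in the thread; only the column labelling is an assumption about what the final feature would produce.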

FelixAntonSchneider commented 5 years ago

Hi everyone, I just stumbled upon the same issue. It would be very important imo to cover this in the documentation. At least I have been very confused by it, since the only entry in the docs regarding transform clearly says that lists and dicts of functions can be passed as an argument. It was not clear to me that the same syntax does not apply to grouped objects.

elpablete commented 5 years ago

I just stumbled upon this, and after checking the docs for pandas 0.24.2 DataFrame.transform I see that it still says that a dict is supported as the func value. I'm guessing from this discussion that this is because DataFrame.transform does accept it but GroupBy.transform does not. It's very confusing. Is there any quick fix for this (documentation issue)?

elpablete commented 5 years ago

Also, is there any progress on getting the desired feature into an upcoming release? I've been using pandas for a while now but have never actually attempted to contribute. I can try to implement this with a little guidance if someone is willing to help me out.

TomAugspurger commented 5 years ago

@elpablete you linked to DataFrame.transform. That would be a different issue. This is about DataFrameGroupBy.transform.

elpablete commented 5 years ago

@TomAugspurger I cannot find the docs for "DataFrameGroupBy.transform". I found pandas.core.groupby.GroupBy.transform which I would think are the same, but still, those are empty and thus, one would be inclined to think they have the same interface as pandas.DataFrame.transform.

That's my point when I say it's very confusing.

simonjayhawkins commented 4 years ago

Maybe we could provide a more helpful error message (with a link to the groupby.transform/apply docs) and maybe raise NotImplementedError in the short term.
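
A user-side sketch of what such a message could look like. `checked_transform` is a hypothetical wrapper, not a pandas function, and the wording of the message is only a suggestion; an actual fix would live inside the groupby code.

```python
import pandas as pd


def checked_transform(grouped, func, *args, **kwargs):
    """Hypothetical helper: fail early with a readable message."""
    if isinstance(func, (dict, list)):
        raise NotImplementedError(
            f"{type(grouped).__name__}.transform does not accept a "
            f"{type(func).__name__} of functions; pass a single function "
            "or function name, or transform each column separately."
        )
    return grouped.transform(func, *args, **kwargs)


test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)
grouped = test.groupby("groups")[["value1"]]

# A single function name is passed through unchanged.
ok = checked_transform(grouped, "mean")

# A dict is rejected with an explicit explanation.
try:
    checked_transform(grouped, {"value1": "mean"})
except NotImplementedError as exc:
    message = str(exc)
    print(message)
```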