intake / akimbo

For when your data won't fit in your dataframe
https://akimbo.readthedocs.io
BSD 3-Clause "New" or "Revised" License

'AwkwardExtensionArray' object is not callable #31

Closed jpivarski closed 1 year ago

jpivarski commented 1 year ago

Reported by @sbdrchauhan as https://github.com/scikit-hep/uproot5/issues/895 (and manually copied here).


image

image

I hope my problem is self-explanatory from the images. Thank you.


I want this feature ON for pandas df so my life becomes a lot easier. Thanks.

image

martindurant commented 1 year ago

Question: what do we expect to happen for value_counts on an ak column?

martindurant commented 1 year ago

For reference, values should not be callable, but a property:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': ["oi"]*10000})

In [3]: df.a.values
Out[3]: array(['oi', 'oi', 'oi', ..., 'oi', 'oi', 'oi'], dtype=object)
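
The error in the issue title has the same shape as calling any property's return value. A minimal pure-Python sketch, assuming a hypothetical `Column` class as a stand-in for a pandas Series (it is not the real pandas or awkward-pandas implementation):

```python
class Column:
    """Hypothetical stand-in for a pandas Series; not the real class."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def values(self):
        # A property is read by plain attribute access and returns the data.
        return self._data


col = Column(["oi", "oi", "oi"])
print(col.values)  # attribute access, no parentheses

try:
    col.values()  # this calls the returned list, which is not callable
except TypeError as err:
    message = str(err)
    print(message)
```

With a real awkward-type column the object returned by `.values` would be the extension array, which is presumably why the message names `AwkwardExtensionArray`.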

sbdrchauhan commented 1 year ago

Question: what do we expect to happen for value_counts on an ak column?

For an ak column it probably wouldn't make sense to do that. But what about the other, non-ak columns? We should be able to get counts for those. Currently I am using Python's Counter as an alternative, but I believe that on a default pandas df value_counts() does what it should. Thanks.
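
A rough sketch of the Counter workaround mentioned above, using only the standard library. The process names are made-up sample values, not data from the issue:

```python
from collections import Counter

# Made-up process names for illustration only.
process_names = ["compt", "phot", "compt", "eIoni", "compt"]

# Counter tallies how often each distinct value occurs.
counts = Counter(process_names)

# most_common() yields (value, count) pairs, highest count first.
for name, n in counts.most_common():
    print(name, n)
```

In practice the list would come from the column itself, for example via `df["processName"].tolist()`.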

martindurant commented 1 year ago

.value_counts works on a series (column), so you can freely use it on non-ak data in a dataframe that happens to also have ak data.
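
A small sketch of that point, assuming a hypothetical frame in which a column of nested Python lists stands in for the ak column (no awkward-pandas import is needed to show the idea):

```python
import pandas as pd

# Hypothetical data: 'hits' holds nested Python lists as a stand-in for an
# awkward-type column; 'processName' is an ordinary string column.
df = pd.DataFrame(
    {
        "processName": ["compt", "phot", "compt"],
        "hits": [[1, 2], [], [3]],
    }
)

# value_counts is called on the single column, so the other column's
# type does not matter here.
counts = df["processName"].value_counts()
print(counts)
```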

sbdrchauhan commented 1 year ago

But it's not working, as you can see above (second image), when I call it on the 'processName' column, which is not an ak column.

sbdrchauhan commented 1 year ago

For reference, hits is a TTree which I convert to arrays(library='pd') that converts to awkward-pandas df, not the regular pandas df. value_counts() works for the columns with numerical values, but not for the columns with string values.

martindurant commented 1 year ago

I convert to arrays(library='pd') that converts to awkward-pandas df, not the regular pandas df.

Maybe there is confusion here? This library provides ak-type columns in regular pandas dataframes. I don't know what an awkward-pandas df would be.

It would be useful to see a full reproducer so that we can help.

sbdrchauhan commented 1 year ago

Because value_counts() is not working on this dataframe, which I made using the above code, and this repo's name is also awkward-pandas, I thought the dataframe would not be a regular pandas df. But that's not the problem. The problem is that the value_counts() method is not working on columns that have a string data type, while it does work on columns that have a numerical data type. I think this is clear from my image and from what I am saying here. I searched elsewhere and saw lots of questions about this, but no answers, so I posted here.

Thanks.

martindurant commented 1 year ago

Can you please explicitly check the dtype of df["processName"] - is it really a string column?
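
One way such a check might look, as a sketch on a hypothetical two-column frame (the `energy` column and all values are invented for illustration). The printed dtype name can differ between pandas versions, so no expected output is shown:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical stand-in for the real dataframe from the issue.
df = pd.DataFrame({"processName": ["compt", "phot"], "energy": [0.5, 1.2]})

# Inspect the dtype directly.
name_dtype = df["processName"].dtype
print(name_dtype)

# Ask pandas whether each column looks like string data.
print(is_string_dtype(df["processName"]))
print(is_string_dtype(df["energy"]))
```

If the first line printed an awkward dtype instead, the column would not be a plain string column after all.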

sbdrchauhan commented 1 year ago

It is working after I updated my system. Sorry for the confusion. Thanks for the quick replies, though.