Closed: jpivarski closed this issue 1 year ago
Question: what do we expect to happen for value_counts on an ak column?
For reference, .values is a property, not a method, so it should not be called:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': ["oi"]*10000})
In [3]: df.a.values
Out[3]: array(['oi', 'oi', 'oi', ..., 'oi', 'oi', 'oi'], dtype=object)
Question: what do we expect to happen for value_counts on an ak column?
For an ak column it would probably not make sense to do that. But for the other, non-ak columns, we should be able to get counts. Currently I am using Python's Counter as a workaround, but I believe a default pandas df does what it should here. Thanks.
.value_counts works on a series (column), so you can freely use it on non-ak data in a dataframe that happens to also have ak data.
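To make that concrete, here is a minimal sketch using plain pandas only. The column names and values are made up for illustration (they are not taken from the reporter's file), and the frame has no ak column at all, so it only shows the expected behaviour for ordinary string and numeric columns, next to the Counter workaround mentioned earlier.

```python
from collections import Counter

import pandas as pd

# Hypothetical data: "processName" holds strings, "edep" holds floats.
df = pd.DataFrame({
    "processName": ["compt", "phot", "compt", "eIoni"],
    "edep": [0.1, 0.2, 0.1, 0.3],
})

# value_counts is a Series method, so it is called per column.
counts = df["processName"].value_counts()

# The Counter workaround should give the same tallies.
counter = Counter(df["processName"])

print(counts)
print(counter)
```

If the two disagree, or value_counts raises on a column like this, that points at the environment (for example, mismatched package versions) rather than at the data.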
But it is not working, as you can see above (second image), when I call it on the 'processName' column, which is not an ak column.
For reference, hits is a TTree that I convert with arrays(library='pd'), which gives an awkward-pandas df, not a regular pandas df. value_counts() works for columns with numerical values, but not for columns with string values.
I convert to arrays(library='pd') that converts to awkward-pandas df, not the regular pandas df.
Maybe there is confusion here? This library provides ak-type columns in regular pandas dataframes. I don't know what an awkward-pandas df would be.
It would be useful to see a full reproducer so that we can help.
Because value_counts() is not working on the dataframe I made using the above code, and this repo is also named awkward-pandas, I thought the dataframe would not be a regular pandas df. But that's not the problem. The problem is that value_counts() does not work on columns that have a string data type, while it does work on columns that have a numerical data type. I think that is clear from my image and from what I am saying here. I searched elsewhere and saw lots of questions about this, but no answers, so I posted here.
Thanks.
Can you please explicitly check the dtype of df["processName"] - is it really a string column?
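One way to run that check, sketched on a made-up frame (the column names and values are hypothetical stand-ins for the real data):

```python
import pandas as pd

# Hypothetical stand-in for the dataframe in question.
df = pd.DataFrame({
    "processName": ["compt", "phot", "compt"],
    "edep": [0.1, 0.2, 0.1],
})

# The dtype pandas reports for the column (object or a string dtype,
# depending on the pandas version and how the column was built).
print(df["processName"].dtype)

# Whether pandas considers each column string-like.
is_string = pd.api.types.is_string_dtype(df["processName"])
is_string_numeric = pd.api.types.is_string_dtype(df["edep"])
print(is_string, is_string_numeric)

# The Python type of each element, in case the dtype is just "object".
type_counts = df["processName"].map(type).value_counts()
print(type_counts)
```

If the dtype turns out to be an ak type, or the elements are not str, then the column is not really a string column and the value_counts behaviour would need to be looked at differently.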
It works now that I have updated my system. Sorry for the confusion. Thanks for the quick replies, though.
Reported by @sbdrchauhan as https://github.com/scikit-hep/uproot5/issues/895 (and manually copied here).
I hope the images make my problem self-explanatory. Thank you.
I want this feature to work for pandas dfs so my life becomes a lot easier. Thanks.