kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

unable to aggregate and summarize counts for a categorical variable #88

Open dhairyadalal opened 5 years ago

dhairyadalal commented 5 years ago

I want to be able to simply do a group by and count on a column with categorical values. When I run the code below

import pandas as pd
from dfply import *

df = pd.DataFrame({"animal": ["cat", "cat", "dog", "dog"],
                   "breed": ["tabby", "short hair", "poodle", "pug"],
                   "age": [1, 2, 3, 4]
                  })

df >> group_by(X.animal) >> summarize(count=n(X.name))

I run into an AttributeError: 'str' object has no attribute 'size'.

In dplyr, this would be the equivalent of:

df %>% group_by(animal) %>% summarise(count = n())

Ignaciovf commented 4 years ago

The error message could be clearer, but the problem is that you are counting on a column that does not exist: df has no column called name. Counting on an existing column gives the required result:

df >> group_by(X.animal) >> summarize(count=n(X.animal))

  animal  count
0    cat      2
1    dog      2
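
As a cross-check that does not depend on dfply, the same per-group count can be sketched in plain pandas. This is only a comparison under the assumption that the goal is one row per animal with its number of rows; the variable name counts is illustrative and not part of the original thread.

```python
import pandas as pd

df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "breed": ["tabby", "short hair", "poodle", "pug"],
    "age": [1, 2, 3, 4],
})

# size() counts every row in each group, so no particular column
# has to be named (similar in spirit to dplyr's n()).
# reset_index(name="count") turns the result back into a DataFrame
# with the group key as a regular column.
counts = df.groupby("animal").size().reset_index(name="count")
print(counts)
```

Because size() counts rows rather than the values of a named column, this form avoids the mistake of referring to a column that is not in the frame.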