pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Cross tabulation of categorical data doesn't work the way the user guide describes #31410

Closed: GYHHAHA closed this issue 4 years ago

GYHHAHA commented 4 years ago

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> pd.crosstab(foo, bar)
col_0  d  e
row_0
a      1  0
b      0  1
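For comparison, here is a minimal sketch that runs the same inputs through `pd.crosstab` with its `dropna` parameter set to `False` (the default is `True`). It assumes a reasonably recent pandas version. Under that assumption, the unobserved categories 'c' and 'f' are kept as zero-count rows and columns, though the exact behaviour may differ between pandas versions.

```python
import pandas as pd

foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])

# Default (dropna=True): only the observed categories appear, as in the report.
observed_only = pd.crosstab(foo, bar)

# dropna=False: unobserved categories ('c' and 'f') are kept with zero counts.
all_categories = pd.crosstab(foo, bar, dropna=False)

print(observed_only)
print(all_categories)
```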

Problem description

The latest user guide says: "Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category." Why doesn't the example above behave this way? The output is missing row 'c' and column 'f'. Thanks!
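One way to get the behaviour the guide describes, regardless of `dropna`, is to reindex the cross-tabulation against each input's declared categories. The sketch below assumes both inputs are `pd.Categorical`. The helper name `crosstab_all_categories` is hypothetical. This is not necessarily how pandas itself resolves the issue.

```python
import pandas as pd


def crosstab_all_categories(rows: pd.Categorical, cols: pd.Categorical) -> pd.DataFrame:
    """Hypothetical helper: cross-tabulate and force every declared category to appear."""
    table = pd.crosstab(rows, cols)
    # Reindex both axes against the declared categories; categories that never
    # occur in the data are added with a count of 0.
    return table.reindex(index=rows.categories, columns=cols.categories, fill_value=0)


foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
table = crosstab_all_categories(foo, bar)
print(table)
```

This sketch does not handle `margins=True`. In that case, reindexing against the categories alone would drop the `All` row and column.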

MarcoGorelli commented 4 years ago

Seems that you're right - thanks for the report, @GYHHAHA !

Are you interested in investigating the issue / opening a pull request?

GYHHAHA commented 4 years ago

Of course, glad to. @MarcoGorelli