DonBeo closed 8 years ago
We have `pd.value_counts` for the first one, and `pd.crosstab` for the second, which seems sufficient to me.
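For anyone landing here from R, a minimal sketch of the two calls mentioned above. The data and the names `v1`, `v2` are made up for illustration, and the one-way count uses the `Series.value_counts` method form rather than the top-level function:

```python
import pandas as pd

# Hypothetical example data, only for illustration
v1 = pd.Series(["a", "b", "a", "c", "a", "b"], name="v1")
v2 = pd.Series(["x", "y", "x", "x", "y", "y"], name="v2")

# One-way frequencies, roughly what table(v1) gives in R
freq = v1.value_counts()

# Two-way cross tabulation, roughly what table(v1, v2) gives in R
xtab = pd.crosstab(v1, v2)

print(freq)
print(xtab)
```

Note that `value_counts` sorts by frequency by default, so append `.sort_index()` if you want the rows ordered by value as R does.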
Thanks I was not aware of these two functions. They are probably enough then.
Oh, if you have an R background, we'd appreciate more documentation and examples here
I would like to raise this request again and wish that the issue be reopened. The fact that pandas does not have a simple tabulation function with frequencies is one of the main barriers to adoption by new learners and a constant source of frustration to more advanced users.
Data analysis almost always requires simple exploratory frequency table tabulations. In a language such as Stata, it's absolutely trivial to get a simple useful frequency tabulation:
. tab spdlimit

Speed limit |      Freq.     Percent        Cum.
------------+-----------------------------------
         40 |          1        2.56        2.56
         45 |          3        7.69       10.26
         50 |          7       17.95       28.21
         55 |         15       38.46       66.67
         60 |         11       28.21       94.87
         65 |          1        2.56       97.44
         70 |          1        2.56      100.00
------------+-----------------------------------
      Total |         39      100.00
And it's similarly simple in R. But, as pointed out here by @chris1610, a minimal equivalent to produce the same output in pandas would be this thicket of code:
pd.concat([df['spdlimit'].value_counts().rename('count'),
df['spdlimit'].value_counts(normalize=True)
.mul(100).rename('percentage')], axis=1)
| | count | percentage |
|---|---|---|
| 55 | 15 | 38.4615 |
| 60 | 11 | 28.2051 |
| 50 | 7 | 17.9487 |
| 45 | 3 | 7.69231 |
| 40 | 1 | 2.5641 |
| 65 | 1 | 2.5641 |
| 70 | 1 | 2.5641 |
(and still more code is needed to get things properly sorted). @chris1610 has very usefully created the sidetable library to address some of this missing functionality: https://github.com/chris1610/sidetable#freq
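To make the gap concrete, here is a rough sketch of what a Stata-style one-way tabulation could look like as a small helper. `freq_table` is a hypothetical name, not a pandas or sidetable function, and the data is rebuilt from the counts in the Stata output above:

```python
import pandas as pd


def freq_table(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): count, percent, cumulative percent."""
    # Count each distinct value, then order rows by value rather than by frequency
    counts = s.value_counts().sort_index()
    percent = counts / counts.sum() * 100
    return pd.DataFrame(
        {"count": counts, "percent": percent, "cum_percent": percent.cumsum()}
    )


# Rebuild the spdlimit column from the frequencies shown in the Stata table
stata_counts = {40: 1, 45: 3, 50: 7, 55: 15, 60: 11, 65: 1, 70: 1}
df = pd.DataFrame(
    {"spdlimit": [value for value, n in stata_counts.items() for _ in range(n)]}
)

table = freq_table(df["spdlimit"])
print(table.round(2))
```

This still leaves out the `Total` row, which would need one more step.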
It would be much better if this functionality were built into pandas. It seems easy to implement and it would be immediately useful and popular.
@TomAugspurger's response is sufficient; these are also well documented. But if you want to add something to the intro for R users section, that's OK.
I think it would be useful to have a command similar to `table` in R or `tab` in Stata. Given a vector `v`, `table(v)` should return the frequency of each value in `v`. `table(v1, v2)` should return the cross tabulation of `v1` and `v2`.