oldoc63 / learningDS

Learning DS with Codecademy and Books

Value Counts for Categorical Data #396

Open oldoc63 opened 2 years ago

oldoc63 commented 2 years ago

For categorical variables, the measures of central tendency and spread that describe numeric variables, such as the mean and standard deviation, generally become unsuitable because we are dealing with discrete values. Unlike numbers, categorical values are not continuous and often have no intrinsic ordering.

Instead, a good way to summarize a categorical variable is to generate a frequency table containing the count of each distinct value. For example, we may want to know how many of the New York City rental listings come from each borough. Relatedly, we can find which borough has the most listings.

The pandas library offers the .value_counts() method, which returns the count of each distinct value in a DataFrame column.
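The original snippet is not preserved in this thread, so here is a minimal sketch of the call. The DataFrame name rentals, the column name borough, and the six rows are hypothetical toy data, not the actual rental listings.

```python
import pandas as pd

# Hypothetical toy data standing in for the NYC rental listings
rentals = pd.DataFrame({
    "borough": ["Manhattan", "Brooklyn", "Manhattan",
                "Queens", "Brooklyn", "Manhattan"]
})

# Frequency table: one row per distinct borough, with its count
borough_counts = rentals["borough"].value_counts()
print(borough_counts)
```

The result is a Series indexed by the distinct values, so an individual count can be looked up by label, for example borough_counts["Brooklyn"].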

oldoc63 commented 2 years ago

By default, .value_counts() returns the results sorted in descending order by count, so the top element is the mode, that is, the most frequently appearing value. In this case, the mode is Manhattan, with 3,539 rental listings.
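As a rough illustration of the default ordering, the sketch below reads the mode off the top of the result and then reverses the order with the ascending parameter. The Series name listings and its values are hypothetical toy data, so the counts do not match the 3,539 figure from the real dataset.

```python
import pandas as pd

# Hypothetical toy data, not the real rental listings
listings = pd.Series(
    ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Brooklyn", "Manhattan"],
    name="borough",
)

# Default: sorted by count, largest first
counts = listings.value_counts()

# The first index label is the mode; the first value is its count
most_common = counts.index[0]
most_common_count = counts.iloc[0]
print(most_common, most_common_count)

# ascending=True flips the order so the rarest value comes first
ascending_counts = listings.value_counts(ascending=True)
print(ascending_counts)
```

Reading the mode from counts.index[0] relies on the default descending sort; if sorting is turned off or reversed, counts.idxmax() is a safer way to get the same label.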