tidyverse / funs

Collection of low-level functions for working with vctrs

Mode, in the statistical sense, at least for categorical variable #13

Open jennybc opened 7 years ago

jennybc commented 7 years ago

Mode of a categorical variable, in the statistical sense. I always feel embarrassed when I explain that R has no built-in way to compute the most frequent level of a factor. Here's one implementation from Stack Overflow:

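# Return the most frequent value of x; on ties, the value that appears first wins.
Mode <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # drop missing values before counting
  }
  ux <- unique(x)
  # match() maps each element to its index in ux; tabulate() counts those indices
  ux[which.max(tabulate(match(x, ux)))]
}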
(x <- rep(1:5, c(1,2,3,2,1)))
#> [1] 1 2 2 3 3 3 4 4 5
Mode(x)
#> [1] 3
x[3] <- NA
Mode(x)
#> [1] 3

lionel- commented 7 years ago

Should this be in forcats? e.g. fac_mode().

For continuous variables the sample mode does not really make sense, so this function seems to be mostly about factors.

jennybc commented 7 years ago

> Should this be in forcats?

Given that fct_count() is present in forcats, maybe fct_mode() belongs there as well.
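
For concreteness, here is a minimal sketch of what a factor-only version might look like. The name `fct_mode()` is hypothetical (it is not an existing forcats function), and the choices to ignore `NA` and to return a factor with the original levels are assumptions, not a settled design.

```r
# Hypothetical helper: fct_mode() does not exist in forcats; this is only a sketch.
fct_mode <- function(f) {
  stopifnot(is.factor(f))
  # tabulate() counts the integer codes of the factor and silently ignores NA
  counts <- tabulate(f, nbins = nlevels(f))
  # With no non-missing values there is no mode, so return NA
  top <- if (sum(counts) == 0) NA_character_ else levels(f)[which.max(counts)]
  # which.max() keeps the first level on ties; the result keeps the original levels
  factor(top, levels = levels(f))
}

f <- factor(c("b", "a", "b", NA, "c"), levels = c("a", "b", "c"))
fct_mode(f)
```

Counting the integer codes directly avoids the `unique()`/`match()` step, since a factor already stores its levels.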

As for continuous variables, yes, perhaps that is a road too far. And yet base functions like hist() and density() blithely make default choices for bins or bandwidth. There is precedent for simply making a good faith effort at such things. But, yes, I really was targeting the factor case here.
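
To illustrate the "good faith effort" idea for the continuous case, one possible sketch takes the peak of a kernel density estimate with `density()`'s default bandwidth. The helper name `density_mode()` is hypothetical, and the result depends on the bandwidth and grid that `density()` picks.

```r
# Hypothetical helper, not part of any package: estimate a continuous mode
# as the location of the highest point of a kernel density estimate.
density_mode <- function(x, ...) {
  x <- x[!is.na(x)]             # density() errors on NA unless told otherwise
  d <- stats::density(x, ...)   # default bandwidth and evaluation grid
  d$x[which.max(d$y)]           # grid point where the estimated density peaks
}

set.seed(1)
x <- rnorm(1000, mean = 10)
density_mode(x)
```

Extra arguments such as `bw` or `adjust` are passed through to `density()`, so the caller can override the default choices.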

hadley commented 5 years ago

Also, how do you want to handle ties? Would it be better for Mode() to return a vector?
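
As a sketch of the vector-returning option, the Stack Overflow implementation above could be changed to keep every value whose count equals the maximum. The name `modes()` is hypothetical, and returning values in order of first appearance is an assumption.

```r
# Hypothetical variant of Mode() that returns all tied values instead of the first.
modes <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  if (length(x) == 0) {
    return(x)  # nothing to count, so return the empty input unchanged
  }
  ux <- unique(x)
  counts <- tabulate(match(x, ux), nbins = length(ux))
  ux[counts == max(counts)]  # every value that reaches the top count
}

modes(c(1, 2, 2, 3, 3))
#> [1] 2 3
modes(c("a", "b", "b"))
#> [1] "b"
```

A caller who wants a single value can still take the first element, which matches what `which.max()` does in the original.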