r-gregmisc / gtools

Functions to assist in R programming

Add statistical mode function #9

GitHunter0 closed this issue 3 years ago

GitHunter0 commented 3 years ago

Hey folks, I think gtools has some really useful functions, like invalid(), mixedsort() and defmacro(). Would you consider adding a statistical mode too? I will never understand why base R does not include one. Thank you!

warnes commented 3 years ago

Excellent idea. Can you provide code?

The function will need to properly handle the edge cases:

GitHunter0 commented 3 years ago

Hi @warnes, thanks for the interest!

Yes, I have code that I believe covers all the cases. What do you think?

Modes <- function(vec, 
                  output = "all", # options: "first", "last", "all"
                  na.rm = TRUE,
                  multiple_modes = TRUE) {

  # For an empty input, the output is the input itself (an empty vector of the
  # same type), which I believe is the best behavior.
  # If you would rather return NA instead, just remove the line below.
  if (length(vec)==0) return(vec)

  if (na.rm) { uv <- unique(na.omit(vec)) } else { uv <- unique(vec) }  

  tab <- tabulate(match(vec, uv))

  all_modes <- uv[tab == max(tab)]

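  if (output == "first") {

    res <- all_modes[1]

  } else if (output == "last") {

    res <- all_modes[length(all_modes)]

  } else if (output == "all") {

    res <- all_modes

  } else {

    # Any other value would otherwise fail later with "object 'res' not found"
    stop('`output` must be one of "first", "last" or "all"')
  }

  # Scalar condition, so use the short-circuiting && rather than the vectorized &
  if (!multiple_modes && length(res) > 1) { return(NA) } else { return(res) }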

}

# - # Character vector
chr_vec <- c('a','d','d','h','h',NA,NA) # Multiple modes
Modes(vec = chr_vec)
#> [1] "d" "h"
Modes(vec = chr_vec, na.rm=FALSE)
#> [1] "d" "h" NA
Modes(vec = chr_vec, na.rm=FALSE, output="first")
#> [1] "d"
Modes(vec = chr_vec, na.rm=FALSE, output="last")
#> [1] NA

# - # Numeric vector
# Note that it keeps the original vector type
num_vec <- c(2,3,3,4,4,NA,NA)
Modes(vec = num_vec)
#> [1] 3 4
Modes(vec = num_vec, na.rm=FALSE)
#> [1]  3  4 NA
Modes(vec = num_vec, na.rm=FALSE, output="first")
#> [1] 3
Modes(vec = num_vec, na.rm=FALSE, output="last")
#> [1] NA

# The default is output="all", but the user can easily pick specific modes by
# indexing the result, without changing this parameter.
# Always select just one mode (the first):
Modes(vec = num_vec)[1]
#> [1] 3
# Select the first and the second modes
Modes(vec = num_vec)[c(1,2)]
#> [1] 3 4

# - # Logical Vectors
Modes(vec = c(TRUE,TRUE))
#> [1] TRUE
Modes(vec = c(FALSE,FALSE,TRUE,TRUE))
#> [1] FALSE  TRUE

# - # Single element cases
Modes(vec = c(NA_real_))
#> [1] NA
Modes(vec = 2)
#> [1] 2
Modes(vec = NA)
#> [1] NA
Modes(vec = c('a'))
#> [1] "a"

# - # Not allowing multiple modes, returning NA if that happens
Modes(vec = c(1,1,2,2), multiple_modes = FALSE) # multiple modes
#> [1] NA
Modes(vec = c(1,1), multiple_modes = FALSE) # single mode
#> [1] 1

# - # Empty vector cases
# The output of any empty vector will be itself (an empty vector of the same type) 
Modes(vec = double())
#> numeric(0)
Modes(vec = complex())
#> complex(0)
Modes(vec = vector('numeric'))
#> numeric(0)
Modes(vec = vector('character'))
#> character(0)

Created on 2021-05-26 by the reprex package (v0.3.0)
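For readers following the code, the heart of `Modes()` is the `tabulate(match(vec, uv))` idiom. The R sketch below (its variable names are illustrative only) steps through it on the same `num_vec` used in the reprex: `match()` turns each element into its position among the unique values, `tabulate()` counts those positions while ignoring `NA` indices, and the modes are the values whose count equals the maximum. It also shows why `Modes(vec = c(NA_real_))` prints `NA` rather than `numeric(0)`: with `na.rm = TRUE` the set of unique values is empty, `tabulate()` still returns a single zero-count bin, and indexing a zero-length vector with `TRUE` yields `NA`.

```r
# Step through the counting idiom used inside Modes() on the reprex's numeric vector
num_vec <- c(2, 3, 3, 4, 4, NA, NA)

uv    <- unique(na.omit(num_vec))  # distinct non-missing values: 2 3 4
idx   <- match(num_vec, uv)        # position of each element in uv; NA stays NA
tab   <- tabulate(idx)             # occurrences per position; NA indices are ignored
modes <- uv[tab == max(tab)]       # every value tied for the highest count

print(idx)    # values: 1 2 2 3 3 NA NA
print(tab)    # values: 1 2 2
print(modes)  # values: 3 4

# Why an all-NA input gives NA when na.rm = TRUE: uv is empty, tabulate()
# still returns one bin containing 0, and a length-0 vector indexed by TRUE is NA.
uv_empty  <- unique(na.omit(NA_real_))            # numeric(0)
tab_empty <- tabulate(match(NA_real_, uv_empty))  # a single 0
print(uv_empty[tab_empty == max(tab_empty)])      # NA
```

The same mechanism explains the note in the function's comments: if the `length(vec)==0` shortcut is removed, an empty input also falls through to indexing an empty `uv` with `TRUE` and comes back as `NA`.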

warnes commented 3 years ago

Thanks for the suggestion and the code. I used it to add a new function stat_mode to gtools version 3.9.0, which I've just submitted to CRAN.

If you're impatient to use it, run devtools::install_github("r-gregmisc/gtools") to install it from here.

GitHunter0 commented 3 years ago

Excellent, @warnes, thanks for the addition!