sfirke / janitor

simple tools for data cleaning in R
http://sfirke.github.io/janitor/

[Feature Request] Allow `tabyl()` to accept character vectors as column names #572

Status: Open. EA-Ammar opened this issue 4 months ago.

EA-Ammar commented 4 months ago

Feature requests

It would be useful if `tabyl()` accepted a character vector of column names as the variables to be summarised, as many functions that support tidyselect syntax already do (e.g. via `all_of()`).

An example of how I imagine the code would work:

```r
library("tidyverse")
library("janitor")

# Sample data
mydat <- tibble(x = 1:4, y = 5:8, z = 11:14)

# Define vectors of column names
vec1 <- c("x")
vec2 <- c("x", "y")
vec3 <- c("x", "y", "z")

# Currently all these lines produce the same one-way table as mydat %>% tabyl(x);
# the request is for them to give one-, two- and three-way tables respectively
mydat %>% tabyl(all_of(vec1))
mydat %>% tabyl(all_of(vec2))
mydat %>% tabyl(all_of(vec3))
```

Currently `tabyl()` uses only the first element of the character vector, so the resulting table is always one-way. Accepting a vector of one, two or three column names would allow a more flexible workflow, e.g. in custom functions that wrap `tabyl()`.
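Until something like this is available in janitor, a small wrapper can approximate the requested behaviour. The sketch below is an assumption, not janitor API: the helper name `tabyl_sel()` is hypothetical, and it relies on `tidyselect::eval_select()` to resolve the selection to column names and on `rlang::inject()` with `!!!` to splice those names into `tabyl()` as symbols.

```r
library(dplyr)
library(janitor)

# Hypothetical helper, not part of janitor: accepts any tidyselect
# expression (e.g. all_of(vec2)) and forwards the selected columns to tabyl().
tabyl_sel <- function(dat, cols, ...) {
  # Resolve the selection to a character vector of column names
  vars <- names(tidyselect::eval_select(rlang::enquo(cols), dat))
  # tabyl() itself only takes var1, var2 and var3
  if (length(vars) < 1 || length(vars) > 3) {
    stop("tabyl() tabulates one to three columns; got ", length(vars))
  }
  # Splice the names in as symbols, so this is equivalent to
  # typing tabyl(dat, x, y) by hand
  rlang::inject(tabyl(dat, !!!rlang::syms(vars), ...))
}

mydat <- tibble(x = 1:4, y = 5:8, z = 11:14)
vec2 <- c("x", "y")
mydat %>% tabyl_sel(all_of(vec2))  # two-way table of x by y
```

Because the names are resolved before `tabyl()` is called, other tidyselect helpers should work the same way, subject to the one-to-three column limit of `tabyl()`.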

ggrothendieck commented 4 months ago

If reworking `tabyl()` to use tidyselect syntax is too much effort, an alternative would be to implement a formula method.

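```r
library(dplyr)
library(janitor)

# S3 method so tabyl() dispatches on a formula such as ~ x + y;
# `dat` is the formula and `data` is the data frame to tabulate
tabyl.formula <- function(dat, data, ..., envir = parent.frame()) {
    # With no data supplied, collect the formula's variables from the caller
    if (missing(data)) data <- as.data.frame(mget(all.vars(dat), envir))
    # Column names referenced by the formula ("." expands to all columns)
    vars <- names(model.frame(dat, data))
    # Native pipe (R >= 4.1): calls tabyl(data, x, y, ...) with names as symbols
    tabyl |> do.call(c(list(data), lapply(vars, as.name), list(...)))
}
```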

Then we could do things like this:

```r
# Test, using mydat and vec2 from the first example
mydat %>% tabyl(reformulate(vec2), .)
mydat %>% select(any_of(vec2)) %>% tabyl(~., .)
```
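The formula method leans on two base R functions: `reformulate()` turns a character vector into a one-sided formula, and `model.frame()` resolves that formula, including the `.` shorthand, to the column names that end up in `vars`. A minimal base-R sketch of just that step, using a plain `data.frame` copy of the sample data:

```r
mydat <- data.frame(x = 1:4, y = 5:8, z = 11:14)
vec2 <- c("x", "y")

f <- reformulate(vec2)                # one-sided formula ~x + y
names(model.frame(f, mydat))          # "x" "y"

# "." expands to every column of the data passed in
names(model.frame(~., mydat[vec2]))   # "x" "y"
```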

The above uses the `do.call()` code from https://stackoverflow.com/questions/78483724/r-pass-character-vector-of-column-names-to-function-which-can-optionally-take-m?noredirect=1#comment138367271_78483724
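The `do.call()` trick works because `lapply(vars, as.name)` produces symbols rather than strings, and `do.call()` places those symbols directly into the call it builds. A function that captures its arguments unevaluated, as `tabyl()` does, therefore sees the same expressions as if the column names had been typed. A minimal base-R sketch of that mechanism; `show_args()` is a hypothetical helper used only for illustration:

```r
# Hypothetical helper: returns its unevaluated argument expressions as text
show_args <- function(...) {
  vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
}

vars <- c("x", "y")

# do.call() inserts the symbols x and y into the call, so this behaves
# like typing show_args(x, y); x and y are never evaluated here
do.call(show_args, lapply(vars, as.name))  # "x" "y"
```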