nathaneastwood / poorman

A poor man's dependency free grammar of data manipulation
https://nathaneastwood.github.io/poorman/

Add pivot_longer() #48

Closed nathaneastwood closed 2 years ago

grantmcdermott commented 2 years ago

I'm sure you've already considered this, but here is a simple proof of concept using stack():

pivot_longer = 
  function(
    data,
    cols,
    names_to = "name",
    names_prefix = NULL,
    # names_sep = NULL,
    # names_pattern = NULL,
    # names_ptypes = list(),
    # names_transform = list(),
    # names_repair = "check_unique",
    values_to = "value",
    # values_drop_na = FALSE,
    # values_ptypes = list(),
    # values_transform = list(),
    ...
  ){
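    # Capture the unquoted selection as a string. Proof of concept only: a
    # single bare column name, optionally negated with ! or -, is handled.
    cols = deparse(substitute(cols))
    if (grepl("^!|^-", cols)) {
      cols = setdiff(names(data), gsub("^!|^-", "", cols))
    }
    ocols = setdiff(names(data), cols)
    # stack() returns (values, ind); reorder to (ind, values).
    # drop = FALSE keeps single-column selections as data frames.
    stacked_data = stack(data[, cols, drop = FALSE])[, 2:1]
    ret = cbind(data[, ocols, drop = FALSE], setNames(stacked_data, c(names_to, values_to)))
    if (!is.null(names_prefix)) {
      ret[[names_to]] = gsub(paste0("^", names_prefix), "", ret[[names_to]])
    }
    # Sort by the first id column to mimic tidyr's row order
    ret = ret[order(ret[, ocols[1]]), ]
    rownames(ret) = seq_len(nrow(ret))
    ret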
  }

#
## First example from ?tidyr::pivot_longer
#

data('relig_income', package = 'tidyr')

## This version
relig_income |>
  pivot_longer(!religion, names_to = "income", values_to = "count") |>
  head(10)
#>    religion             income count
#> 1  Agnostic              <$10k    27
#> 2  Agnostic            $10-20k    34
#> 3  Agnostic            $20-30k    60
#> 4  Agnostic            $30-40k    81
#> 5  Agnostic            $40-50k    76
#> 6  Agnostic            $50-75k   137
#> 7  Agnostic           $75-100k   122
#> 8  Agnostic          $100-150k   109
#> 9  Agnostic              >150k    84
#> 10 Agnostic Don't know/refused    96

## tidyr version
relig_income |>
  tidyr::pivot_longer(!religion, names_to = "income", values_to = "count") |>
  data.frame() |> head(10)
#>    religion             income count
#> 1  Agnostic              <$10k    27
#> 2  Agnostic            $10-20k    34
#> 3  Agnostic            $20-30k    60
#> 4  Agnostic            $30-40k    81
#> 5  Agnostic            $40-50k    76
#> 6  Agnostic            $50-75k   137
#> 7  Agnostic           $75-100k   122
#> 8  Agnostic          $100-150k   109
#> 9  Agnostic              >150k    84
#> 10 Agnostic Don't know/refused    96

#
## Contrived names_prefix usage
#

## This version
relig_income[, c(1, 3:9)] |>
  pivot_longer(!religion, names_to = "income", values_to = "count", names_prefix = "\\$") |>
  head(10)
#>    religion   income count
#> 1  Agnostic   10-20k    34
#> 2  Agnostic   20-30k    60
#> 3  Agnostic   30-40k    81
#> 4  Agnostic   40-50k    76
#> 5  Agnostic   50-75k   137
#> 6  Agnostic  75-100k   122
#> 7  Agnostic 100-150k   109
#> 8   Atheist   10-20k    27
#> 9   Atheist   20-30k    37
#> 10  Atheist   30-40k    52

## tidyr version
relig_income[, c(1, 3:9)] |>
  tidyr::pivot_longer(!religion, names_to = "income", values_to = "count", names_prefix = "\\$") |>
  data.frame() |> head(10)
#>    religion   income count
#> 1  Agnostic   10-20k    34
#> 2  Agnostic   20-30k    60
#> 3  Agnostic   30-40k    81
#> 4  Agnostic   40-50k    76
#> 5  Agnostic   50-75k   137
#> 6  Agnostic  75-100k   122
#> 7  Agnostic 100-150k   109
#> 8   Atheist   10-20k    27
#> 9   Atheist   20-30k    37
#> 10  Atheist   30-40k    52

Created on 2021-11-14 by the reprex package (v2.0.1)
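
To make the core step of the proof of concept concrete, here is a minimal sketch of what stack() and unstack() do on their own. The `wide` data frame and its column names are invented for illustration and are not part of the issue's data.

```r
# Toy data, invented for illustration
wide <- data.frame(id = c("x", "y"), a = 1:2, b = 3:4)

# stack() returns two columns: `values` and `ind` (a factor holding the
# name of the column each value came from)
long <- stack(wide[, c("a", "b"), drop = FALSE])
long
#>   values ind
#> 1      1   a
#> 2      2   a
#> 3      3   b
#> 4      4   b

# cbind() recycles the shorter id column to match the stacked rows
long_with_id <- cbind(id = wide$id, long[, 2:1])

# unstack() reverses the operation, splitting `values` by `ind`
unstack(long)
```

Because the id column is recycled block by block, the combined rows come out grouped by source column rather than by id, which is why the proof of concept reorders the rows afterwards.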

Two asides:

  1. I have a strong suspicion that going with stack/unstack is ultimately going to be easier than reshape. The latter is very particular about its input form and arguments, and you'll probably end up having to do some binds and internal manipulation anyway once you start supporting the additional pivot_* arguments.
  2. The main complication for extending this proof of concept is handling the NSE. But I believe(?) you've already developed an internal system for deparsing NSE vectors etc. So, that could make things a lot easier.
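
On the NSE point, one possible base R approach is sketched below. It is not poorman's internal system, just the trick that base::subset() uses for its `select` argument: evaluate the unquoted expression in a list that maps column names to positions. The helper name `resolve_cols` is hypothetical, and translating a leading `!` into `-` is an assumption made here to mimic the `!religion` usage above.

```r
# Hypothetical helper (not part of poorman): turn an unquoted column
# selection into a character vector of column names.
resolve_cols <- function(data, expr) {
  # Assumption: treat a leading `!` like `-`, i.e. "everything except"
  if (is.call(expr) && identical(expr[[1]], as.name("!"))) {
    expr[[1]] <- as.name("-")
  }
  # Map each column name to its position, as base::subset() does
  pos <- as.list(seq_along(data))
  names(pos) <- names(data)
  # Evaluate the selection with column names standing in for positions
  idx <- eval(expr, pos, parent.frame())
  if (is.character(idx)) {
    return(idx)
  }
  names(data)[idx]
}

# Toy data, invented for illustration
df <- data.frame(id = 1:2, a = 3:4, b = 5:6)

resolve_cols(df, quote(!id))       # negated single column
resolve_cols(df, quote(c(a, b)))   # explicit vector of columns
resolve_cols(df, quote(a:b))       # range of adjacent columns
```

Inside pivot_longer(), the expression would come from `substitute(cols)` rather than `quote()`. This sketch does not cover tidyselect helpers such as `starts_with()`.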

EDIT: Added names_prefix arg and example.