tidyverse / funs

Collection of low-level functions for working with vctrs

Common suffixes and prefixes of vectors #55

Open issue, opened by lionel- 4 years ago

lionel- commented 4 years ago

For example, a function that returns the common suffix of two vectors:

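library(vctrs)
library(rlang)    # seq2(), int(), chr()
library(zeallot)  # %<-% destructuring assignment

vec_common_suffix <- function(x, y) {
  # Cast both inputs to their common type
  c(x, y) %<-% vec_cast_common(x, y)

  x_size <- vec_size(x)
  y_size <- vec_size(y)
  n <- min(x_size, y_size)

  # If either input is empty, the common suffix is empty
  if (!n) {
    return(vec_slice(x, int()))
  }

  # Keep only the last n elements of each input so they line up pairwise
  x <- vec_slice(x, seq2(x_size - n + 1, x_size))
  y <- vec_slice(y, seq2(y_size - n + 1, y_size))

  # na_equal = TRUE makes NA match NA; otherwise comparisons involving NA
  # return NA, which which() drops, so NA mismatches would count as common
  common <- vec_equal(x, y, na_equal = TRUE)
  i <- which(!common)

  # The common suffix starts right after the last mismatch
  if (length(i)) {
    vec_slice(x, seq2(max(i) + 1, n))
  } else {
    x
  }
}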

x <- c("foo", "bar", "baz")
y <- c("quux", "foo", "hop", "baz")
vec_common_suffix(x, y)
#> [1] "baz"

x <- c("foo", "bar", "baz")
y <- c("quux", "foo", "bar", "baz")
vec_common_suffix(x, y)
#> [1] "foo" "bar" "baz"

vec_common_suffix(letters, chr())
#> character(0)

vec_common_suffix(
  data.frame(x = 1:3, y = c("foo", "bar", "baz")),
  data.frame(x = 0:3, y = c("foo", "hop", "bar", "baz"))
)
#>   x   y
#> 1 2 bar
#> 2 3 baz

This makes me think that a version of seq() that fails, instead of returning an empty vector as seq2() does when `from` is greater than `to`, would be useful for checking assumptions.
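A minimal sketch of such a helper, using only base R. The name `seq_strict()` is hypothetical (it is not part of rlang or vctrs); it behaves like `rlang::seq2()` when `from <= to`, but signals an error instead of returning an empty vector when `from > to`.

```r
# Hypothetical helper: like rlang::seq2(), but errors on an empty range
seq_strict <- function(from, to) {
  stopifnot(length(from) == 1, length(to) == 1)
  if (from > to) {
    stop(sprintf("`from` (%s) must not exceed `to` (%s).", from, to), call. = FALSE)
  }
  seq.int(from, to)
}

seq_strict(2, 4)
#> [1] 2 3 4
seq_strict(3, 3)
#> [1] 3
# seq_strict(3, 1) signals an error
```

In `vec_common_suffix()`, the two slices of the last `n` elements could use such a helper, since `n > 0` at that point and an empty range would indicate a bug. The final `seq2(max(i) + 1, n)` should keep `seq2()`, because an empty result is legitimate there: it means the last elements differ.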

njtierney commented 4 years ago

Looks neat! But why is this called a suffix? I would interpret a suffix as something that falls at the end of a word; isn't this rather a term common to a set of things?

lionel- commented 4 years ago

Suffix refers to the last n values of the vector. This is the conventional term; see https://en.wikipedia.org/wiki/Suffix_array. Technically we should say "longest common suffix" or "longest common prefix" (LCS and LCP).
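The issue title also covers prefixes, which the thread only names (LCP). A prefix counterpart could mirror the suffix function: compare the first n elements pairwise and stop at the first mismatch. This is a minimal sketch; `vec_common_prefix()` is a hypothetical name, not an existing vctrs function, and it uses plain list indexing instead of zeallot's `%<-%`.

```r
library(vctrs)

# Hypothetical counterpart to vec_common_suffix(): longest common prefix
vec_common_prefix <- function(x, y) {
  # Cast both inputs to their common type
  args <- vec_cast_common(x, y)
  x <- args[[1]]
  y <- args[[2]]

  n <- min(vec_size(x), vec_size(y))

  # Compare the leading n elements pairwise; NA matches NA
  common <- vec_equal(
    vec_slice(x, seq_len(n)),
    vec_slice(y, seq_len(n)),
    na_equal = TRUE
  )
  i <- which(!common)

  # The common prefix ends just before the first mismatch
  if (length(i)) {
    vec_slice(x, seq_len(min(i) - 1))
  } else {
    vec_slice(x, seq_len(n))
  }
}

vec_common_prefix(c("foo", "bar", "baz"), c("foo", "bar", "quux"))
#> [1] "foo" "bar"

vec_common_prefix(letters, character())
#> character(0)
```

No special case for empty inputs is needed here: when `n` is 0, `seq_len(0)` is empty, so both slices and the result are empty vectors of the common type.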