mhahsler / seriation

Infrastructure for Ordering using Seriation - R Package
GNU General Public License v3.0

new function: seriate.dataframe #4

Closed by slowkow 6 years ago

slowkow commented 6 years ago

Hi Michael,

First, thanks for this excellent package! I use it a lot and it works well.

I just wanted to suggest that users might enjoy having a new function like seriate.dataframe() for simultaneously seriating two factors in a data frame.

I work in genomics, so my data is often in the "long" format rather than a matrix. Below is my own function, which converts the long format to a matrix and then calls seriate.matrix() on it. Would you consider adding such a function to the seriation package?

Here is a notebook that demonstrates the new function:

https://gist.github.com/slowkow/e0f86b6944db58019a4573e96eb04f59

What do you think?

Here's a first draft of the function copied from the notebook:

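# Return a data frame where 'col1' and 'col2' are factors whose levels are in
# seriated order.
seriate_dataframe <- function(
  d, col1, col2, value.var = "percent", fun.aggregate = NULL, method = "BEA_TSP"
) {
  # Cast long to wide. data.table::dcast() expects a data.table, so convert first.
  mat <- as.data.frame(data.table::dcast(
    data          = data.table::as.data.table(d),
    formula       = stats::as.formula(sprintf("%s ~ %s", col1, col2)),
    value.var     = value.var,
    fun.aggregate = fun.aggregate
  ))
  # Move the first column (the col1 values) into the row names.
  rownames(mat) <- mat[[1]]
  mat[[1]] <- NULL
  mat <- as.matrix(mat)
  # Treat missing (col1, col2) combinations as 0.
  mat[is.na(mat)] <- 0
  # Seriate rows and columns simultaneously, then extract each permutation.
  mat_order <- seriation::seriate(mat, method = method)
  row_levels <- rownames(mat)[seriation::get_order(mat_order, 1)]
  col_levels <- colnames(mat)[seriation::get_order(mat_order, 2)]
  d[[col1]] <- factor(as.character(d[[col1]]), levels = row_levels)
  d[[col2]] <- factor(as.character(d[[col2]]), levels = col_levels)
  d
}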
mhahsler commented 6 years ago

Hi, there are just too many ways to convert a three-column (triplet) data frame into a matrix (see: https://stackoverflow.com/questions/9617348/reshape-three-column-data-frame-to-matrix-long-to-wide-format), so I am reluctant to add one specific conversion to the seriation package.
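
To illustrate the point about multiple conversions, here is a minimal sketch of two base R ways to turn long data into a matrix. The toy data frame and its column names (row, col, value) are made up for illustration and do not come from the notebook; the two approaches already differ in how they treat empty cells.

```r
# Hypothetical toy data in long (triplet) format; names are illustrative only.
d <- data.frame(
  row   = c("a", "a", "b", "c"),
  col   = c("x", "y", "y", "x"),
  value = c(1, 2, 3, 4)
)

# Option 1: xtabs() sums the values in each cell and fills empty cells with 0.
m1 <- unclass(xtabs(value ~ row + col, data = d))

# Option 2: tapply() applies a function per cell and leaves empty cells as NA,
# so the caller has to decide how to fill them (0 is just one choice).
m2 <- tapply(d$value, list(d$row, d$col), sum)
m2[is.na(m2)] <- 0
```

Either result is a numeric matrix with row and column names, so it could then be passed to seriation::seriate() like any other matrix.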