mhahsler / seriation

Infrastructure for Ordering using Seriation - R Package
GNU General Public License v3.0

Feature request: seriate a frequency matrix using correspondence analysis #17

Closed: friendly closed this issue 2 years ago.

friendly commented 2 years ago

With mosaic displays, I often want to permute the rows/columns of a frequency matrix to show the pattern of association, a task that is easily done using correspondence analysis, permuting by the scores on the first CA dimension.

This is analogous to your PCA method, but CA uses an SVD of the matrix of standardized residuals from the independence model. It would be a useful and welcome addition to seriate.
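To make the SVD step concrete, here is a minimal base-R sketch. It is not the seriation or ca implementation. It forms the standardized residuals from the independence model, takes their SVD, and orders rows and columns by their standard coordinates on one dimension. The helper name ca_order() is hypothetical. Because the sign of singular vectors is arbitrary, the order may come out reversed relative to ca::ca() or seriate().

```r
# Hypothetical helper (not part of seriation or ca): order a two-way
# frequency table by its scores on one correspondence analysis dimension.
ca_order <- function(x, dim = 1) {
  N  <- unclass(as.matrix(x))   # plain numeric matrix of counts
  P  <- N / sum(N)              # correspondence matrix
  r  <- rowSums(P)              # row masses
  cm <- colSums(P)              # column masses
  E  <- outer(r, cm)            # expected proportions under independence
  S  <- (P - E) / sqrt(E)       # standardized residuals
  dec <- svd(S)
  # Standard coordinates on the chosen dimension (sign is arbitrary)
  row_coord <- dec$u[, dim] / sqrt(r)
  col_coord <- dec$v[, dim] / sqrt(cm)
  list(row = order(row_coord), col = order(col_coord),
       inertia = dec$d^2)       # principal inertias per dimension
}

data(HairEyeColor)
haireye <- margin.table(HairEyeColor, 1:2)
o <- ca_order(haireye)
haireye[o$row, o$col]
```

A useful check on the construction: the principal inertias sum to the Pearson chi-square statistic divided by the table total.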

Before I remembered the seriation package, I cobbled together a quick function, reorder.matrix():

#' Reorder a matrix according to a correspondence analysis dimension
#' 
#' This function is designed to help simplify a mosaic plot or other displays of a matrix of
#' frequencies.  It calculates a correspondence analysis of the matrix and reorders rows or columns
#' or both according to the scores on a correspondence analysis dimension.
#'
#' @param x      A numeric matrix or two-way table of frequencies
#' @method reorder matrix
#' @param which  Which dimension(s) to reorder? Either \code{1} (rows) or \code{2} (columns) or \code{1:2} (both)
#' @param dim    Correspondence analysis dimension to reorder upon
#' @param ...    Other options, passed to \code{ca()}.
#'
#' @return       The matrix, with its rows and/or columns reordered
#' @author Michael Friendly
#' @export
#'
#' @examples
#' data(HairEyeColor)
#' HairEye <- margin.table(HairEyeColor, 2:1)
#' HairEye
#' # Reordering by eye color gives a nicer mosaic display
#' reorder.matrix(HairEye, 1)

reorder.matrix <- function(x, which, dim=1, ...) {
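  # Use ca:: without attaching the package inside a function
  if (!requireNamespace("ca", quietly = TRUE))
    stop("package 'ca' is required")
  mat.ca <- ca::ca(x, ...)       # pass ... on to ca(), as documented
  rord <- seq_len(nrow(x))
  cord <- seq_len(ncol(x))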
  if (1 %in% which) {
    rcoord <- mat.ca$rowcoord    # row coordinates
    rord <- order(rcoord[, dim]) 
  }
  if (2 %in% which) {
    ccoord <- mat.ca$colcoord    # col coordinates
    cord <- order(ccoord[, dim])
  }
  x <- x[rord, cord, drop = FALSE]   # keep matrix shape even for one row/column
  x
}

I took a look at the code in seriation, and what I'm proposing is similar in intent to seriate_PCA.R, but there is too much infrastructure there for me to know how to write an analogous seriate_CA.R.

Ideally, the method I'd like would be applicable to matrix, table and data.frame objects.

Can someone help?
If you don't want to put it in the seriation package, I could include it in vcdExtra, if only I knew how to write it as a method to add to the seriate() generic.

mhahsler commented 2 years ago

Hi Michael!

Thanks for the code. I have incorporated your code into seriation on GitHub. Please check if this produces what you were expecting.

library(seriation)

data(HairEyeColor)
HairEye <- margin.table(HairEyeColor, 2:1)
HairEye
s <- seriate(HairEye, method = "CA")
s

permute(HairEye, s)

Here is the method description with parameters:

> get_seriation_method("matrix", "CA")
name:        CA
kind:        matrix
description: This method calculates a correspondence analysis of the matrix and computes an order according to the scores on a correspondence analysis dimension.
control (default values):
  dim ca_param
1   1     NULL

friendly commented 2 years ago

Thanks for this quick reply. This is a welcome addition to seriation. I'll check it out more thoroughly in the dev version.

friendly commented 2 years ago

Thanks very much for incorporating a CA method in the seriation package.

I tried using seriation for two sample problems, illustrating how it helps the interpretability of mosaic displays. Here is a document of my attempts and results: https://rpubs.com/friendly/test-seriation

I found that permute() is more difficult to use than I'd like because it can't handle dim=1:2 to permute both rows and columns in a single call, which is what I'd like.

My summary: seriation::seriate() has the infrastructure for a wide range of seriation tasks, but seems overly complex for applying CA to frequency tables with the goal of incorporating it in vcd::mosaic().

Is there something simpler than what I have done (seriate() -> get_order() -> table[o1, o2])? Perhaps some wrapper?
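For reference, the three-step workflow described here looks roughly like the following sketch. It assumes a seriation version (such as 1.4.0) in which seriate() supports method = "CA" for tables, as shown earlier in this thread. get_order(o, 1) and get_order(o, 2) extract the row and column orders from the returned ser_permutation.

```r
library(seriation)

data("HairEyeColor")
haireye <- margin.table(HairEyeColor, 1:2)

# Step 1: seriate() returns a ser_permutation holding a row and a column order
o <- seriate(haireye, method = "CA")

# Step 2: extract the individual orders
o1 <- get_order(o, 1)  # rows (hair color)
o2 <- get_order(o, 2)  # columns (eye color)

# Step 3: index the table by both orders
hec_manual <- haireye[o1, o2]

# permute() applies both orders of the ser_permutation at once
hec_perm <- permute(haireye, o)
hec_perm
```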

mhahsler commented 2 years ago

Thank you for this very helpful feedback. I mostly worked on seriating distance matrices so the current interface for tables is not ideal. I will simplify the process to a single call:

library(seriation)
library(vcd)

data("HairEyeColor")
haireye <- margin.table(HairEyeColor, 1:2)

hec_perm <- permute(haireye, "CA", margin = 1:2)

mosaic(hec_perm, shade=TRUE, legend=FALSE)

I will write some code and tests and let you know when it is ready. -Michael

mhahsler commented 2 years ago

A first implementation is now in the development version on GitHub. Your examples, redone with the one-line CA permutation in the development version of seriation, can be seen here: https://rpubs.com/mhahsler/seriation_CA

I have also fixed the margin argument of permute() so that it accepts a vector of dimensions (e.g., margin = 1:2). Additional parameters for the seriation method can be passed via ....

If that interface change is useful, then I will add some tests and prepare a CRAN release.

friendly commented 2 years ago

Thanks so much for this, Michael. That change is exactly what I was looking for. I've updated my RPubs example with these tests: https://rpubs.com/friendly/test-seriation

mhahsler commented 2 years ago

About your question: "There are cases where one might want to permute the rows/cols according to the 2nd CA dimension. Is this possible?"

The ... in permute() is passed on to seriate(), which adds it to the method's control options.

Here are the parameters for "CA":

get_seriation_method("matrix", "CA")
name:        CA
kind:        matrix
description: This method calculates a correspondence analysis of the matrix and computes an order according to the scores on a correspondence analysis dimension.
control (default values):
  dim ca_param
1   1     NULL

So you can order by the second CA dimension with:

permute(haireye, "CA", dim = 2)

ca_param is a list of additional parameters used when computing the correspondence analysis with ca().

friendly commented 2 years ago

Thanks for this. I didn't see that from the documentation, but that's probably my fault.

mhahsler commented 2 years ago

No, it was not there. I will update the man page.

mhahsler commented 2 years ago

A new version of seriation (1.4.0) is on its way to CRAN. Thank you for your help!

friendly commented 2 years ago

That is wonderful! Thanks for being so responsive on this issue. We're now considering whether and how to add this as an option in vcd::mosaic().