rformassspectrometry / QFeatures

Quantitative features for mass spectrometry data
https://RforMassSpectrometry.github.io/QFeatures/

Improve sweep for multiple assays? #192

Open cvanderaa opened 1 year ago

cvanderaa commented 1 year ago

During a discussion with Sam, we were wondering whether we could improve sweep(). The discussion only applies when sweeping multiple assays. Consider the following example:

library("QFeatures")
data("feat2")
stats <- c(1.2, 2.1, 3, 4.5)
## Divide the columns of all three assays by the same STATS vector
sfeat2 <- sweep(
    feat2, MARGIN = 2, STATS = stats, FUN = "/", i = 1:3,
    name = paste0(names(feat2), "_sweep")
)

This example divides all columns by arbitrary numbers, but the same numbers are used for all 3 assays. In practice, I don't see why a user would want to divide their columns (or apply any other operation) by numbers that are shared across assays. Sam and I mostly use sweep() to perform some sort of normalization (cf. #79), where we want to divide or subtract the columns (or rows) by a column (or row) statistic, e.g. the mean or median. Such statistics differ from one assay to the next, and the function does not allow for this because a single STATS vector is applied to every assay in i.
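To make the point concrete, here is a minimal base R sketch (not QFeatures code) of the normalization we have in mind. The two toy matrices are made-up stand-ins for two assays. Their column medians differ, so each one needs its own STATS vector:

```r
# Two toy matrices standing in for two assays (hypothetical data)
assay_a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
                  dimnames = list(NULL, c("S1", "S2")))
assay_b <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
                  dimnames = list(NULL, c("S1", "S2")))

# Column medians are specific to each assay
med_a <- apply(assay_a, 2, median)
med_b <- apply(assay_b, 2, median)

# Centre each assay on its own column medians with base::sweep()
norm_a <- sweep(assay_a, MARGIN = 2, STATS = med_a, FUN = "-")
norm_b <- sweep(assay_b, MARGIN = 2, STATS = med_b, FUN = "-")
```

Reusing med_a for assay_b would leave assay_b far from centred, which is the situation a single shared STATS vector forces on us.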

I see two actions we could take:

  1. We do not allow sweep() on multiple assays, but this would limit its use cases.
  2. The STATS argument takes a list of numeric vectors with as many elements as the length of i. This, however, makes the function more complex to use.
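For discussion, option 2 could look roughly like the base R sketch below. It works on a plain list of matrices rather than on a QFeatures object, and sweep_each() is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: sweep each matrix with its own STATS vector.
# 'assays' is a list of matrices, 'STATS' a list of numeric vectors
# with one element per assay.
sweep_each <- function(assays, MARGIN, STATS, FUN = "-") {
    stopifnot(is.list(STATS), length(STATS) == length(assays))
    Map(function(x, s) sweep(x, MARGIN = MARGIN, STATS = s, FUN = FUN),
        assays, STATS)
}

# Toy data: two "assays" with different dimensions
assays <- list(
    a = matrix(c(2, 4, 6, 8), nrow = 2),
    b = matrix(c(10, 20, 30, 40, 50, 60), nrow = 3)
)

# One vector of column means per assay
stats <- lapply(assays, colMeans)

# Divide every column by its own assay-specific mean
scaled <- sweep_each(assays, MARGIN = 2, STATS = stats, FUN = "/")
```

The user has to build the list of statistics themselves (here with lapply()), which is the extra complexity mentioned in point 2.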