rformassspectrometry / QFeatures

Quantitative features for mass spectrometry data
https://RforMassSpectrometry.github.io/QFeatures/

New sweep method [was Enhance: allow to `normalize` rows] #79

Open cvanderaa opened 4 years ago

cvanderaa commented 4 years ago

Following the normalization procedure described in Specht et al. 2019 for normalizing single-cell proteomics data, I need to normalize the rows of an assay.

I think Features::normalize could easily support row normalization by adding a margin argument, giving the following usage for the method:

## S4 method for signature 'SummarizedExperiment'
normalize(object, method, margin = 2, ...)

## S4 method for signature 'Features'
normalize(object, i, name = "normAssay", method, margin = 2, ...)

margin == 2 gives column-oriented normalization, while margin == 1 gives row-oriented normalization, as with MARGIN in base::apply.

This will also require changes in MsCoreUtils::normalize_matrix.
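For illustration, the difference between the two orientations can be sketched in base R alone. The matrix `x` is made-up toy data, and dividing by the row mean or column median is only one possible choice of summary, not necessarily the exact procedure from Specht et al. 2019.

```r
## Toy assay: 3 features (rows) x 4 samples (columns); values are made up
x <- matrix(c( 1,  2,  3,  4,
              10, 20, 30, 40,
               5,  5,  5,  5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("feat", 1:3), paste0("s", 1:4)))

## Row-oriented (margin == 1): divide each row by its mean
row_norm <- sweep(x, MARGIN = 1,
                  STATS = rowMeans(x, na.rm = TRUE), FUN = "/")

## Column-oriented (margin == 2): divide each column by its median
col_norm <- sweep(x, MARGIN = 2,
                  STATS = apply(x, 2, median, na.rm = TRUE), FUN = "/")
```

After this, every row of `row_norm` has mean 1 and every column of `col_norm` has median 1, while the dimensions and dimnames of `x` are preserved.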

lgatto commented 4 years ago

I am wondering whether a sweep() method, as a general function to sweep out array summaries, wouldn't be more appropriate here, keeping normalize for the more standard normalisation methods.

@jorainer @sgibb - what do you think?

jorainer commented 4 years ago

A general sweep method sounds good to me!

lgatto commented 4 years ago

This PR adds a sweep method. There is however one point I would like to highlight. As mentioned in the documentation:

        • ‘"center.mean"’ and ‘"center.median"’ center the respective
          sample (column) intensities by subtracting the respective
          column means or medians. ‘"div.mean"’ and ‘"div.median"’
          divide by the column means or medians. These are equivalent
          to ‘sweep’ing the column means (medians) along ‘MARGIN = 2’
          with ‘FUN = "-"’ (for ‘"center.*"’) or ‘FUN = "/"’ (for
          ‘"div.*"’).
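A minimal sketch of that equivalence, using only base R on a toy matrix (the data and the variable names are illustrative assumptions). The commented-out comparison assumes MsCoreUtils is installed and that `normalize_matrix()` accepts these method names, as the documentation above describes.

```r
## Toy matrix: 4 features (rows) x 3 samples (columns), simulated values
set.seed(1)
x <- matrix(rnorm(12, mean = 10), nrow = 4,
            dimnames = list(NULL, paste0("s", 1:3)))

## "center.mean": subtract the column means along MARGIN = 2
centered_mean <- sweep(x, MARGIN = 2,
                       STATS = colMeans(x, na.rm = TRUE), FUN = "-")

## "div.median": divide by the column medians along MARGIN = 2
div_median <- sweep(x, MARGIN = 2,
                    STATS = apply(x, 2, median, na.rm = TRUE), FUN = "/")

## Optional check against MsCoreUtils (assumes the package is installed):
## all.equal(centered_mean,
##           MsCoreUtils::normalize_matrix(x, method = "center.mean"))
```

The centred columns then have mean 0 and the divided columns have median 1, which is what the corresponding normalisation methods are documented to produce.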

There is now redundancy between some normalisation methods and sweep. We could

  1. leave it as is for backwards compatibility
  2. deprecate some normalisation methods in Features
  3. deprecate some normalisation methods in Features and MsCoreUtils

Any opinions, @sgibb @jorainer @cvanderaa?

jorainer commented 4 years ago

I'd go for option 3, which seems to me the cleanest. I guess that so far only you have used the normalization functions from MsCoreUtils, so I don't think it will be problematic. It would help if you also had an example of how normalization with sweep would work (in addition to the documentation above).

lgatto commented 4 years ago

It is the cleanest, but these normalisations already exist in MSnbase (and have for a long time), from which MsCoreUtils takes them. So it's not as if they haven't been in the wild yet. Some, like "sum" and "max" (not mentioned above), are widely used in specific cases.