Open cvanderaa opened 4 years ago
I am wondering if a sweep(), as a general function to sweep out array summaries, wouldn't be more appropriate here, and keep normalize for the more standard normalisation methods?
@jorainer @sgibb - what do you think?
A general sweep method sounds good to me!
This PR adds a sweep method. There is, however, one point I would like to highlight. As mentioned in the documentation:
• ‘"center.mean"’ and ‘"center.median"’ center the respective
sample (column) intensities by subtracting the respective
column means or medians. ‘"div.mean"’ and ‘"div.median"’
divide by the column means or medians. These are equivalent
to ‘sweep’ing the column means (medians) along ‘MARGIN = 2’
with ‘FUN = "-"’ (for ‘"center.*"’) or ‘FUN = "/"’ (for
‘"div.*"’).
There is now redundancy between some normalisation methods and sweep. We could
Features
Features
and MsCoreUtils
Any opinions, @sgibb @jorainer @cvanderaa?
I'd go for option 3, which seems to me the cleanest. I guess so far only you have used the normalization functions from MsCoreUtils, so I don't think it will be problematic. What would help is if you also had an example of how normalization with sweep would work (in addition to the documentation above).
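A minimal sketch of what such an example could look like, using base R's sweep() on a plain toy matrix rather than the method added in this PR. The matrix x and its values are made up for illustration; the calls mirror the "center.mean" and "div.median" cases described in the documentation quoted above.

```r
# Toy intensity matrix (made-up values): 3 features (rows) x 2 samples (columns).
x <- matrix(c(1, 2, 3,
              10, 20, 60),
            nrow = 3,
            dimnames = list(paste0("f", 1:3), c("s1", "s2")))

# Equivalent of "center.mean": subtract each column's mean (MARGIN = 2, FUN = "-").
centered <- sweep(x, MARGIN = 2, STATS = colMeans(x), FUN = "-")

# Equivalent of "div.median": divide each column by its median (MARGIN = 2, FUN = "/").
col_medians <- apply(x, 2, median)
scaled <- sweep(x, MARGIN = 2, STATS = col_medians, FUN = "/")

# After centering, every column mean is 0; after scaling, every column median is 1.
colMeans(centered)
apply(scaled, 2, median)
```

The same pattern covers "center.median" and "div.mean" by swapping the summary statistic passed as STATS.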
It is the cleanest, but these normalisations actually already exist in MSnbase (and have for a long time), which MsCoreUtils takes them from. So it's not like they haven't been in the wild yet. Some, like "sum" and "max" (not mentioned above), are widely used for some specific cases.
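For comparison, "sum" and "max" can also be written as sweeps. The sketch below assumes these two methods divide each feature (row) by its own sum or maximum, which is how I read the MSnbase documentation; that orientation is an assumption here, and the toy matrix is made up.

```r
# Toy intensity matrix (made-up values): 3 features (rows) x 2 samples (columns).
x <- matrix(c(1, 2, 3,
              10, 20, 60),
            nrow = 3)

# Assumed equivalent of "sum": divide each row by its sum (MARGIN = 1).
sum_norm <- sweep(x, MARGIN = 1, STATS = rowSums(x), FUN = "/")

# Assumed equivalent of "max": divide each row by its maximum (MARGIN = 1).
max_norm <- sweep(x, MARGIN = 1, STATS = apply(x, 1, max), FUN = "/")

# Each row of sum_norm now sums to 1; each row of max_norm has maximum 1.
rowSums(sum_norm)
apply(max_norm, 1, max)
```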
Following the normalization procedure described in Specht et al. 2019 for normalizing single-cell proteomics data, I need to normalize the rows of an assay.
I think Features::normalize could quickly include row normalization abilities by adding a margin argument, hence getting the following usage for the method:
margin == 2 is column-oriented while margin == 1 is row-oriented normalization, like in base::apply.
This will also require changes in MsCoreUtils::normalize_matrix.
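To illustrate the idea of a margin argument, here is a rough base R sketch. The helper center_median() is hypothetical: it is not part of Features or MsCoreUtils, and it only shows how a single margin value could switch between row-wise and column-wise median centering, following the base::apply convention.

```r
# Hypothetical helper (not an existing Features/MsCoreUtils function):
# median-center a matrix along rows (margin = 1) or columns (margin = 2).
center_median <- function(x, margin = 2) {
    stopifnot(is.matrix(x), margin %in% c(1, 2))
    # Summary statistic per row or per column, ignoring missing values.
    stats <- apply(x, margin, median, na.rm = TRUE)
    # Subtract the statistic along the same margin.
    sweep(x, MARGIN = margin, STATS = stats, FUN = "-")
}

# Toy intensity matrix (made-up values): 3 features (rows) x 2 samples (columns).
x <- matrix(c(1, 2, 3,
              10, 20, 60),
            nrow = 3)

by_row <- center_median(x, margin = 1)  # row-oriented normalization
by_col <- center_median(x, margin = 2)  # column-oriented normalization
```

With margin = 1 every row median becomes 0, and with margin = 2 every column median becomes 0, so existing column-wise behaviour would be preserved by keeping 2 as the default.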