xia-lab / MetaboAnalystR

R package for MetaboAnalyst

Option for class-specific quantile normalization #227

Closed ekopylova closed 1 year ago

ekopylova commented 2 years ago

Hello!

I noticed that in your QuantileNormalize function, the option to normalize within class levels is commented out (link to code). I was wondering why, and whether it would be possible to add it as an option? For example, for quantile normalization of gene expression data, class-specific normalization is recommended (https://www.nature.com/articles/s41598-020-72664-6).

Thank you! Jenya

xia-lab commented 1 year ago

The paper you suggested is not very applicable here: it is on transcriptomics, and it also considers batch effects. Quantile normalization (QN) for batch effect adjustment is a different topic; we are talking about feature normalization within one batch/study. Some approaches in transcriptomics may not be appropriate for metabolomics and require benchmarking studies before we can recommend them confidently. More specific considerations are given below:

Quantile normalization (QN) makes a very strong assumption about the data distribution and has a strong influence on it. If it is applied to each class separately, it will generate distinct class separations (very clear on PCA). This is of particular concern when the dataset is small. We have seen PCA change from no separation to clear separation, with many more significant features, after this procedure. However, these could be artifacts (i.e. caused by the algorithm).
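To illustrate the mechanism behind this concern, here is a minimal sketch in plain Python. It is not the MetaboAnalystR implementation; the function names `quantile_normalize` and `quantile_normalize_by_class`, the simulated data, and the tie handling are all assumptions made for illustration. All six samples are drawn from the same distribution, so there is no real class effect.

```python
import random
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize samples (each a list of feature values).

    Each sample is mapped onto a shared reference distribution: the mean,
    across samples, of the sorted values at each rank. Ties are broken by
    order of appearance, which is a simplification.
    """
    n_features = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [mean(s[r] for s in sorted_samples) for r in range(n_features)]
    normalized = []
    for s in samples:
        # Feature indices ordered from smallest to largest value.
        order = sorted(range(n_features), key=lambda i: s[i])
        out = [0.0] * n_features
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def quantile_normalize_by_class(samples, labels):
    """Hypothetical class-specific variant: run QN separately per class."""
    out = [None] * len(samples)
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        normed = quantile_normalize([samples[i] for i in idx])
        for i, row in zip(idx, normed):
            out[i] = row
    return out


# Simulated data with NO class difference (assumed for illustration).
random.seed(0)
n_features = 50
samples = [[random.lognormvariate(0, 1) for _ in range(n_features)]
           for _ in range(6)]
labels = ["A", "A", "A", "B", "B", "B"]

global_qn = quantile_normalize(samples)
class_qn = quantile_normalize_by_class(samples, labels)

# Global QN: every sample ends up with the same set of values.
same_global = all(sorted(s) == sorted(global_qn[0]) for s in global_qn)

# Class-specific QN: identical value sets within a class ...
within_a = all(sorted(class_qn[i]) == sorted(class_qn[0]) for i in (1, 2))
within_b = all(sorted(class_qn[i]) == sorted(class_qn[3]) for i in (4, 5))
# ... but a different value set for each class.
between_differs = sorted(class_qn[0]) != sorted(class_qn[3])

print(same_global, within_a, within_b, between_differs)
```

After class-specific QN, samples within a class share exactly the same set of values, while the two classes carry different reference distributions. Every between-class difference in those reference distributions is therefore perfectly consistent across samples, even though it arose only from sampling noise, and with few samples per class that noise is larger.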

A general assumption in differential analysis is that most omics features remain stable ("homeostasis") and only a small percentage (say, < 20%) change. In this case, it is reasonable to apply QN globally, which is the typical use case for QN. In addition, untargeted metabolomics data (~1000s of features) are more similar to transcriptomics data in terms of feature numbers than targeted metabolomics data (10s ~ 100s of features) are. Targeted and untargeted data involve quite different statistical thinking.

In summary, without dedicated benchmarking, we only recommend QN for untargeted metabolomics, applied to the whole dataset rather than at the class level.