saezlab / MetaProViz

R-package to perform metabolomics pre-processing, differential metabolite analysis, metabolite clustering and custom visualisations.
https://saezlab.github.io/MetaProViz/
GNU General Public License v3.0

ORA - metabolite selection: warning() #64

Closed dprymidis closed 11 months ago

dprymidis commented 11 months ago

In DM_ORA, I noticed that we rank metabolites based on t.value and then take the top and bottom 10% together as input for ORA. Is this correct?

Why do we take 10%? Is there a specific reason? Wouldn't it be better if we filtered for significance using padj and log2FC?

Also, shouldn't we separate the upregulated from the downregulated metabolites and run ORA on the two subsets separately?

ChristinaSchmidt1 commented 11 months ago

Yes, this is correct and intended. Some explanation:

  1. For any pathway analysis it is preferable to rank by the t-val, as it includes both statistical and sample information, so in a way it is a combination of the Log2FC and p.val. For the same reason, the t-val is also the better ranking metric for GSEA.
  2. 10% is not a specific cut-off and one can choose differently by changing the parameter PercentageCutoff. I set 10% as the default since this is also done in other packages that perform standard ORA.
  3. In this case we are not separating up- and downregulated metabolites, as we are only interested in getting a broad overview: which pathways are most dysregulated. Making groups of up- and downregulated metabolites is valid too, yet this is already delving into specific biological questions by deciding how you group the metabolites (e.g. UP or DOWN, ...). If this is indeed what a user is interested in, they can add another column to the DMA results specifying their groups and run the other ORA function, MC_ORA, where ORA is done for each group.

Does this make sense?
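To make point 2 concrete, here is a minimal sketch of the selection step as described above: rank by t-value, then take the top and bottom percentage together. It is written in Python for illustration only and is not the DM_ORA code. The function name `select_by_t_value`, the rounding rule (round up, at least one metabolite per tail), and the example t-values are all assumptions.

```python
import math


def select_by_t_value(t_values, percentage_cutoff=10.0):
    """Return the metabolites in the top and bottom `percentage_cutoff`
    percent of a t-value ranking, combined into one list.

    Hypothetical helper: the rounding rule below is an assumption,
    not necessarily what DM_ORA does.
    """
    if not t_values:
        return []
    # Rank from most positive to most negative t-value.
    ranked = sorted(t_values, key=t_values.get, reverse=True)
    # Number of metabolites per tail: round up, keep at least one.
    n = max(1, math.ceil(len(ranked) * percentage_cutoff / 100))
    selected = ranked[:n] + ranked[-n:]
    # dict.fromkeys keeps order and drops duplicates if the tails overlap.
    return list(dict.fromkeys(selected))


# Made-up t-values for ten metabolites.
example = {
    "M01": 4.2, "M02": 3.1, "M03": 1.7, "M04": 0.9, "M05": 0.3,
    "M06": -0.1, "M07": -0.8, "M08": -1.5, "M09": -2.9, "M10": -5.0,
}
print(select_by_t_value(example, percentage_cutoff=10))
```

Note that the selection is never empty for non-empty input, whatever the data look like.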

ChristinaSchmidt1 commented 11 months ago

Maybe we could also add an example to the vignette that selects up- and downregulated clusters and runs ORA on those.
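As a rough illustration of what such an example could prepare, the sketch below adds a group column to DMA-like results so that ORA could then be run per group. It is Python for illustration only and does not call MC_ORA. The function name `add_regulation_group`, the row keys `padj` and `log2fc`, the group labels, and the cut-offs used in the example are hypothetical.

```python
def add_regulation_group(rows, *, padj_cutoff, log2fc_cutoff):
    """Return copies of the DMA-like rows with an extra 'group' entry.

    Hypothetical helper: the labels and the rule for assigning them
    are assumptions chosen for this sketch.
    """
    labelled = []
    for row in rows:
        significant = row["padj"] < padj_cutoff
        if significant and row["log2fc"] >= log2fc_cutoff:
            group = "UP"
        elif significant and row["log2fc"] <= -log2fc_cutoff:
            group = "DOWN"
        else:
            group = "No Change"
        # Copy the row so the input is left untouched.
        labelled.append({**row, "group": group})
    return labelled


# Made-up DMA results; the cut-offs are illustrative, not package defaults.
dma = [
    {"metabolite": "M01", "padj": 0.001, "log2fc": 1.8},
    {"metabolite": "M02", "padj": 0.020, "log2fc": -1.2},
    {"metabolite": "M03", "padj": 0.600, "log2fc": 2.5},
]
for row in add_regulation_group(dma, padj_cutoff=0.05, log2fc_cutoff=1.0):
    print(row["metabolite"], row["group"])
```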

dprymidis commented 11 months ago

For 1 and 3, OK, that makes sense. For 2, I am a bit concerned because taking 10%, or any other percentage, seems a bit arbitrary. It will always return something, and people might think they got results even though they don't, just because they see some plots. You know what I mean? Filtering by padj and logFC would not do this.

ChristinaSchmidt1 commented 11 months ago

That's a valid point. In most cases these should be significant changes, as otherwise the comparison shows no differences. What we can do is, whatever % is chosen, give a warning if features falling into this selection are not significant and/or have Log2FC <0.05.
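A minimal sketch of such a check, assuming the selected metabolites and their padj and Log2FC values are available as plain dictionaries. It is Python for illustration only, not the warning() call that was added to the package. The function name `check_selection` is hypothetical, and both cut-offs are left to the caller because the thread does not fix them.

```python
import warnings


def check_selection(selected, padj, log2fc, *, padj_cutoff, log2fc_cutoff):
    """Warn if any selected metabolite is not significant or has a small
    absolute Log2FC. Returns the list of metabolites that failed.

    Hypothetical helper: the pass rule (padj below the cut-off and
    |Log2FC| at or above the cut-off) is an assumption.
    """
    weak = [
        m for m in selected
        if padj[m] >= padj_cutoff or abs(log2fc[m]) < log2fc_cutoff
    ]
    if weak:
        warnings.warn(
            f"{len(weak)} of {len(selected)} selected metabolites do not pass "
            f"padj < {padj_cutoff} and |Log2FC| >= {log2fc_cutoff}: "
            + ", ".join(weak),
            UserWarning,
            stacklevel=2,
        )
    return weak


if __name__ == "__main__":
    # Made-up values; the cut-offs here are illustrative only.
    padj = {"M01": 0.001, "M10": 0.30}
    log2fc = {"M01": 2.1, "M10": -0.2}
    check_selection(["M01", "M10"], padj, log2fc,
                    padj_cutoff=0.05, log2fc_cutoff=0.5)
```

The selection itself is left unchanged; the warning only tells the user that the percentage cut-off pulled in features that would not survive a padj and Log2FC filter.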

ChristinaSchmidt1 commented 11 months ago

All done. I will move the last point into a lollipop graph issue that we can work on in the future.