saezlab / MetaProViz

R-package to perform metabolomics pre-processing, differential metabolite analysis, metabolite clustering and custom visualisations.
https://saezlab.github.io/MetaProViz/
GNU General Public License v3.0

DMA --> Mean of a metabolite = 0 #17

Closed ChristinaSchmidt1 closed 1 year ago

ChristinaSchmidt1 commented 1 year ago

Line 83

Add a +1 count to the metabolites with a mean of 0, because otherwise the Log2FC will be NA:

    Mean_C1[,which(Mean_C1[1,]==0)] <- Mean_C1[,which(Mean_C1[1,]==0)]+1 # was Log2FC_Condition1
    Mean_C2[,which(Mean_C2[1,]==0)] <- Mean_C2[,which(Mean_C2[1,]==0)]+1

I don't think that's a good idea. If this happens, we should probably add a very small numeric value here instead, e.g. 0.1E-118. But let me think about whether that's the best way.
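To make the trade-off concrete, here is a small sketch in R. It assumes the fold change is computed as `log2(mean_c1 / mean_c2)`, which this thread does not spell out; the vectors `mean_c1` and `mean_c2` and their values are made up for illustration and are not taken from the package.

```r
# Hypothetical group means for three metabolites (illustrative values only)
mean_c1 <- c(m1 = 8, m2 = 0, m3 = 0)
mean_c2 <- c(m1 = 2, m2 = 4, m3 = 0)

# Assumed fold-change formula: log2 of the ratio of the two means
log2fc <- log2(mean_c1 / mean_c2)
# m1 is finite, m2 is -Inf (log2 of 0), m3 is NaN (0 / 0)

# Option A: add +1 to the means that are exactly 0
plus1_c1 <- ifelse(mean_c1 == 0, mean_c1 + 1, mean_c1)
plus1_c2 <- ifelse(mean_c2 == 0, mean_c2 + 1, mean_c2)
log2fc_plus1 <- log2(plus1_c1 / plus1_c2)

# Option B: replace the zero means with a tiny constant instead
# (.Machine$double.xmin is the smallest positive normalised double in R)
tiny <- .Machine$double.xmin
tiny_c1 <- ifelse(mean_c1 == 0, tiny, mean_c1)
tiny_c2 <- ifelse(mean_c2 == 0, tiny, mean_c2)
log2fc_tiny <- log2(tiny_c1 / tiny_c2)

print(rbind(raw = log2fc, plus1 = log2fc_plus1, tiny = log2fc_tiny))
```

Under these assumptions, option A keeps the fold change on a moderate scale but depends on the units of the data, while option B produces a finite but extreme Log2FC (on the order of -1000 for `m2`), which may or may not be what a user wants to see in a plot.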

ChristinaSchmidt1 commented 1 year ago

So I have read some other packages' guidelines and decided that, for the time being, the best option is to do something similar to what DESeq2 does in its guidelines (Note on p-values set to NA: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pvaluesNA).

In detail I will implement a checking process that will print out messages/warnings:

  1. If Log2FC is NA:
     1.1. Check whether the input file has 0 values or NAs. If this is the case, give a warning, print the features and conditions this affects, and suggest doing missing value imputation.
     1.2. If this is not because the input file has 0/NA, check whether it occurs when calculating the mean. In this case we could still consider setting the value to 1 and giving a message().
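A rough sketch of what check 1.1 could look like in R is shown here. The helper name `check_zero_or_na`, its arguments, and the example data are hypothetical and only illustrate the idea; this is not the code that ended up in the package.

```r
# Hypothetical helper (not part of MetaProViz): report which metabolites
# and conditions contain 0 or NA values in the input data.
# input_data: data frame, samples in rows and metabolites in columns
# conditions: character vector with one condition label per sample
check_zero_or_na <- function(input_data, conditions) {
  stopifnot(nrow(input_data) == length(conditions))
  mat <- as.matrix(input_data)

  # Row/column positions of every entry that is NA or exactly 0
  affected <- which(is.na(mat) | mat == 0, arr.ind = TRUE)
  if (nrow(affected) == 0) {
    return(invisible(data.frame(Metabolite = character(0),
                                Condition = character(0))))
  }

  # One row per affected metabolite/condition pair
  report <- unique(data.frame(
    Metabolite = colnames(mat)[affected[, "col"]],
    Condition = as.character(conditions[affected[, "row"]]),
    stringsAsFactors = FALSE
  ))

  warning(
    "Input contains 0/NA values for: ",
    paste(sprintf("%s (%s)", report$Metabolite, report$Condition),
          collapse = ", "),
    ". Consider missing value imputation.",
    call. = FALSE
  )
  invisible(report)
}

# Made-up example data: 4 samples, 3 metabolites
example_df <- data.frame(
  Glucose = c(10, 0, 12, 11),
  Lactate = c(5, 6, NA, NA),
  Alanine = c(1, 2, 3, 4)
)
example_conditions <- c("Control", "Control", "Treated", "Treated")

report <- check_zero_or_na(example_df, example_conditions)
print(report)
```

Returning the affected pairs invisibly, in addition to warning, would let the calling function decide later which p-values to set to NA.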

FYI: at the moment, both in core and intra, we will get NA for the values if one of the two conditions is 0.

ChristinaSchmidt1 commented 1 year ago

Ok I have implemented this. Basically if the user inputs a DF with NA or 0 values, both will be treated the same:

  1. We calculate the Log2FC based on the mean, using 0 instead of NA --> if all replicates are NA (hence 0), Log2FC = Inf; otherwise Log2FC will have a value.
  2. We set p.val=NA for metabolites that have NA/0 in one of the replicates, and we calculate the adjusted p-value without those values.
  3. If the mean=0, not because all replicates were NA/0 but by coincidence, we add a constant of +1 prior to calculating the Log2FC.

Here are some examples of what this will look like: (image)
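The three rules above can be sketched roughly as follows. This is only an illustration under several assumptions that the thread does not confirm: the function name `dma_sketch` and the output column names are hypothetical, the fold change is taken as `log2(mean1 / mean2)`, the p-value comes from a Welch t-test via `stats::t.test`, and the adjustment uses `stats::p.adjust` with `method = "fdr"`. Note that once NA is replaced by 0, rule 3 can only trigger when the data contain negative values (for example after centring), because a mean of non-negative values is 0 only if all values are 0.

```r
# Hypothetical sketch of the three rules; not the package implementation.
# c1, c2: numeric matrices (replicates in rows, metabolites in columns)
dma_sketch <- function(c1, c2) {
  has_gap <- function(m) apply(m, 2, function(x) any(is.na(x) | x == 0))
  all_gap <- function(m) apply(m, 2, function(x) all(is.na(x) | x == 0))
  flagged <- has_gap(c1) | has_gap(c2)

  # Rule 1: NA and 0 are treated the same, so NA becomes 0 before averaging
  z1 <- c1; z1[is.na(z1)] <- 0
  z2 <- c2; z2[is.na(z2)] <- 0
  mean1 <- colMeans(z1)
  mean2 <- colMeans(z2)

  # Rule 3: a mean of 0 that is not explained by all-NA/0 replicates gets +1
  fix1 <- mean1 == 0 & !all_gap(c1)
  fix2 <- mean2 == 0 & !all_gap(c2)
  mean1[fix1] <- mean1[fix1] + 1
  mean2[fix2] <- mean2[fix2] + 1

  # Assumed direction of the fold change: condition 1 over condition 2
  log2fc <- log2(mean1 / mean2)

  # Rule 2: no p-value for flagged metabolites (assumed test: Welch t-test)
  p_val <- rep(NA_real_, ncol(c1))
  for (j in which(!flagged)) {
    p_val[j] <- stats::t.test(c1[, j], c2[, j])$p.value
  }

  data.frame(
    Metabolite = colnames(c1),
    Log2FC = unname(log2fc),
    p.val = p_val,
    # p.adjust() leaves NA entries as NA and adjusts only the remaining values
    p.adj = stats::p.adjust(p_val, method = "fdr")
  )
}

# Made-up example: 3 replicates per condition, 4 metabolites
c1 <- cbind(A = c(10, 12, 11), B = c(NA, 0, NA), C = c(4, NA, 6), D = c(-2, 1, 1))
c2 <- cbind(A = c(5, 6, 5.5), B = c(3, 4, 5), C = c(2, 3, 2.5), D = c(2, 4, 3))

res <- dma_sketch(c1, c2)
print(res)
```

In this example, metabolite B has only NA/0 replicates in the first condition and ends up with an infinite Log2FC and no p-value, C has a single NA and gets a finite Log2FC but no p-value, and D has a mean of 0 without any NA/0 replicates, so the +1 constant is applied.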