saezlab / MetaProViz

R-package to perform metabolomics pre-processing, differential metabolite analysis, metabolite clustering and custom visualisations.
https://saezlab.github.io/MetaProViz/
GNU General Public License v3.0

MetaProViz::Pool_Estimation - Data normality #63

Open ChristinaSchmidt1 opened 12 months ago

ChristinaSchmidt1 commented 12 months ago

As I was looking into data normality and SD in a different context, I realised that this might be something we need to discuss with regard to the CV calculation of the pool samples.

Since the CV depends on the SD, we should ensure that the data is normally distributed and otherwise either return a warning, use something else like the interquartile range, or try to enforce data normality by log transformation (which wouldn't be my favorite choice).
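To make the SD dependence concrete, here is a minimal sketch in base R. The pool values are made up, and `robust_cv` is a hypothetical name, not an existing MetaProViz function. It uses IQR / 1.349 as an SD estimate (which matches the SD for normal data) and the median in place of the mean, as one possible interquartile-range alternative.

```r
# Made-up pool measurements for one metabolite; the last value is an outlier
pool <- c(10.2, 9.8, 10.5, 10.1, 25.0)

# Classical CV in percent: SD relative to the mean
cv <- function(x) {
  100 * sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# Hypothetical IQR-based analogue: IQR / 1.349 estimates the SD under
# normality, and the median replaces the mean
robust_cv <- function(x) {
  100 * (IQR(x, na.rm = TRUE) / 1.349) / median(x, na.rm = TRUE)
}

cv_classic <- cv(pool)
cv_robust <- robust_cv(pool)
```

With a single extreme value, the classical CV is inflated far more than the IQR-based one, which is why the distribution matters when interpreting the CV.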

I personally would use the Shapiro test on the pool samples. Here we will only have one condition (="Pool") and perform the test for each metabolite. We can return a warning/message about the data distribution as in the DMA function and let the user know how important this is for the CV calculation. We can even consider adding the results to the output DF. Given that this is the same code as in the DMA function, I would make the Shapiro test into a helper function, so that we can use it in both DMA and Pool_Estimation.
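A rough sketch of what such a helper could look like, using only `shapiro.test()` from base R. The function name, arguments, and column names are assumptions for illustration and not the actual MetaProViz implementation. It assumes pool samples in rows and metabolites in columns.

```r
# Hypothetical helper: one Shapiro-Wilk test per metabolite column
shapiro_per_metabolite <- function(pool_data, alpha = 0.05) {
  p_values <- vapply(pool_data, function(x) {
    x <- x[!is.na(x)]
    # shapiro.test() needs 3 to 5000 values that are not all identical;
    # return NA instead of raising an error in those cases
    if (length(x) < 3 || length(x) > 5000 || diff(range(x)) < 1e-10) {
      return(NA_real_)
    }
    shapiro.test(x)$p.value
  }, numeric(1))

  data.frame(
    Metabolite = names(pool_data),
    Shapiro_pval = unname(p_values),
    Normal = unname(p_values >= alpha),
    row.names = NULL
  )
}

# Made-up example data: 8 pool samples, 3 metabolites
set.seed(1)
pool_data <- data.frame(
  MetA = rnorm(8, mean = 100, sd = 5),  # drawn from a normal distribution
  MetB = rexp(8, rate = 0.1),           # drawn from a skewed distribution
  MetC = rep(42, 8)                     # constant, so it cannot be tested
)
res <- shapiro_per_metabolite(pool_data)
```

Keep in mind that with only a handful of pool samples the Shapiro test has little power, so a non-significant result is weak evidence of normality.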

For the time being, I will add a comment to the vignette, so that the user is informed about the importance of data normality.

dprymidis commented 11 months ago

Here we said to make the Shapiro test a helper function in preprocessing, add qqplots=T/F, and call it in DMA with qqplots=F.

ChristinaSchmidt1 commented 7 months ago

Hi Dimitrios, I am just going through the open issues and I wanted to check whether this was completely fixed with the helper function or if something else needs to be done.

dprymidis commented 7 months ago

Hello! This is partially done. The Shapiro test is a separate function, but it's still in the DMA script. The qqplots=T/F is added, but there are still some parameters (like STAT_pval) that need to be adjusted for using it in preprocessing vs DMA.

ChristinaSchmidt1 commented 7 months ago

Thanks for the quick response :)

Ok, what I thought initially was just to check data normality with Shapiro in the pool estimation and, if a metabolite is not normally distributed, to flag it, as the interpretation of the CV will be impacted.

I did not plan to produce any plots, but rather add an additional column and the message/warning.

Would you have done something additional, or would that be fine? (Just checking so I do not miss anything.)
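For illustration, the "additional column plus message" idea could look roughly like this in base R. The data, column names, and the 0.05 cutoff are assumptions for the sketch, not the actual Pool_Estimation output.

```r
# Made-up pool samples: rows are pool injections, columns are metabolites
set.seed(42)
pool_data <- data.frame(
  MetA = rnorm(10, mean = 50, sd = 2),
  MetB = c(rnorm(9, mean = 20, sd = 1), 80)  # one extreme value
)

# CV in percent per metabolite
cv_table <- data.frame(
  Metabolite = names(pool_data),
  CV = unname(vapply(pool_data, function(x) 100 * sd(x) / mean(x), numeric(1))),
  row.names = NULL
)

# Extra columns (hypothetical names): Shapiro-Wilk p-value and a flag
cv_table$Shapiro_pval <- unname(
  vapply(pool_data, function(x) shapiro.test(x)$p.value, numeric(1))
)
cv_table$NotNormal <- cv_table$Shapiro_pval < 0.05

# Inform the user instead of producing plots
flagged <- cv_table$Metabolite[cv_table$NotNormal]
if (length(flagged) > 0) {
  message(
    length(flagged), " metabolite(s) in the pool samples deviate from ",
    "normality (Shapiro p < 0.05); interpret their CV with care: ",
    paste(flagged, collapse = ", ")
  )
}
```

Using `message()` rather than `warning()` keeps the run quiet in scripts that treat warnings as failures; which of the two fits better is a design choice for the package.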

dprymidis commented 7 months ago

No, what you have in mind is correct. I just mentioned what info I have on the matter. The qqplot functionality is there, but it's not necessary for preprocessing, as it would produce many plots which no one would really check.