Closed by dprymidis 1 year ago
We can return a vector of the high-variance metabolites, so the user can easily remove them :)
I would probably call this: metabolite detection estimation by pool sample dispersion
Let's use this name for the function: Pool_Estimation
After today's meeting we decided to do the following:
Note: what do we do with NAs? NAs shouldn't exist, as the pooled samples are used for metabolite identification (for example in Compound Discoverer). For the calculation of CV and SE we ignore the NAs just in case. Maybe not?
For 2.: in the vignette data (as an example), without the log 93.41% are normally distributed and 6.59% are not. When we take the log, 92.31% are normally distributed and 7.69% are not. In this case, taking the log actually makes the data "worse". However, by doing this we "ensure" the general normality.
Again for 2.: SEM = sd/sqrt(sample_size). We get a different SEM depending on whether or not we take the log of the data, because SEM depends on the scale of the values (and so on the sample mean). So we cannot have a standard threshold for SEM itself. By taking the ratio SEM/mean we get a value that does not depend on the sample mean, which means that we can have a standard threshold. I used this.
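To illustrate the point about SEM versus SEM/mean, here is a minimal Python sketch (not the package's R code). The helper name `sem_ratio` and the pool values are made up for illustration, and missing values are simply skipped, as discussed above. It shows that rescaling the intensities changes SEM but leaves SEM/mean unchanged:

```python
import math
import statistics


def sem_ratio(values):
    """Return (SEM, SEM / mean) for one metabolite, skipping missing values.

    Hypothetical helper for illustration only.
    """
    clean = [v for v in values if v is not None and not math.isnan(v)]
    n = len(clean)
    if n < 2:
        raise ValueError("need at least two non-missing pool measurements")
    mean = statistics.fmean(clean)
    if mean == 0:
        raise ValueError("SEM / mean is undefined for a mean of zero")
    sem = statistics.stdev(clean) / math.sqrt(n)  # SEM = sd / sqrt(sample_size)
    return sem, sem / mean


# Toy pool-sample intensities for one metabolite, with one missing value.
pool = [100.0, 110.0, 90.0, None, 105.0]
sem, ratio = sem_ratio(pool)

# Same data on a 1000x larger scale: SEM grows 1000x, the ratio stays the same.
rescaled = [v * 1000 if v is not None else None for v in pool]
scaled_sem, scaled_ratio = sem_ratio(rescaled)

print(sem, ratio)
print(scaled_sem, scaled_ratio)
```

Note that this only demonstrates invariance under rescaling. A log transform is not a rescaling, so the ratio before and after taking the log can still differ.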
I added both SEMean and SEMedian. It turned out that SEMedian is a scalar multiple of SEMean, so SEMedian does not actually provide anything additional. It's just a little stricter than the mean. Also, it seems that SEMean and CV give similar results with thresholds of CV = 1 and SEM_ratio = 0.1. Again, the SEM_ratio is a scaled version of CV (SEM/mean = CV/sqrt(sample_size)), so it makes sense. However, the SEM also takes the sample size into account.
Also, assigning 2 "things" to the global environment worked for me. This worked:
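A small Python sketch of how these measures relate (again not the package's R code). The function name `dispersion_measures` is hypothetical, and the standard error of the median is assumed here to be the usual normal-theory approximation sqrt(pi/2) * SEM; the thread does not say which formula was actually used:

```python
import math
import statistics


def dispersion_measures(values):
    """Compute CV, SEM/mean and SE(median)/mean for one metabolite.

    Hypothetical helper; expects at least two non-missing values.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n)
    # Assumption: normal-theory approximation, SE(median) ~ sqrt(pi / 2) * SEM.
    se_median = math.sqrt(math.pi / 2) * sem
    return {
        "n": n,
        "cv": sd / mean,
        "sem_ratio": sem / mean,
        "se_median_ratio": se_median / mean,
    }


measures = dispersion_measures([100.0, 110.0, 90.0, 105.0])
print(measures)
```

Under that assumption both SEM-based values are fixed multiples of CV for a given sample size, which would explain why they flag similar metabolites, with the median version being slightly stricter for the same threshold.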
I also found this for this issue: https://stackoverflow.com/questions/9726705/assign-multiple-objects-to-globalenv-from-within-a-function
This is done.
Compound Discoverer uses a group-wise coefficient of variation with a threshold of 20. I don't know exactly what it does with this yet.
This is done. I kept a personal list of papers I am going through for the measures of dispersion. I will fill you in at some point.
Thank you, that's great! You can also drop some comments/links into the vignette (the standard one).
Add a function to check the dispersion of each metabolite in the pooled samples. Report and possibly remove high-variance metabolites.
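For reference, the requested behaviour could look roughly like the following Python sketch. This is not the actual Pool_Estimation implementation: the function name `flag_high_variance`, the input layout (metabolite name mapped to its pool-sample values) and the toy data are all assumptions. The default threshold of CV = 1 is the one mentioned in the discussion, and missing values are ignored:

```python
import math
import statistics


def flag_high_variance(pool_data, cv_threshold=1.0):
    """Return the names of metabolites whose pool-sample CV exceeds the threshold.

    Hypothetical sketch; pool_data maps metabolite name -> list of pool values.
    """
    flagged = []
    for metabolite, values in pool_data.items():
        # Ignore missing values, just in case they are present.
        clean = [v for v in values if v is not None and not math.isnan(v)]
        if len(clean) < 2:
            continue  # not enough data to estimate dispersion
        mean = statistics.fmean(clean)
        if mean == 0:
            continue  # CV is undefined for a mean of zero
        cv = statistics.stdev(clean) / abs(mean)
        if cv > cv_threshold:
            flagged.append(metabolite)
    return flagged


# Toy data, invented for illustration only.
pool_data = {
    "alanine": [100.0, 102.0, 98.0, 101.0],
    "lactate": [10.0, 200.0, 5.0, None],
    "citrate": [50.0, 55.0, 45.0, 52.0],
}

flagged = flag_high_variance(pool_data)
# The returned names make it easy to drop the flagged metabolites afterwards.
kept = {name: vals for name, vals in pool_data.items() if name not in flagged}
print(flagged, sorted(kept))
```

With these toy values only lactate is flagged, and the user can filter it out using the returned list.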