Closed ChristinaSchmidt1 closed 11 months ago
QC plots:
For 1. This means that we don't return anything and nothing is printed except the messages/warnings.
2 and 6. I am using Save_as_Results and Save_as_Plot, as we used these for other functions. The user can set them to NULL if they don't want the results. I will also add a folder named Pool_Estimation in the Preprocessing (?) and if any of these parameters is not NULL I will put the results in there.
PreProcessing_res
So for 1. there are two things. The first is to return the "filtered dataset" if Unstable_featre=TRUE, and the second is to maybe return the CV table and the 3 plots (PCA, Hist, Violin). How do we go about it? Do we return a list with the dataset first and then the plots?
Yeah, we could make a list with two lists (Plots and DFs): the list of Plots holds the QC plots and the list of DFs holds the result DFs.
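A minimal sketch of the nested list proposed above, using base R only. The helper name `make_result_list`, the element names inside `DFs`, and the toy values are assumptions for illustration; character strings stand in for the actual plot objects.

```r
# Hypothetical helper: bundles outputs into one list with two sub-lists,
# DFs (result data frames) and Plots (QC plots).
make_result_list <- function(filtered_data, cv_table, plots = list()) {
  list(
    DFs = list(Filtered_data = filtered_data, CV = cv_table),
    Plots = plots
  )
}

# Toy input (not real data): rows are samples, columns are metabolites.
toy_data <- data.frame(M1 = c(10, 11, 12), M2 = c(5, 50, 500))
toy_cv <- data.frame(
  Metabolite = c("M1", "M2"),
  CV = c(sd(toy_data$M1) / mean(toy_data$M1),
         sd(toy_data$M2) / mean(toy_data$M2))
)

# Strings are placeholders for the PCA, histogram and violin plot objects.
res <- make_result_list(
  toy_data, toy_cv,
  plots = list(PCA = "pca_plot", Hist = "hist_plot", Violin = "violin_plot")
)

names(res)        # top-level entries
names(res$Plots)  # one entry per QC plot
```

With this layout, a single plot or table can be picked by name, e.g. `res$Plots$PCA` or `res$DFs$CV`.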
How can this get higher than 1, if 1 is supposed to be 100%? I think I am missing something.
set.seed(789)
random_values <- runif(3, min = 100, max = 2000)
mean_value <- mean(random_values)
sd_value <- sd(random_values)
cv_value <- sd_value / mean_value
random_values
[1] 1429.7993 277.6479 122.5850
mean_value
[1] 610.0107
sd_value
[1] 714.1786
cv_value
[1] 1.170764
It can get higher than 1 if one value is very different from the others. But now I noticed that it also has to do with the number of samples you have.
So I was just checking on the CV, and indeed that can happen when the SD is greater than the mean. In this case the CV is more than 100%, which means that on average the data points are very distant from the mean. For the threshold, I always thought people would use 0.3 = 30%, but in the end it's something the user can change. We can check the metabolites that are above 0.3 and see whether they truly appear variable across samples. Also, we can check the mean value, as the CV may be high at extremely low concentrations and low at large values.
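The check described above could look roughly like this in base R. This is only a sketch: the table layout, the column names (`Mean`, `SD`, `CV`, `High_var`) and the toy metabolites are assumptions, and M2 reuses the three values from the example earlier in the thread. The last lines illustrate the sample-size point: for n non-negative values and the usual sample SD, the CV cannot exceed sqrt(n), so with three samples the ceiling is about 1.73.

```r
# Toy pool samples (rows) by metabolites (columns); not real data.
pool <- data.frame(
  M1 = c(100, 105, 95),                   # stable feature
  M2 = c(1429.7993, 277.6479, 122.5850),  # values from the example above
  M3 = c(0.1, 0.5, 0.05)                  # low-abundance feature
)

cv_threshold <- 0.3  # 30%, meant to be user-adjustable

# Per-metabolite mean, SD and CV = SD / mean.
cv_table <- data.frame(
  Metabolite = names(pool),
  Mean = vapply(pool, mean, numeric(1)),
  SD = vapply(pool, sd, numeric(1)),
  row.names = NULL
)
cv_table$CV <- cv_table$SD / cv_table$Mean
cv_table$High_var <- cv_table$CV > cv_threshold

cv_table

# Upper bound with n = 3 non-negative values: one non-zero value, rest zero.
extreme <- c(1, 0, 0)
max_cv <- sd(extreme) / mean(extreme)  # equals sqrt(3)
```

Keeping `Mean` next to `CV` makes it easy to spot features whose high CV comes from very low concentrations.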
This is done.
However, there is still an issue: when we run the PCA, the plot gets printed and the grid is saved. So in order to get the plot you have to call plot(PoolEstimation_res[["Plots"]][[1]]), and just running PoolEstimation_res[["Plots"]] gives the PCA plot grid and the other 2 plots. This needs fixing.
Also, as for preprocessing we do assign("PreProcessing_res", preprocessing_output_list, envir=.GlobalEnv), I did the same here: assign("PoolEstimation_res", Pool_Estimation_res_list, envir=.GlobalEnv). Maybe we have to agree on when to assign and when to return.
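For comparison, here is a sketch of the "return" option, where the function hands back the result list and the caller chooses the variable name. The function name `pool_estimation_sketch` and its body are hypothetical stand-ins, not the package's code; `invisible()` is base R and only suppresses auto-printing when the call is not assigned.

```r
# Hypothetical stand-in for the real function: builds the result list and
# returns it instead of writing to the global environment with assign().
pool_estimation_sketch <- function(data) {
  res <- list(DFs = list(Input = data), Plots = list())
  invisible(res)  # returned, but not auto-printed at the console
}

# The caller decides where the result lives.
PoolEstimation_res <- pool_estimation_sketch(data.frame(M1 = c(1, 2, 3)))
names(PoolEstimation_res)
```

Returning the list keeps the function free of side effects on `.GlobalEnv`, while `assign()` saves the user one assignment; either works as long as all functions follow the same convention.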
I did this here* Yes indeed this check has to be done
Just wanted to let you know that the vignette throws this error:
Error in ggplot(Pool_Estimation_result, aes(CV)) : object 'Pool_Estimation_result' not found
Should be working now.. sorry
no worries, just wanted to note it down. Thanks!
I was just checking the parameters in the Pool_Estimation function. Could you please:
In this way we make it more flexible and the user could pass any column name for Conditions as in the other functions.
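One way to accept an arbitrary conditions column is to take its name as a string and index with `[[ ]]`. The sketch below assumes a hypothetical helper `cv_by_condition` and a hypothetical parameter `condition_col`; neither is taken from the package.

```r
# Hypothetical helper: CV of one feature per group, where the grouping
# column is chosen by name rather than hard-coded as "Conditions".
cv_by_condition <- function(data, feature, condition_col = "Conditions") {
  stopifnot(condition_col %in% names(data), feature %in% names(data))
  groups <- split(data[[feature]], data[[condition_col]])
  vapply(groups, function(x) sd(x) / mean(x), numeric(1))
}

# Toy data where the grouping column is called "Group".
toy <- data.frame(
  Group = c("Pool", "Pool", "Pool", "Ctrl", "Ctrl", "Ctrl"),
  M1 = c(10, 11, 12, 5, 50, 500)
)

out <- cv_by_condition(toy, "M1", condition_col = "Group")
out
```

The early `stopifnot()` gives a clear failure when the column name is misspelled, instead of an obscure error further down.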
Some points that came up during discussions:
About the QC plots: