saezlab / MetaProViz

R-package to perform metabolomics pre-processing, differential metabolite analysis, metabolite clustering and custom visualisations.
https://saezlab.github.io/MetaProViz/
GNU General Public License v3.0

Pool Estimation - Zero variance metabolites #75

Open dprymidis opened 1 year ago

dprymidis commented 1 year ago

Need to add the zero variance check to Pool Estimation, because zero variance metabolites can cause an error in the PCA in certain scenarios.

ChristinaSchmidt1 commented 10 months ago

Hi, as you probably noticed, I am just going through some issues to close them up and to write to you where I do not remember the progress. For this one, did you already start anything? I do remember that you also wanted to implement the zero variance test in the pre-processing; maybe you could let me know what the last status was and whether we can use something you already added to another function.

dprymidis commented 10 months ago

No, this was not done, but I did write the zero variance check as a function that could be used wherever it's needed. I paste it here:

ZeroVarCheck <- function(Input_data){

  #Check metabolite variance: calculate each metabolite's (column's) variance, ignoring NAs
  metabolite_var <- apply(Input_data, 2, function(x) stats::var(x, na.rm = TRUE))
  #Take the names of metabolites with zero variance and put them in a list
  metabolite_zero_var_list <- names(metabolite_var)[which(metabolite_var == 0)]

  #Print a message if zero variance metabolites were identified
  if(length(metabolite_zero_var_list) > 0){
    message("Metabolites with zero variance have been identified in the data.")
  }

  #Remove the zero variance metabolites (base R, so dplyr is not needed; drop = FALSE keeps the table shape)
  Input_data_filtered <- Input_data[, !colnames(Input_data) %in% metabolite_zero_var_list, drop = FALSE]

  #Save resulting table (needs Results_folder_Preprocessing_folder to be defined)
  #write.csv(data.frame(Metabolite = metabolite_zero_var_list), row.names = FALSE, file = file.path(Results_folder_Preprocessing_folder, "Zero_variance_metabolites.csv")) #save zero var metabolite list

  return(list("Input_data_filtered" = Input_data_filtered, "ZeroVarMetabolites" = metabolite_zero_var_list))
}

ChristinaSchmidt1 commented 10 months ago

Ok, thanks :)

To sum this up, you planned to add this check to the pool estimation function prior to calculating the CV? So basically we would add to MetaProViz::Pool_Estimation:

  1. Zero variance check
  2. Shapiro test (see other issue we just wrote about)
  3. calculate CV
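A minimal sketch of the three steps in this order, assuming the pool samples come as a numeric data frame with one column per metabolite. The function name `PoolCV_sketch`, the cut-off `alpha = 0.05`, and setting the CV to `NA` for flagged metabolites are assumptions for illustration only, not the actual `MetaProViz::Pool_Estimation` code.

```r
# Hypothetical sketch of the planned Pool_Estimation steps (not the MetaProViz API):
# 1. zero variance check, 2. Shapiro-Wilk normality test, 3. CV calculation.
PoolCV_sketch <- function(pool_data, alpha = 0.05) {
  # 1. Variance of each metabolite (column), ignoring missing values
  metabolite_var <- apply(pool_data, 2, function(x) stats::var(x, na.rm = TRUE))
  zero_var <- names(metabolite_var)[which(metabolite_var == 0)]
  if (length(zero_var) > 0) {
    message("Zero variance, CV not calculated for: ", paste(zero_var, collapse = ", "))
  }

  # 2. Shapiro-Wilk test only on the remaining metabolites (it errors on constant input)
  tested <- setdiff(colnames(pool_data), zero_var)
  shapiro_p <- vapply(tested, function(m) {
    x <- pool_data[[m]]
    x <- x[!is.na(x)]
    if (length(x) < 3) return(NA_real_)  # shapiro.test() needs at least 3 values
    stats::shapiro.test(x)$p.value
  }, numeric(1))
  not_normal <- names(shapiro_p)[which(shapiro_p < alpha)]
  if (length(not_normal) > 0) {
    warning("Not normally distributed (Shapiro p < ", alpha, "): ",
            paste(not_normal, collapse = ", "), call. = FALSE)
  }

  # 3. Coefficient of variation in % (sd / mean * 100), set to NA for flagged metabolites
  cv <- vapply(colnames(pool_data), function(m) {
    x <- pool_data[[m]]
    100 * stats::sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
  }, numeric(1))
  cv[c(zero_var, not_normal)] <- NA_real_
  data.frame(Metabolite = names(cv), CV = unname(cv), stringsAsFactors = FALSE)
}
```

Running the zero variance check first means the Shapiro test never sees a constant metabolite, which it cannot handle.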

1 and 2 would basically result in messages/warnings and refer to cases (= metabolites) where we cannot calculate the CV, either because of zero variance or because of a non-normal distribution.

Were there other cases where you would have added the zero variance function? I reckon we should probably do this prior to the Shapiro test, as that would also be impacted by zero variance (?).
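A quick check on a made-up all-zero metabolite suggests the order does matter: `stats::shapiro.test()` stops with an error on constant input, and the CV of an all-zero column is `0/0`, i.e. `NaN`. So the zero variance check would have to run before both steps.

```r
# Toy metabolite that only contains 0s (zero variance)
constant_metabolite <- rep(0, 6)

# Shapiro-Wilk test: shapiro.test() refuses constant input, so catch the error message
shapiro_result <- tryCatch(
  stats::shapiro.test(constant_metabolite),
  error = function(e) conditionMessage(e)
)
print(shapiro_result)

# CV in %: sd / mean * 100, which is 0 / 0 = NaN for an all-zero metabolite
cv_result <- 100 * stats::sd(constant_metabolite) / mean(constant_metabolite)
print(cv_result)
```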

dprymidis commented 10 months ago

Yes, 1-2-3 are correct, and I would add the check before any step that uses the variance or SD, such as the CV tests, and also before PCAs.

ChristinaSchmidt1 commented 10 months ago

Thanks, then I will do the above :)

About PCA: so prior to PCA you would also remove the zero variance metabolites? How does this impact the compression in PCA?

Do we also need to check for normality prior to PCA? Because within PCA, we recommend scaling=TRUE.

dprymidis commented 10 months ago

PCA with scaling simply does not work if you input features with zero variance (a constant column cannot be rescaled to unit variance), so you have to remove them before running it. About normality: no need to check before PCA, it shouldn't play a role.
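A small illustration with made-up data, assuming `InputData` is a numeric data frame of samples x metabolites: `stats::prcomp(..., scale. = TRUE)` fails on the all-zero column `Met_C`, and works once the zero variance metabolites are dropped. On the compression question: after centering, a constant column is all zeros, so in an unscaled PCA it adds nothing to the covariance and gets a loading of 0 on every component; removing it does not change the variance explained by the other metabolites.

```r
# Toy data: 6 samples x 4 metabolites; "Met_C" only contains 0s (zero variance)
set.seed(42)
InputData <- data.frame(
  Met_A = rnorm(6, mean = 10),
  Met_B = rnorm(6, mean = 5),
  Met_C = rep(0, 6),
  Met_D = rnorm(6, mean = 2)
)

# With scaling, prcomp() cannot divide the constant column by its SD of 0
pca_error <- tryCatch(
  stats::prcomp(as.matrix(InputData), scale. = TRUE),
  error = function(e) conditionMessage(e)
)
print(pca_error)

# Keep only metabolites with a non-zero (and non-missing) variance, then run the scaled PCA
metabolite_var <- apply(InputData, 2, function(x) stats::var(x, na.rm = TRUE))
keep <- !is.na(metabolite_var) & metabolite_var > 0
pca <- stats::prcomp(as.matrix(InputData[, keep, drop = FALSE]), scale. = TRUE)
print(summary(pca))
```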

ChristinaSchmidt1 commented 4 months ago

MetaProViz: fetch error if a metabolite column only has 0s!
Error: Error in prcomp.default(as.matrix(InputData), scale. = as.logical(Scaling)) : cannot rescale a constant/zero column to unit variance
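One way to fetch this error before it reaches `prcomp()` is a guard that names the offending metabolites. This is a hypothetical helper (`run_pca_checked` is not a MetaProViz function), mirroring the `prcomp.default(as.matrix(InputData), scale. = as.logical(Scaling))` call from the error message above.

```r
# Hypothetical pre-PCA guard (not part of MetaProViz): fail early with a clear
# message naming the metabolites that would break prcomp(..., scale. = TRUE).
run_pca_checked <- function(InputData, Scaling = TRUE) {
  # Variance per metabolite (column), ignoring NAs
  metabolite_var <- apply(InputData, 2, function(x) stats::var(x, na.rm = TRUE))
  zero_var <- names(metabolite_var)[which(metabolite_var == 0)]

  # Only scaled PCA fails on constant columns; unscaled PCA just gives them zero loadings
  if (as.logical(Scaling) && length(zero_var) > 0) {
    stop("PCA with Scaling = TRUE is not possible, these metabolites have zero variance ",
         "(e.g. only 0s): ", paste(zero_var, collapse = ", "),
         ". Remove them first.", call. = FALSE)
  }
  stats::prcomp(as.matrix(InputData), scale. = as.logical(Scaling))
}
```

Usage: `run_pca_checked(InputData)` either returns the `prcomp` object or stops with a message listing the zero variance metabolites, which could then be removed with the zero variance check above.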