Open dprymidis opened 1 year ago
Hi, as you probably noticed, I am just going through some issues to close them, and I am writing to you where I do not remember the progress. For this one, did you already start anything? I do remember that you also wanted to implement the zero variance test in the pre-processing; maybe you could let me know the last status and whether we can use something you already added to another function.
No, this was not done, but I did write the zero variance check as a function which could be used wherever it is needed. I paste it here:
ZeroVarCheck <- function(Input_data){
  # Calculate the variance of each metabolite (one metabolite per column)
  metabolite_var <- apply(Input_data, 2, function(x) var(x, na.rm = TRUE))
  # Names of the metabolites with zero variance; which() skips NA variances
  metabolite_zero_var_list <- names(metabolite_var)[which(metabolite_var == 0)]
  # Print a message if zero variance metabolites were identified
  if(length(metabolite_zero_var_list) > 0){
    message("Metabolites with zero variance have been identified in the data.")
  }
  # Remove the zero variance metabolites (base R subsetting, so dplyr does not need to be attached)
  Input_data_filtered <- Input_data[, !(colnames(Input_data) %in% metabolite_zero_var_list), drop = FALSE]
  # Save resulting table
  #write.table(zero_var_metab_export_df, row.names = FALSE, file = paste(Results_folder_Preprocessing_folder,"/Zero_variance_metabolites",".csv",sep = "")) #save zero var metabolite list
  return(list("Input_data_filtered" = Input_data_filtered, "ZeroVarMetabolites" = metabolite_zero_var_list))
}
Ok, thanks :)
To sum this up, you planned to add this check to the pool estimation function prior to calculating the CV? So basically we would add to MetaProViz::Pool_Estimation:
1 and 2 would basically result in messages/warnings and refer to cases (= metabolites) where we cannot calculate the CV, either because we have zero variance or because the distribution is not normal.
Were there other cases where you would have added the zero variance function? I reckon we should probably do this prior to the Shapiro test, as this would also be impacted by zero variance (?).
Yes, 1-2-3 are correct, and I would add the check before anything that uses the variance or SD, such as the CV tests and also PCAs.
Thanks, then I will do the above :)
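The order discussed above (zero variance check first, then Shapiro test and CV) can be sketched in base R as follows. This is only an illustration: the toy data and the helper name `has_zero_variance` are made up here and are not part of MetaProViz.

```r
# Toy pool samples (hypothetical data): metabolite "B" is constant across all samples
pool <- data.frame(
  A = c(10.2, 9.8, 10.5, 10.1, 9.9),
  B = c(5, 5, 5, 5, 5)
)

# Hypothetical helper: TRUE for columns whose variance is exactly zero
has_zero_variance <- function(df) {
  v <- vapply(df, function(x) var(x, na.rm = TRUE), numeric(1))
  !is.na(v) & v == 0
}

zero_var <- has_zero_variance(pool)

# shapiro.test() stops with an error when all values are identical,
# so the error message is captured here instead of aborting the script
shapiro_B <- tryCatch(
  shapiro.test(pool$B),
  error = function(e) conditionMessage(e)
)

# Run the Shapiro test and the CV calculation only on metabolites that vary
pool_ok   <- pool[, !zero_var, drop = FALSE]
shapiro_p <- vapply(pool_ok, function(x) shapiro.test(x)$p.value, numeric(1))
cv        <- vapply(pool_ok, function(x) sd(x) / mean(x) * 100, numeric(1))
```

Here `shapiro_B` ends up holding the error message rather than a test result, which is why the zero variance check has to run first.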
About PCA: So prior to PCA you would also remove the zero variance metabolites. How does this impact the compression in PCA?
Do we also need to check for normality prior to PCA? Because within PCA, we recommend scaling=TRUE.
PCA simply does not work if you input features with zero variance; you have to remove them prior to using it. About normality, there is no need to check before PCA, it shouldn't play a role.
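On the compression question, a small base R sketch suggests that removing a constant metabolite does not discard any information. Without scaling, a constant column becomes all zeros after centering and so contributes nothing to the components. The data below are simulated purely for illustration.

```r
set.seed(1)
# Simulated data (hypothetical): three varying metabolites, six samples
X <- data.frame(A = rnorm(6), B = rnorm(6), C = rnorm(6))
# Same data with an additional constant metabolite "D"
X_const <- cbind(X, D = 0)

# Unscaled PCA runs in both cases
pca_without <- prcomp(X, center = TRUE, scale. = FALSE)
pca_with    <- prcomp(X_const, center = TRUE, scale. = FALSE)

# The first three standard deviations agree; the extra component has none
sdev_equal <- isTRUE(all.equal(pca_with$sdev[1:3], pca_without$sdev))
extra_sdev <- pca_with$sdev[4]

# The constant metabolite has (numerically) zero loading on the informative components
loading_D <- pca_with$rotation["D", 1:3]
```

With scale. = TRUE the same constant column makes prcomp() fail outright, so in that setting removal is required rather than optional.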
MetaProViz: fetch error if a metabolite column only has 0s! @ Error: Error in prcomp.default(as.matrix(InputData), scale. = as.logical(Scaling)) : cannot rescale a constant/zero column to unit variance
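The error quoted above can be reproduced and avoided with a few lines of base R. This is a minimal sketch on made-up data, not the MetaProViz implementation; the variable names are chosen for illustration only.

```r
# Made-up data: metabolite "C" contains only zeros
X <- data.frame(
  A = c(1.2, 0.8, 1.5, 1.1),
  B = c(3.1, 2.9, 3.4, 3.0),
  C = c(0, 0, 0, 0)
)

# Scaling a constant column fails; capture the message instead of stopping
pca_error <- tryCatch(
  prcomp(as.matrix(X), scale. = TRUE),
  error = function(e) conditionMessage(e)
)

# Keep only metabolites with a non-missing, non-zero variance
v    <- apply(X, 2, function(x) var(x, na.rm = TRUE))
keep <- !is.na(v) & v > 0

# PCA with scaling now runs on the remaining metabolites
pca_ok <- prcomp(as.matrix(X[, keep, drop = FALSE]), scale. = TRUE)
```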
Need to add the zero variance check in pool estimation because zero variance metabolites could cause an error in the PCA in certain scenarios.