mildpiggy / DEP2

An R package for proteomics data analysis, developed from DEP.

Imputation is not reproducible, irrespective of set.seed() #6

Closed 1Moe closed 3 months ago

1Moe commented 5 months ago

Hi again,

I noticed that imputation as implemented in DEP2::impute() is not reproducible, despite setting the seed. Can you fix this or provide a solution that I can implement?

This would greatly improve the workflow, e.g. when re-running my analyses and/or plotting at later stages. During a revision process, for example, we would want to always generate the same data outputs (intensities, log2 fold changes, p-values, etc.).

thanks so much again!

set.seed(42)
se_imp <- DEP2::impute(se, fun="QRILC")
se_imp2 <- DEP2::impute(se, fun="QRILC")

df <- get_df_wide(se_imp) == get_df_wide(se_imp2)

mildpiggy commented 5 months ago

@1Moe Hi again. set.seed() has to be called before every random step; see the set.seed documentation for more details. Here is a simple explanation. set.seed() initializes the random number generator so that it produces a fixed sequence of random values. Each random operation takes the next values from that sequence and moves forward, so two consecutive random operations give different results even after a single set.seed(). You therefore need to call set.seed(42) again before each imputation that you want to reproduce. The example below seeds, imputes twice, reseeds, and imputes twice again:

set.seed(42) # the seed for next random function
se_imp <- DEP2::impute(se, fun="QRILC")
se_imp3 <- DEP2::impute(se, fun="QRILC")

set.seed(42) # reset the seed for the second random function
se_imp2 <- DEP2::impute(se, fun="QRILC")
se_imp4 <- DEP2::impute(se, fun="QRILC")

# Use the assay() function (from SummarizedExperiment) to get the assay values
library(SummarizedExperiment)
table(assay(se_imp) == assay(se_imp2))   # The first QRILC after each set.seed, all TRUE
table(assay(se_imp3) == assay(se_imp4))  # The second QRILC after each set.seed, all TRUE
table(assay(se_imp) == assay(se_imp3))   # The first and second QRILC after one set.seed, different
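
The same behaviour can be checked without DEP2 at all. The following is a minimal sketch in base R, where rnorm() stands in for any function that draws random numbers. The helper with_fixed_seed() is a hypothetical name used only for illustration; it is not part of DEP2 or base R.

```r
# Base R only; rnorm() stands in for any function that draws random numbers.
set.seed(42)
first  <- rnorm(3)  # first draw after seeding
second <- rnorm(3)  # continues the same stream, so it differs from `first`

set.seed(42)        # rewind the stream to the same starting point
first_again <- rnorm(3)

identical(first, first_again)  # TRUE
identical(first, second)       # FALSE

# Hypothetical helper (not part of DEP2): seed first, then evaluate the call.
# `expr` is a lazily evaluated argument, so it only runs after set.seed().
with_fixed_seed <- function(seed, expr) {
  set.seed(seed)
  expr
}

a <- with_fixed_seed(42, rnorm(3))
b <- with_fixed_seed(42, rnorm(3))
identical(a, b)  # TRUE
```

Assuming the imputation draws from R's global random number generator as described above, a call such as with_fixed_seed(42, DEP2::impute(se, fun = "QRILC")) should give the same result on every run. The withr package offers a similar with_seed() function that also restores the previous generator state afterwards.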