hms-dbmi / scde

R package for analyzing single-cell RNA-seq data
http://pklab.med.harvard.edu/scde
Other
170 stars 64 forks source link

Error in pagoda.pathway.wPCA: results not reproducible, even with seed #49

Open aparna-arr opened 6 years ago

aparna-arr commented 6 years ago

When I run pagoda.pathway.wPCA() multiple times on the same input I get different results, even if I set the seed before each run of this function. How can I ensure this function produces the same results with the same dataset?

R code to test on the tutorial data:

library("scde")
library(org.Hs.eg.db)
data(pollen)

cd <- clean.counts(pollen)
x <- gsub("^Hi_(.*)_.*", "\\1", colnames(cd))
l2cols <- c("coral4", "olivedrab3", "skyblue2", "slateblue3")[as.integer(factor(x, levels = c("NPC", "GW16", "GW21", "GW21+3")))]

data(knn)

varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = TRUE)

varinfo <- pagoda.subtract.aspect(varinfo, colSums(cd[, rownames(knn)]>0))

# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids 
# convert GO lists from ids to gene names
gos.interest <- unique(c(ls(org.Hs.egGO2ALLEGS)[1:100],"GO:0022008","GO:0048699", "GO:0000280", "GO:0007067")) 
go.env <- lapply(mget(gos.interest, org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x]))) 
go.env <- clean.gos(go.env) # remove GOs with too few or too many genes
go.env <- list2env(go.env) # convert to an environment

# test without seed

pwpca1 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

pwpca2 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

ae_noseed<-all.equal(pwpca1,pwpca2)

if( isTRUE(ae_noseed)) {
    print("No seed: pwpca1 pwpca2 are equal")
} else {
    print("No seed: pwpca1 pwpca2 are not equal")
}
# test with seed
set.seed(0)
pwpca3 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

set.seed(0)
pwpca4 <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 35)

ae_seed<-all.equal(pwpca3,pwpca4)

if( isTRUE(ae_seed) ) {
    print("With seed: pwpca3 pwpca4 are equal")
} else {
    print("With seed: pwpca3 pwpca4 are not equal")
}

Output:

> source("test.R")
[1] "No seed: pwpca1 pwpca2 are not equal"
[1] "With seed: pwpca3 pwpca4 are not equal"

Single example of differences in results:

> summary(pwpca1$"GO:0048699"$z)
       V1       
 Min.   :6.684  
 1st Qu.:6.769  
 Median :6.945  
 Mean   :6.979  
 3rd Qu.:7.174  
 Max.   :7.390  
> summary(pwpca2$"GO:0048699"$z)
       V1       
 Min.   :6.804  
 1st Qu.:6.992  
 Median :7.123  
 Mean   :7.134  
 3rd Qu.:7.218  
 Max.   :7.499  
> summary(pwpca3$"GO:0048699"$z)
       V1       
 Min.   :6.864  
 1st Qu.:6.954  
 Median :7.038  
 Mean   :7.137  
 3rd Qu.:7.258  
 Max.   :7.764  
> summary(pwpca4$"GO:0048699"$z)
       V1       
 Min.   :6.797  
 1st Qu.:6.965  
 Median :7.056  
 Mean   :7.077  
 3rd Qu.:7.113  
 Max.   :7.516  

sessionInfo()

> sessionInfo("scde")
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /home/arrajpur/Install/R3.4.0-install/lib/R/lib/libRblas.so
LAPACK: /home/arrajpur/Install/R3.4.0-install/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
character(0)

other attached packages:
[1] scde_1.99.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11              compiler_3.4.0           
 [3] BiocInstaller_1.26.0      RColorBrewer_1.1-2       
 [5] nloptr_1.0.4              methods_3.4.0            
 [7] Lmoments_1.2-3            utils_3.4.0              
 [9] tools_3.4.0               grDevices_3.4.0          
[11] digest_0.6.12             bit_1.1-12               
[13] lme4_1.1-13               memoise_1.1.0            
[15] tibble_1.3.3              RSQLite_2.0              
[17] nlme_3.1-131              lattice_0.20-35          
[19] mgcv_1.8-17               pkgconfig_2.0.1          
[21] rlang_0.1.1               Matrix_1.2-10            
[23] DBI_0.7                   parallel_3.4.0           
[25] SparseM_1.77              RcppArmadillo_0.7.900.2.0
[27] IRanges_2.10.2            S4Vectors_0.14.3         
[29] graphics_3.4.0            extRemes_2.0-8           
[31] MatrixModels_0.4-1        datasets_3.4.0           
[33] stats_3.4.0               bit64_0.9-7              
[35] locfit_1.5-9.1            stats4_3.4.0             
[37] grid_3.4.0                nnet_7.3-12              
[39] base_3.4.0                Biobase_2.36.2           
[41] flexmix_2.3-13            AnnotationDbi_1.38.1     
[43] distillery_1.0-4          Rook_1.1-1               
[45] BiocParallel_1.10.1       limma_3.32.2             
[47] minqa_1.2.4               org.Hs.eg.db_3.4.1       
[49] blob_1.1.0                car_2.1-4                
[51] edgeR_3.18.1              pcaMethods_1.68.0        
[53] modeltools_0.2-21         MASS_7.3-47              
[55] BiocGenerics_0.22.0       splines_3.4.0            
[57] RMTstat_0.3               pbkrtest_0.4-7           
[59] quantreg_5.33             brew_1.0-6               
[61] KernSmooth_2.23-15        rjson_0.2.15             
[63] Cairo_1.5-9              
>