Slow performance and high memory usage in qtl2scan_0.5-20

dmgatti commented 7 years ago

We're mapping with a dataset containing 478 samples and 64,000 markers. Matt noticed that qtl2scan_0.5-13 is faster than qtl2scan_0.5-20. He's working on the on-line QTL viewer and small changes in speed can poison the user experience.

Version 0.5-13 on R/3.3.2: 1 core = 36 sec.; 4 cores = 55 sec. Version 0.5-20 on R/3.4.1: 1 core = 44 sec.; 4 cores = 70 s.

Matt gave me the runs and they're below. We could run microbenchmark, but the longer times have been consistent.

>sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] pryr_0.1.2         RSQLite_1.1-2      qtl2scan_0.5-13
qtl2plot_0.5-6    
 [5] qtl2geno_0.5-21    qtl2convert_0.5-3  jsonlite_1.3
intermediate_0.2-4
 [9] magrittr_1.5       dplyr_0.5.0        data.table_1.10.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12.1   codetools_0.2-15 digest_0.6.12    assertthat_0.1
 [5] R6_2.2.0         DBI_0.5-1        stringi_1.1.2    tools_3.3.2
 [9] stringr_1.2.0    memoise_1.0.0    tibble_1.2

> num_cores <- 1
> addcovar <- covar[,-1]
> ptm <- proc.time()
> temp <- (scan1(genoprobs=probs,
+                 kinship=Glist,
+                 pheno=expr.mrna[,idx],
+                 addcovar=addcovar,
+                 cores=num_cores,
+                 reml=TRUE))
> elapsed <- proc.time() - ptm
> print(elapsed)
   user  system elapsed
 28.840   6.970  35.854

> num_cores <- 4
> addcovar <- covar[,-1]
> ptm <- proc.time()
> temp <- (scan1(genoprobs=probs,
+                 kinship=Glist,
+                 pheno=expr.mrna[,idx],
+                 addcovar=addcovar,
+                 cores=num_cores,
+                 reml=TRUE))

> elapsed <- proc.time() - ptm
> print(elapsed)
   user  system elapsed
 59.980 106.340  55.608

===========================================================================

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] pryr_0.1.2         RSQLite_2.0        qtl2scan_0.5-20
qtl2plot_0.5-13   
 [5] qtl2geno_0.5-29    qtl2convert_0.5-7  jsonlite_1.5
intermediate_0.2-4
 [9] magrittr_1.5       dplyr_0.7.2        data.table_1.10.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12     bindr_0.1        bit_1.1-12       R6_2.2.2
 [5] rlang_0.1.2      stringr_1.2.0    blob_1.1.0       tools_3.4.1
 [9] parallel_3.4.1   qtl_1.41-6       DBI_0.7          bit64_0.9-7
[13] assertthat_0.2.0 digest_0.6.12    tibble_1.3.3     bindrcpp_0.2
[17] codetools_0.2-15 memoise_1.1.0    glue_1.1.1       stringi_1.1.5
[21] compiler_3.4.1   pkgconfig_2.0.1

> num_cores <- 1
> addcovar <- covar[,-1]
> ptm <- proc.time()
> temp <- (scan1(genoprobs=probs,
+                 kinship=Glist,
+                 pheno=expr.mrna[,idx],
+                 addcovar=addcovar,
+                 cores=num_cores,
+                 reml=TRUE))

> elapsed <- proc.time() - ptm
> print(elapsed)
   user  system elapsed
 38.070   7.410  44.175

> num_cores <- 4
> addcovar <- covar[,-1]
> ptm <- proc.time()
> temp <- (scan1(genoprobs=probs,
+                 kinship=Glist,
+                 pheno=expr.mrna[,idx],
+                 addcovar=addcovar,
+                 cores=num_cores,
+                 reml=TRUE))
> elapsed <- proc.time() - ptm
> print(elapsed)
   user  system elapsed
  5.030   2.470  69.432

kbroman commented 7 years ago

I added checks of the sizes of inputs in the c++ code. I guess that's slowing things down much more than intended.

I surrounded those checks with #ifndef NDEBUG / #endif, so I guess I should try defining NDEBUG and seeing whether things are faster.

kbroman commented 7 years ago

Are you seeing an increase in memory usage? I'm not sure what would cause that.

dmgatti commented 7 years ago

I think that we’re seeing an increase in memory usage that is linear with the number of cores. Are you using fork or socket clusters?

Dan

kbroman commented 7 years ago

The code to do the multi-core analyses is in cluster_util.R. I use mclapply() on unix/Mac and parLapply() on Windows (or if you use parallel::makeCluster()).

In scan1(), things get split by chromosome and for a particular batch of phenotypes (with a common pattern of missing data). The genotype probabilities for that chromosome will end up being copied, so there should be some additional memory usage but not linear in the number of cores, I wouldn't think.

But I've not studied memory usage yet; leaving optimization to later and just trying to get the right answers. 😉

dmgatti commented 7 years ago

OK, maybe I was just seeing the memory increases on Windows, where the cluster are socket clusters.

But Matt seemed to see increases in Linux as well. I don’t see why this would happen from your code though. On the plus side, the clusters return NULL when the run out of memory.

Dan

dmgatti commented 7 years ago

And I agree; correctness first, speed later.

Dan

kbroman commented 7 years ago

Hi @dmgatti @mattjvincent,

I'm having trouble reproducing the speed problems. I tried turning off the debug code in C++ (in qtl2scan 0.5-21), but I didn't see any improvement. And so I went back and did some times of 0.5-13, 0.5-20, and 0.5-21, and I'm not seeing the differences that you saw.

Other than that debug code, there's just been one other change that might affect things, and it's this "registration of routines" business that has led to function calls being via some sort of symbols stuff. As you can probably tell, I don't really understand it. It shows up as a mess of stuff in lines 1590–1706 of RcppExports.cpp.

Here are my timings on linux (with R 3.3.1):

system.time(out <- scan1(apr, pheno_clin[,"weight_11wk", drop=FALSE], k, addcovar=sex, cores=20))
#    user  system elapsed
# 0.5-13
# 139.525  15.852  14.031
# 139.417  17.772  14.005
# 137.231  16.794  13.992
# 0.5-20
# 128.861  13.641  14.089
# 139.985  14.737  13.399
# 139.902  16.755  13.574
# 0.5-21
# 128.550  13.363  12.896
# 138.275  13.398  13.328
# 138.477  18.006  14.398

system.time(out <- scan1(apr, pheno_clin[,"weight_11wk", drop=FALSE], k, addcovar=sex, cores=1))
#    user  system elapsed
# 0.5-13
# 126.537   8.741 135.420
# 124.334   8.144 132.492
# 124.183   8.199 132.396
# 0.5-20
# 122.400   9.690 132.140
# 124.105   9.097 133.276
# 124.887   9.577 134.602
# 0.5-21
# 123.307   8.793 132.140
# 122.188   8.206 130.443
# 124.969   8.569 133.616

On Mac desktop (with R 3.4.1):

system.time(out <- scan1(apr, pheno_clin[,"weight_11wk",drop=FALSE], k, addcovar=sex, cores=20))
#    user  system elapsed
# 0.5-13
# 160.251  53.199  16.046
# 177.294  58.082  16.403
# 177.109  59.113  15.641
# 0.5-20
# 154.924  50.864  15.723
# 178.440  59.821  16.414
# 177.249  59.193  15.632
# 0.5-21
# 165.601  57.707  16.192
# 165.367  55.477  16.389
# 179.288  58.796  15.792

system.time(out <- scan1(apr, pheno_clin[,"weight_11wk",drop=FALSE], k, addcovar=sex, cores=1))
#    user  system elapsed
# 0.5-13
#  74.793  21.488  96.291
#  75.253  21.089  96.349
#  74.593  21.088  95.689
# 0.5-20
#  75.641  21.750  97.444
#  76.258  21.320  97.600
#  74.729  21.451  96.189
# 0.5-21
#  75.542  21.029  96.621
#  75.299  21.271  96.593
#  75.551  21.321  96.930

dmgatti commented 7 years ago

Hmmm. OK, we can try to look at it further here. Thanks for looking on your end.

Dan

rqtl / qtl2scan

Slow performance and high memory usage in qtl2scan_0.5-20 #109