I added checks of the sizes of inputs in the C++ code. I guess that's slowing things down more than I expected. I surrounded those checks with #ifndef NDEBUG / #endif, so I should try defining NDEBUG and seeing whether things are faster.
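To make that concrete, here's a minimal sketch (mine, not code from the package) of how one could compare timings before and after rebuilding qtl2scan with NDEBUG defined, say by adding PKG_CPPFLAGS = -DNDEBUG to src/Makevars and reinstalling; apr, pheno_clin, k, and sex are the objects used in the timings later in this thread.
# time scan1() with the current build (size checks active)
elapsed_with_checks <- replicate(3, system.time(
    scan1(apr, pheno_clin[, "weight_11wk", drop=FALSE], k, addcovar=sex, cores=1)
)["elapsed"])
# ...reinstall qtl2scan with NDEBUG defined, restart R, reload the data, then repeat...
elapsed_without_checks <- replicate(3, system.time(
    scan1(apr, pheno_clin[, "weight_11wk", drop=FALSE], k, addcovar=sex, cores=1)
)["elapsed"])
summary(elapsed_with_checks)
summary(elapsed_without_checks)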
Are you seeing an increase in memory usage? I'm not sure what would cause that.
I think that we’re seeing an increase in memory usage that is linear with the number of cores. Are you using fork or socket clusters?
Dan
The code to do the multi-core analyses is in cluster_util.R. I use mclapply() on unix/Mac and parLapply() on Windows (or if you use parallel::makeCluster()).
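Roughly, the dispatch described above looks something like the following sketch (my paraphrase, not the actual cluster_util.R code; run_batches and its arguments are hypothetical):
run_batches <- function(batches, fun, cores = 1) {
    if (inherits(cores, "cluster")) {
        # a cluster object from parallel::makeCluster() was supplied
        parallel::parLapply(cores, batches, fun)
    } else if (.Platform$OS.type == "windows" && cores > 1) {
        # Windows: socket cluster; each worker gets its own copy of the data
        cl <- parallel::makeCluster(cores)
        on.exit(parallel::stopCluster(cl))
        parallel::parLapply(cl, batches, fun)
    } else {
        # unix/Mac: forked workers share memory copy-on-write
        parallel::mclapply(batches, fun, mc.cores = cores)
    }
}
With forked workers, large objects like the genotype probabilities are shared copy-on-write, while a socket cluster has to ship a copy to each worker, which is one way memory use could grow with the number of cores.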
In scan1(), things get split by chromosome and into batches of phenotypes with a common pattern of missing data. The genotype probabilities for that chromosome will end up being copied, so there should be some additional memory usage, but I wouldn't expect it to be linear in the number of cores.
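As a toy illustration (not the actual scan1() internals), batching phenotype columns by their pattern of missing data could look like this, using the pheno_clin matrix from the timings below:
# group phenotype columns that share the same pattern of NAs,
# so each batch can be analyzed with a single set of complete rows
miss_pattern  <- apply(is.na(pheno_clin), 2, paste, collapse = "")
pheno_batches <- split(colnames(pheno_clin), miss_pattern)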
But I've not studied memory usage yet; leaving optimization to later and just trying to get the right answers. 😉
OK, maybe I was just seeing the memory increases on Windows, where the clusters are socket clusters.
But Matt seemed to see increases on Linux as well. I don't see why this would happen from your code, though. On the plus side, the clusters return NULL when they run out of memory.
Dan
And I agree; correctness first, speed later.
Dan
Hi @dmgatti @mattjvincent,
I'm having trouble reproducing the speed problems. I tried turning off the debug code in C++ (in qtl2scan 0.5-21), but I didn't see any improvement. And so I went back and did some timings of 0.5-13, 0.5-20, and 0.5-21, and I'm not seeing the differences that you saw.
Other than that debug code, there's just been one other change that might affect things, and it's this "registration of routines" business that has led to function calls going through some sort of symbols stuff. As you can probably tell, I don't really understand it. It shows up as a mess of stuff in lines 1590–1706 of RcppExports.cpp.
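For anyone else puzzled by it, the visible effect on the R side is just in how the compiled routines get called in the generated R/RcppExports.R; the general pattern is sketched below with a made-up routine name (the real names are in that file):
# before routine registration: look up the C++ entry point by name at each call
# .Call("some_cpp_routine", arg1, arg2, PACKAGE = "qtl2scan")
# after routine registration: call through a native symbol that Rcpp registers
# once when the package is loaded (named `_<pkg>_<function>`)
# .Call(`_qtl2scan_some_cpp_routine`, arg1, arg2)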
Here are my timings on Linux (with R 3.3.1):
system.time(out <- scan1(apr, pheno_clin[,"weight_11wk", drop=FALSE], k, addcovar=sex, cores=20))
# user system elapsed
# 0.5-13
# 139.525 15.852 14.031
# 139.417 17.772 14.005
# 137.231 16.794 13.992
# 0.5-20
# 128.861 13.641 14.089
# 139.985 14.737 13.399
# 139.902 16.755 13.574
# 0.5-21
# 128.550 13.363 12.896
# 138.275 13.398 13.328
# 138.477 18.006 14.398
system.time(out <- scan1(apr, pheno_clin[,"weight_11wk", drop=FALSE], k, addcovar=sex, cores=1))
# user system elapsed
# 0.5-13
# 126.537 8.741 135.420
# 124.334 8.144 132.492
# 124.183 8.199 132.396
# 0.5-20
# 122.400 9.690 132.140
# 124.105 9.097 133.276
# 124.887 9.577 134.602
# 0.5-21
# 123.307 8.793 132.140
# 122.188 8.206 130.443
# 124.969 8.569 133.616
On Mac desktop (with R 3.4.1):
system.time(out <- scan1(apr, pheno_clin[,"weight_11wk",drop=FALSE], k, addcovar=sex, cores=20))
# user system elapsed
# 0.5-13
# 160.251 53.199 16.046
# 177.294 58.082 16.403
# 177.109 59.113 15.641
# 0.5-20
# 154.924 50.864 15.723
# 178.440 59.821 16.414
# 177.249 59.193 15.632
# 0.5-21
# 165.601 57.707 16.192
# 165.367 55.477 16.389
# 179.288 58.796 15.792
system.time(out <- scan1(apr, pheno_clin[,"weight_11wk",drop=FALSE], k, addcovar=sex, cores=1))
# user system elapsed
# 0.5-13
# 74.793 21.488 96.291
# 75.253 21.089 96.349
# 74.593 21.088 95.689
# 0.5-20
# 75.641 21.750 97.444
# 76.258 21.320 97.600
# 74.729 21.451 96.189
# 0.5-21
# 75.542 21.029 96.621
# 75.299 21.271 96.593
# 75.551 21.321 96.930
Hmmm. OK, we can try to look at it further here. Thanks for looking on your end.
Dan
We're mapping a dataset with 478 samples and 64,000 markers. Matt noticed that qtl2scan_0.5-13 is faster than qtl2scan_0.5-20. He's working on the on-line QTL viewer, and small changes in speed can poison the user experience. Matt gave me the runs and they're below. We could run microbenchmark, but the longer times have been consistent.
Version 0.5-13 on R/3.3.2: 1 core = 36 sec; 4 cores = 55 sec
Version 0.5-20 on R/3.4.1: 1 core = 44 sec; 4 cores = 70 sec
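If it would help, a microbenchmark comparison could look roughly like the sketch below; probs, pheno, kinship, and covar are placeholder names for our objects, and each qtl2scan version would need to be installed in its own library and loaded in a fresh R session to compare versions directly.
library(microbenchmark)
# compare single-core and 4-core runs of the same scan under one installed version
microbenchmark(
    one_core   = scan1(probs, pheno, kinship, addcovar = covar, cores = 1),
    four_cores = scan1(probs, pheno, kinship, addcovar = covar, cores = 4),
    times = 5
)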