Open · lankiszhang opened this issue 4 years ago
Dear Jiajun, as you have discovered, although NBLAST is in theory an embarrassingly parallel problem that should scale perfectly across cores, in practice non-CPU resources can be limiting. We normally find that memory is the bottleneck.
In terms of your server spec, you have a lot of cores for the amount of memory. I suspect that you will not want to run more than 10 jobs, but the only way to be sure is to test. Note that running 1 query neuron against the flyem set will be less efficient than running 10 neurons; the memory usage is likely to be similar for both.
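To make the "10 jobs" suggestion concrete, here is one way to cap the number of simultaneous workers (and hence copies of working memory) by batching query neurons. This is only a sketch: `dps_query` and `dps_target` are hypothetical names for your query and target dotprops neuronlists, and the batch/worker counts are things you would tune against your own memory limits.

```r
library(nat)
library(nat.nblast)

# Split the query neuron names into ~10 batches so at most 10
# forked workers (and their memory) exist at any one time
batches <- split(names(dps_query),
                 cut(seq_along(dps_query), breaks = 10, labels = FALSE))

scores <- parallel::mclapply(batches, function(ids) {
  nblast(dps_query[ids], dps_target, normalised = TRUE)
}, mc.cores = 10)

# Recombine the per-batch score matrices (queries as columns)
allscores <- do.call(cbind, scores)
```

Running ~10 queries per worker also amortises the per-fork setup cost, which is why 10 neurons per job is more efficient than 1.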
A few additional questions. Is that 2.3 GB the on-disk or in-memory footprint of your dotprops object? In either case it is bigger than ours, which means either that you have processed more neurons (which is good) or that you have more information in the dotprops objects than necessary. The latter would be a double hit against you, since the searches will take more CPU (because there are more points) and you will be able to run fewer parallel jobs (because they take more memory). I would suggest running some basic stats on your dotprops object as I have done below and pasting in your results.
> load("fib.twigs5.dps.rda")
> object.size(fib.twigs5.dps)
1594104384 bytes
> length(fib.twigs5.dps)
[1] 27411
> stem(nvertices(fib.twigs5.dps))
The decimal point is 3 digit(s) to the right of the |
0 | 00000000000000000000000000000000000000000000000000000000000000000000+24861
2 | 00000000000000000000000000000000000000000000000000000000000000000000+1810
4 | 00000000000000000000000000000000000111111111111111111111111111122222+289
6 | 00000000001111111111222222222333333344445555555556677778888890000000+27
8 | 111122234444444455566677777788899900233333566788999
10 | 1222223466677779901688
12 | 0123445599935
14 | 112490234
16 | 3938
18 | 24
20 | 28
22 |
24 |
26 |
28 |
30 |
32 |
34 |
36 |
38 | 3
> mean(nvertices(fib.twigs5.dps))
[1] 936.0071
Perhaps you could also add the output of the following to give a slightly more fine-grained view on this:
dput(hist(nvertices(fib.twigs5.dps), plot=F, breaks=100))
structure(list(breaks = c(0, 500, 1000, 1500, 2000, 2500, 3000,
3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500,
9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000,
13500, 14000, 14500, 15000, 15500, 16000, 16500, 17000, 17500,
18000, 18500, 19000, 19500, 20000, 20500, 21000, 21500, 22000,
22500, 23000, 23500, 24000, 24500, 25000, 25500, 26000, 26500,
27000, 27500, 28000, 28500, 29000, 29500, 30000, 30500, 31000,
31500, 32000, 32500, 33000, 33500, 34000, 34500, 35000, 35500,
36000, 36500, 37000, 37500, 38000, 38500), counts = c(9967L,
8933L, 4324L, 1825L, 846L, 459L, 299L, 198L, 139L, 94L, 70L,
50L, 43L, 18L, 30L, 12L, 18L, 17L, 7L, 9L, 8L, 9L, 2L, 3L, 8L,
3L, 2L, 0L, 4L, 2L, 3L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 2L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L), density = c(0.00072722629601255, 0.000651782131261173,
0.000315493779869395, 0.000133158221152092, 6.17270438874904e-05,
3.34902046623618e-05, 2.1816059246288e-05, 1.44467549523914e-05,
1.01419138302141e-05, 6.85856043194338e-06, 5.1074386195323e-06,
3.64817044252307e-06, 3.13742658056984e-06, 1.31334135930831e-06,
2.18890226551384e-06, 8.75560906205538e-07, 1.31334135930831e-06,
1.24037795045785e-06, 5.1074386195323e-07, 6.56670679654153e-07,
5.83707270803692e-07, 6.56670679654153e-07, 1.45926817700923e-07,
2.18890226551384e-07, 5.83707270803692e-07, 2.18890226551384e-07,
1.45926817700923e-07, 0, 2.91853635401846e-07, 1.45926817700923e-07,
2.18890226551384e-07, 0, 7.29634088504615e-08, 7.29634088504615e-08,
7.29634088504615e-08, 7.29634088504615e-08, 0, 0, 1.45926817700923e-07,
0, 0, 0, 7.29634088504615e-08, 7.29634088504615e-08, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 7.29634088504615e-08), mids = c(250,
750, 1250, 1750, 2250, 2750, 3250, 3750, 4250, 4750, 5250, 5750,
6250, 6750, 7250, 7750, 8250, 8750, 9250, 9750, 10250, 10750,
11250, 11750, 12250, 12750, 13250, 13750, 14250, 14750, 15250,
15750, 16250, 16750, 17250, 17750, 18250, 18750, 19250, 19750,
20250, 20750, 21250, 21750, 22250, 22750, 23250, 23750, 24250,
24750, 25250, 25750, 26250, 26750, 27250, 27750, 28250, 28750,
29250, 29750, 30250, 30750, 31250, 31750, 32250, 32750, 33250,
33750, 34250, 34750, 35250, 35750, 36250, 36750, 37250, 37750,
38250), xname = "nvertices(fib.twigs5.dps)", equidist = TRUE), class = "histogram")
Finally, it would be worth checking a few specific neurons (here identified by bodyid):
> n20=sample(names(fib.twigs5.dps), 20)
> nvertices(fib.twigs5.dps)[n20]
794225197 1288172070 974456295 886643729 1170749805 297580512 1903866591
1942 1148 1007 922 195 903 1084
796401179 2073408606 642720190 5813039999 600011602 1890237470 5813000419
2071 70 756 648 680 13 914
1007256095 639875892 1010338583 1471778506 796940742 541127846
940 1479 676 590 434 3786
You can do this by:
n20=c("794225197", "1288172070", "974456295", "886643729", "1170749805",
"297580512", "1903866591", "796401179", "2073408606", "642720190",
"5813039999", "600011602", "1890237470", "5813000419", "1007256095",
"639875892", "1010338583", "1471778506", "796940742", "541127846")
nvertices(dps_flyEM[n20])
Dear Greg,
Thank you for your fast and kind reply!
I did some tests on our server yesterday. It seems that running on four cores was faster than running on either a single core or 16 cores.
Thank you for the important hint that my dotprops object is larger than necessary. I tried the test code but got an error with nvertices():
> nvertices(dps_JRC2018U)
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent
I looked back at my dps object; it is a large list consisting of 21663 dotprops objects. Each dotprops object contains four elements: "points", "alpha", "vect" and "labels". "alpha" and "labels" have type double [898], while "points" and "vect" have type double [898 x 3].
It seems that I really need to regenerate my dps object. I will try to do it on our Ubuntu server with the Docker image or natmanager, as you suggested before. Would you mind giving me some suggestions for the regeneration, given my current problematic dps object? Thank you in advance!
Best wishes, Jiajun Zhang
@lankiszhang I'm sorry I missed replying to this at the time. Your dotprops objects appear to be in a plain list rather than a neuronlist. If you call as.neuronlist() on it, then nvertices() should work again.
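For example (a sketch, assuming your object is called `dps_JRC2018U` as in your error message):

```r
library(nat)
# Convert the plain list of dotprops into a neuronlist so that
# nat generics such as nvertices() dispatch correctly
dps_JRC2018U <- as.neuronlist(dps_JRC2018U)
nvertices(dps_JRC2018U)
```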
As for your benchmarks, it is hard to interpret them without knowing how many neurons you are using or how many vertices there are in the dotprops objects. If you are using relatively few neurons, then the overhead of forking and setting up the parallel environment will be large compared with the time saved by running in parallel. This is likely why 4 cores gives you only a 2x speed-up over one core but is actually slightly faster than 16. At the other end, as discussed earlier, if you have many neurons, then running on many cores will cause problems due to memory pressure.
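Incidentally, if you do regenerate a leaner dotprops object, a sketch of the usual approach (assuming your raw neurons are already in a neuronlist I'll call `nl`; the resampling step size and number of nearest neighbours are parameters you would tune):

```r
library(nat)
# Resample each neuron to ~1 micron spacing before computing tangent
# vectors (k nearest neighbours); fewer points per neuron means a
# smaller object in memory and faster NBLAST searches
dps <- dotprops(nl, k = 5, resample = 1)

# Sanity-check the point counts of the regenerated object
summary(nvertices(dps))
```

Resampling is usually the main lever: the mean of ~900 points per neuron you reported is roughly what determines both memory footprint and per-comparison CPU cost.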
Dear all,
We have found NBLAST really helpful for our current project, especially when running searches against the FlyEM database.
On my laptop (6 cores / 12 threads), a one-against-all NBLAST takes about 4 min when running on a single core.
As I wanted to reduce the time, I used doParallel to define a multi-core backend and ran NBLAST with .parallel = TRUE. Interestingly, I could confirm that all 12 threads were busy and RAM consumption reached 100%, yet the same task ended up taking more than 10 min.
Then I tried running NBLAST on only two cores to avoid the high memory consumption, and it took 5 min for the same task.
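Concretely, the setup looked roughly like this (a sketch; `dps_query` is a made-up name for my query neuron, `dps_flyEM` is the target set):

```r
library(doParallel)
library(nat.nblast)

# Register a parallel backend, then ask nblast to use it
cl <- makeCluster(12)
registerDoParallel(cl)

scores <- nblast(dps_query, dps_flyEM, normalised = TRUE, .parallel = TRUE)

stopCluster(cl)
```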
Taking the longer runtime and high memory consumption into consideration, I am a little confused about how exactly nblast uses .parallel. Since I have a 4-processor, 40-core, 80-thread CPU with 48 GB RAM, and my dps_flyEM is 2.32 GB, would it be best to run NBLAST on only 16 cores rather than 80?
Best wishes, Jiajun Zhang