musoke / UltraDark.jl

Simulations of cosmological scalar fields
https://musoke.github.io/UltraDark.jl/
MIT License

Bad scaling with number of threads #22

Closed: musoke closed this issue 3 years ago

musoke commented 3 years ago

Currently the threading doesn't scale well. Using more than 4 threads gives little or no further speedup and is sometimes slower.

This is plausible for small boxes, where there is little work to share between threads, but not for higher resolutions. The transition also seems to happen at the same number of threads for all resolutions.

[Figure: bench]

Data from mahuika cluster:

threads,resol,time
1, 64, 0.1247771599
2, 64, 0.1163449537
3, 64, 0.1124151591
4, 64, 0.1101193197
5, 64, 0.1183714363
6, 64, 0.12240156399999999
7, 64, 0.124348898
8, 64, 0.12303142019999999
1, 128, 0.3606674845
2, 128, 0.24102483170000003
3, 128, 0.20972236510000003
4, 128, 0.1858509835
5, 128, 0.2015632369
6, 128, 0.1872356773
7, 128, 0.18889181289999998
8, 128, 0.1844531894
1, 256, 2.4285605807
2, 256, 1.3929138335
3, 256, 1.0665080768
4, 256, 0.8696183325
5, 256, 0.9901497398
6, 256, 0.9156979212999999
7, 256, 0.8447299337
8, 256, 0.7809342508
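
To make the scaling in the table above easier to read, here is a minimal Python sketch that computes the speedup relative to one thread and the parallel efficiency (speedup divided by thread count) for each resolution. The helper names `load`, `speedup`, and `best_threads` are made up for this illustration and are not part of UltraDark.jl; the numbers are copied from the table, whose time units are not stated in the thread.

```python
import csv
import io
from collections import defaultdict

# First benchmark table from this issue, copied verbatim.
DATA = """threads,resol,time
1, 64, 0.1247771599
2, 64, 0.1163449537
3, 64, 0.1124151591
4, 64, 0.1101193197
5, 64, 0.1183714363
6, 64, 0.12240156399999999
7, 64, 0.124348898
8, 64, 0.12303142019999999
1, 128, 0.3606674845
2, 128, 0.24102483170000003
3, 128, 0.20972236510000003
4, 128, 0.1858509835
5, 128, 0.2015632369
6, 128, 0.1872356773
7, 128, 0.18889181289999998
8, 128, 0.1844531894
1, 256, 2.4285605807
2, 256, 1.3929138335
3, 256, 1.0665080768
4, 256, 0.8696183325
5, 256, 0.9901497398
6, 256, 0.9156979212999999
7, 256, 0.8447299337
8, 256, 0.7809342508
"""


def load(text):
    """Parse the CSV text into {resol: {threads: time}}."""
    table = defaultdict(dict)
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        table[int(row["resol"])][int(row["threads"])] = float(row["time"])
    return table


def speedup(times):
    """Speedup of each thread count relative to the single-thread time."""
    base = times[1]
    return {n: base / t for n, t in sorted(times.items())}


def best_threads(times):
    """Thread count with the smallest measured time."""
    return min(times, key=times.get)


if __name__ == "__main__":
    for resol, times in sorted(load(DATA).items()):
        for n, s in speedup(times).items():
            print(f"resol={resol:4d} threads={n:2d} "
                  f"speedup={s:5.2f} efficiency={s / n:5.2f}")
```

For example, at resolution 256 four threads give a speedup of roughly 2.8, and at resolution 64 the fastest run is the one with 4 threads.
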
musoke commented 3 years ago

This seems to be fixed. The problem was in how the benchmarking script tried to limit the number of cores.
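
The thread does not say how the script was changed. As a hedged illustration only, one common way to control the thread count in a benchmark sweep is to start a fresh Julia process per run, using either the `--threads` command-line option or the `JULIA_NUM_THREADS` environment variable. In the sketch below, the script name `bench.jl` and the helper functions are hypothetical.

```python
import os


def julia_command(n_threads, script="bench.jl"):
    """Command line for one benchmark run (script name is hypothetical)."""
    return ["julia", f"--threads={n_threads}", script]


def julia_env(n_threads, base_env=None):
    """Copy of the environment with JULIA_NUM_THREADS set for one run."""
    env = dict(os.environ if base_env is None else base_env)
    env["JULIA_NUM_THREADS"] = str(n_threads)
    return env


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without Julia.
    for n in (1, 2, 4, 8, 16):
        print(" ".join(julia_command(n)))
```

Each command could then be passed to `subprocess.run`, so that every thread count is measured in its own process.
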

[Figure: bench]

musoke commented 3 years ago

threads,resol,time
3, 64, 0.0160400468
1, 64, 0.024220941200000002
2, 64, 0.022879606
4, 64, 0.0151645835
9, 64, 0.0130841397
15, 64, 0.0111881608
10, 64, 0.0124571318
14, 64, 0.0157035068
11, 64, 0.013343881499999998
7, 64, 0.015176574100000001
12, 64, 0.0132816245
13, 64, 0.0172383189
6, 64, 0.0145660495
16, 64, 0.0371270104
8, 64, 0.013778596600000002
5, 64, 0.0162725617
3, 128, 0.10362121310000001
15, 128, 0.040638026199999996
10, 128, 0.0544472717
9, 128, 0.057905034099999995
11, 128, 0.0489576433
14, 128, 0.0470301827
1, 128, 0.2634013306
4, 128, 0.12295886110000001
12, 128, 0.050868758300000004
13, 128, 0.0577783912
7, 128, 0.0795360551
2, 128, 0.224076728
8, 128, 0.0741212386
6, 128, 0.09105579529999999
16, 128, 0.0689855095
5, 128, 0.10189651660000001
15, 256, 0.4331156058
9, 256, 0.5527172316
11, 256, 0.5197799338
10, 256, 0.5774306459
14, 256, 0.4677659839
12, 256, 0.4850991414
3, 256, 1.0966876536999999
13, 256, 0.524659854
16, 256, 0.41684268510000005
8, 256, 0.7013439027999999
7, 256, 0.7528811089999999
4, 256, 1.1267780603
6, 256, 0.7656996806999999
5, 256, 0.9188899689
1, 256, 2.3843180534
2, 256, 2.0050800943
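
As a rough way to quantify how well the runs above scale, the sketch below computes the Karp-Flatt metric, an experimentally determined estimate of the serial fraction, for the resolution 256 rows of this table. This analysis is not part of the original thread. The timings are copied verbatim from the table, and the function name `karp_flatt` is just a label for this example. Because the timings are noisy, the result should be read as indicative only.

```python
# Resolution-256 timings from the table above, keyed by thread count.
TIMES_256 = {
    1: 2.3843180534,
    2: 2.0050800943,
    3: 1.0966876536999999,
    4: 1.1267780603,
    5: 0.9188899689,
    6: 0.7656996806999999,
    7: 0.7528811089999999,
    8: 0.7013439027999999,
    9: 0.5527172316,
    10: 0.5774306459,
    11: 0.5197799338,
    12: 0.4850991414,
    13: 0.524659854,
    14: 0.4677659839,
    15: 0.4331156058,
    16: 0.41684268510000005,
}


def karp_flatt(t1, tn, n):
    """Karp-Flatt metric: e = (1/S - 1/n) / (1 - 1/n), with S = t1 / tn."""
    if n < 2:
        raise ValueError("the metric is only defined for n >= 2")
    inv_speedup = tn / t1
    return (inv_speedup - 1.0 / n) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    t1 = TIMES_256[1]
    for n in sorted(TIMES_256):
        if n == 1:
            continue
        tn = TIMES_256[n]
        print(f"threads={n:2d} speedup={t1 / tn:5.2f} "
              f"serial fraction estimate={karp_flatt(t1, tn, n):5.3f}")
```

At 16 threads the estimate comes out at about 0.12, which corresponds to a speedup of about 5.7 over the single-thread run.
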