musoke closed this issue 3 years ago
Seems to be fixed. The problem was how the benchmarking script tried to limit the number of cores.
threads,resol,time
3, 64, 0.0160400468
1, 64, 0.024220941200000002
2, 64, 0.022879606
4, 64, 0.0151645835
9, 64, 0.0130841397
15, 64, 0.0111881608
10, 64, 0.0124571318
14, 64, 0.0157035068
11, 64, 0.013343881499999998
7, 64, 0.015176574100000001
12, 64, 0.0132816245
13, 64, 0.0172383189
6, 64, 0.0145660495
16, 64, 0.0371270104
8, 64, 0.013778596600000002
5, 64, 0.0162725617
3, 128, 0.10362121310000001
15, 128, 0.040638026199999996
10, 128, 0.0544472717
9, 128, 0.057905034099999995
11, 128, 0.0489576433
14, 128, 0.0470301827
1, 128, 0.2634013306
4, 128, 0.12295886110000001
12, 128, 0.050868758300000004
13, 128, 0.0577783912
7, 128, 0.0795360551
2, 128, 0.224076728
8, 128, 0.0741212386
6, 128, 0.09105579529999999
16, 128, 0.0689855095
5, 128, 0.10189651660000001
15, 256, 0.4331156058
9, 256, 0.5527172316
11, 256, 0.5197799338
10, 256, 0.5774306459
14, 256, 0.4677659839
12, 256, 0.4850991414
3, 256, 1.0966876536999999
13, 256, 0.524659854
16, 256, 0.41684268510000005
8, 256, 0.7013439027999999
7, 256, 0.7528811089999999
4, 256, 1.1267780603
6, 256, 0.7656996806999999
5, 256, 0.9188899689
1, 256, 2.3843180534
2, 256, 2.0050800943
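One way to read the timings above is to convert them to speedup and parallel efficiency relative to the single-thread run at each resolution. The following is a minimal sketch rather than part of the benchmarking script: the function name `speedup_table` and the `SAMPLE` string are made up for illustration, and only a handful of rows are copied from the table to keep it short.

```python
import csv
import io

# A few rows copied from the table above (threads, resol, time).
# SAMPLE is an illustrative subset; paste the full table to analyse everything.
SAMPLE = """threads,resol,time
1, 64, 0.024220941200000002
4, 64, 0.0151645835
16, 64, 0.0371270104
1, 256, 2.3843180534
4, 256, 1.1267780603
16, 256, 0.41684268510000005
"""


def speedup_table(text):
    """Return {resol: [(threads, speedup, efficiency), ...]} sorted by threads.

    speedup    = time with 1 thread / time with n threads
    efficiency = speedup / n
    Hypothetical helper; it assumes every resolution has a 1-thread row.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    times = {}
    for row in reader:
        resol = int(row["resol"])
        times.setdefault(resol, {})[int(row["threads"])] = float(row["time"])

    table = {}
    for resol, by_threads in times.items():
        base = by_threads[1]  # single-thread time is the reference
        table[resol] = [
            (n, base / t, base / t / n) for n, t in sorted(by_threads.items())
        ]
    return table


if __name__ == "__main__":
    for resol, rows in sorted(speedup_table(SAMPLE).items()):
        for n, speedup, efficiency in rows:
            print(f"resol={resol:4d} threads={n:2d} "
                  f"speedup={speedup:5.2f} efficiency={efficiency:4.2f}")
```

On these rows, 16 threads at resolution 256 come out roughly 5.7 times faster than 1 thread, while at resolution 64 the 16-thread run is slower than the single-thread run.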
Currently the threading doesn't scale well: using more than 4 cores seems to make performance worse.
That is plausible for small boxes, but less so at higher resolutions. The transition also seems to happen at the same number of threads for all resolutions.
Data from the Mahuika cluster: