Just wanted to play around and document associative container performance.
Work here is not finished, improvements
Benchmark result:
For the benchmark that traverses the elements[1] of a container, I found that std::multimap is roughly 3 times slower than std::unordered_map at the two larger sizes (about 3.3x at 512 and 3.2x at 1024, by CPU time) and about 1.6x slower at 64. boost::container::flat_map is only marginally faster than std::unordered_map (about 1 to 2.5%).
Footnote [1]: Here, traversal of elements means touching every Atom<3> in the associative container (i.e. loading every Atom<3> into registers).
Footnote [2]: types::CellIteratorType<dim> isn't used yet in this benchmark, as setting up a Triangulation is an additional task.
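The traversal described in footnote [1] can be sketched roughly as below. This is only a minimal illustration, not the benchmark code itself: the `Atom` layout, the `int` key type, and the helper names `make_container`, `traverse` and `time_traversal_ns` are all hypothetical, and it times with `std::chrono` rather than with the benchmark library that produced the numbers in this description.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>

// Hypothetical stand-in for Atom<dim>; the real type's layout is not
// shown in this description.
template <int dim>
struct Atom
{
  std::array<double, dim> position;
  unsigned int            global_index;
};

// Fill an associative container with n atoms. The int key is an
// assumption made for this sketch only.
template <typename Container>
Container make_container(const std::size_t n)
{
  Container c;
  for (std::size_t i = 0; i < n; ++i)
    {
      Atom<3> atom{};
      atom.position[0]  = static_cast<double>(i);
      atom.global_index = static_cast<unsigned int>(i);
      c.emplace(static_cast<int>(i), atom);
    }
  return c;
}

// Touch every Atom<3> once. Accumulating a value read from each atom
// keeps the compiler from optimizing the loads away.
template <typename Container>
double traverse(const Container &c)
{
  double sum = 0.;
  for (const auto &entry : c)
    sum += entry.second.position[0];
  return sum;
}

// Average wall-clock time of one traversal in nanoseconds. The checksum
// is returned through an argument so the traversal result stays observable.
template <typename Container>
double time_traversal_ns(const Container &c,
                         const int        repetitions,
                         double &         checksum)
{
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repetitions; ++r)
    checksum += traverse(c);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         repetitions;
}
```

For example, one could build `make_container<std::multimap<int, Atom<3>>>(1024)` and `make_container<std::unordered_map<int, Atom<3>>>(1024)`, pass each to `time_traversal_ns`, and compare the two averages. A hand-rolled timer like this is much noisier than a proper benchmark harness, so it is only meant to show what is being measured.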
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
----------------------------------------------------------------------
Benchmark                               Time           CPU Iterations
----------------------------------------------------------------------
BM_traverse_multimap/64             74586 ns      74562 ns       9357
BM_traverse_multimap/512          9757800 ns    9758520 ns         72
BM_traverse_multimap/1024        36649570 ns   36652572 ns         19
BM_traverse_multimap_BigO         32346.68 N    32349.29 N
BM_traverse_multimap_RMS                30 %          30 %
BM_traverse_unordered_map/64        46870 ns      46873 ns      14916
BM_traverse_unordered_map/512     2927165 ns    2927384 ns        239
BM_traverse_unordered_map/1024   11468719 ns   11469591 ns         60
BM_traverse_unordered_map_BigO    10074.17 N    10074.93 N
BM_traverse_unordered_map_RMS           31 %          31 %
BM_traverse_flat_map/64             46007 ns      46011 ns      15204
BM_traverse_flat_map/512          2856056 ns    2856267 ns        238
BM_traverse_flat_map/1024        11325437 ns   11326278 ns         62
BM_traverse_flat_map_BigO          9934.84 N     9935.58 N
BM_traverse_flat_map_RMS                31 %          31 %
benchmarks library is in spack.