ralna / spral

Sparse Parallel Robust Algorithms Library
https://ralna.github.io/spral/
Other
104 stars 27 forks source link

Avoid default-init overhead for lookup map in assemble_pre #122

Closed mjacobse closed 1 year ago

mjacobse commented 1 year ago

With e3b35456, std::vector started to take care of the memory for the lookup map in assemble_pre to avoid having to call allocate and deallocate manually. But std::vector also default initializes all values on construction, meaning that this change introduced an overhead of always zeroing the map values initially.

Indeed it does not need to be zero'd, as all used values are set upon building it. As it needs to be of size n to be able to map any row index, zeroing it can actually be somewhat costly, especially since it is repeated for each assemble_pre. This change replaces the std::vector with std::unique_ptr to avoid the repeated unnecessary zeroing but still keep the implementation RAII.

Here and in the plots below are some benchmarks with the same setup as in #119, except that repetitions for a matrix are run at different times and on random cores to try to highlight the baseline noise. For most test matrices I see no measureable change, but for a few larger matrices (especially the DIMACS10 group) there are small but significant performance improvements.

Note that e3b35456 only changed assemple_pre, so this only deals with that function too. assemble_post does not use an RAII solution yet. If desired it should be possible to introduce std::unique_ptr for it as well, supposedly without any influence on performance.

benchmark_comparison_all benchmark_comparison_DIMACS10

mjacobse commented 1 year ago

Hm, I am unable to reproduce the pipeline failure for gcc-9, any way to get more information? Edit: Oh, it's the unrelated ssmfe_ciface_test test that fails. Is this common to happen from time to time?

jfowkes commented 1 year ago

@mjacobse I’ve tried a re-run, sometimes the VM hardware is a bit flaky… (and SSMFE is not the most reliable).