Closed awelzel closed 5 months ago
Thanks for this issue @awelzel, this is a useful microbenchmarking result. We use std::shared_ptr
extensively in the runtime library, e.g., as control blocks in views and our safe iterators which we copy heavily; we definitely want to avoid unneeded overhead there.
I ran your reproducer with Clang and libc++ and I observe no significant overhead, and we do not want overhead for GCC and libstdc++ either.
Unassigning myself since ATM there is no clear path forward that doesn't involve a lot of work, like implementing a standard-library-quality smart pointer library.
I don't see us changing the smart pointer any time soon, so closing because of the lack of a way forward.
This is again a micro-benchmark, but I think an interesting observation. Relates to zeek/zeek#3379.
When running spicy-driver for micro-benchmarking, glibc is running in single-threaded mode, avoiding usage of atomic instructions for std::shared_ptr. Within multi-threaded applications like Zeek, std::shared_ptr usage is more expensive. Patching spicy-driver.cc to start a very short-lived thread, thereby switching glibc into multi-threaded mode, the attached micro-benchmark runs 6% slower on my system due to what seems just use of atomic instructions for std::shared_ptr.

Recording the spicy-driver run with perf record --call-graph dwarf, the __gnu_cxx::__exchange_and_add function is reported with ~5.8% samples as hottest function. In a zeek -r test with the QUIC analyzer, it shows up with ~3% samples.

Not quite sure there's something that can be fixed unless removal of std::shared_ptr is on the table, but opening mostly as FYI.
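
For anyone who wants to see the effect outside of Spicy, here is a minimal standalone sketch. It is not the spicy-driver.cc patch from this issue; the names time_copies and compare_modes are hypothetical, and the iteration count is arbitrary. It times a loop of std::shared_ptr copies, starts and joins a no-op thread, and times the same loop again. With GCC and libstdc++ on glibc, I would expect the second run to be slower, but the size of the gap depends on compiler, standard library, and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: copy and destroy a shared_ptr `iterations` times
// and return the elapsed wall-clock time in nanoseconds.
long long time_copies(const std::shared_ptr<int>& p, long iterations) {
    volatile long long sink = 0; // keeps the loop body observable
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        std::shared_ptr<int> copy = p; // reference count increment
        sink = sink + *copy;
    } // reference count decrement when `copy` goes out of scope
    auto stop = std::chrono::steady_clock::now();
    assert(p.use_count() == 1); // all copies have been released
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical driver: measure before and after the process has ever
// started a second thread. Returns {before_ns, after_ns}.
std::pair<long long, long long> compare_modes(long iterations) {
    auto p = std::make_shared<int>(1);

    long long before = time_copies(p, iterations);

    // A very short-lived thread. The assumption here is that this is enough
    // for libstdc++ to stop using its non-atomic reference counting path.
    std::thread([] {}).join();

    long long after = time_copies(p, iterations);

    std::printf("before thread: %lld ns\nafter thread:  %lld ns\n", before, after);
    return {before, after};
}
```

Calling compare_modes(100000000) from main and building with something like g++ -O2 -pthread should be enough to try it. The numbers are only indicative: this isolates the reference counting cost, which is a much larger share of the work here than in a real parser.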
(Testing was done with code from #1590)
Patch to spicy-driver.cc: