Closed billt-hlit closed 5 years ago
The one thing that really strikes me is how much time it is spending in SHA-1 processing in boost::uuids.
That really does stand out. At the moment I'm nonplussed, since I don't expect the registry to even call those functions after it's got going - and indeed, in the first environment I've checked, it doesn't seem to.
Bill, please would you remind me which Boost version you're using?
This build is on stock Ubuntu (rather than the cross-compile we use for nmos-cpp-node), so it's boost 1.65.
Well, despite having things I'm supposed to be working on, I investigated this further. It turns out that having the NMOS Explorer running is an important part of the puzzle. I set a breakpoint on the busiest function and traced the problem up to nmos::make_id in id.cpp. That function creates a random number generator every time it runs. Seeding the random number generator is very expensive. Better to create a long-lived boost::uuids::random_generator (in the caller, probably) and reuse it.
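To illustrate the pattern, here is a minimal sketch using only the standard library as a stand-in for boost::uuids (the names `make_id_slow`/`make_id_fast` are hypothetical, not from nmos-cpp): one function pays the entropy-gathering and seeding cost on every call, the other reuses a generator owned by the caller.

```cpp
#include <cstdint>
#include <random>

// Hypothetical stand-in for the pattern described above. make_id_slow
// constructs and seeds a fresh generator on every call (as nmos::make_id
// did); make_id_fast reuses a long-lived generator owned by the caller.
std::uint32_t make_id_slow()
{
    std::random_device rd;      // entropy gathering repeated per call
    std::mt19937 gen(rd());     // seeding repeated per call
    return gen();
}

std::uint32_t make_id_fast(std::mt19937& gen)
{
    return gen();               // seeding cost already paid, once
}
```

Constructing the generator once in the caller (or holding it as a member) amortizes the seeding cost across all subsequent ids.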
Relevant bits of stack:
#0 boost::uuids::detail::sha1::process_block (this=0x7ffff3e2e2e0, bytes_begin=0x7ffff3e2e350, bytes_end=0x7ffff3e2e364) at /usr/include/boost/uuid/sha1.hpp:121
#1 0x000055555584ce3a in boost::uuids::detail::sha1::process_bytes (this=0x7ffff3e2e2e0, buffer=0x7ffff3e2e350, byte_count=20) at /usr/include/boost/uuid/sha1.hpp:131
#2 0x000055555584d812 in boost::uuids::detail::seed_rng::sha1_random_digest_ (this=0x7ffff3e2e3f0) at /usr/include/boost/uuid/seed_rng.hpp:171
#3 0x000055555584d73a in boost::uuids::detail::seed_rng::operator() (this=0x7ffff3e2e3f0) at /usr/include/boost/uuid/seed_rng.hpp:139
#4 0x000055555584ef86 in boost::uuids::detail::generator_iterator<boost::uuids::detail::seed_rng>::generator_iterator (this=0x7ffff3e2e3d0, g=0x7ffff3e2e3f0) at /usr/include/boost/uuid/seed_rng.hpp:273
#5 0x000055555584e8eb in boost::uuids::detail::seed<boost::random::mersenne_twister_engine<unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615u, 11ul, 4294967295u, 7ul, 2636928640u, 15ul, 4022730752u, 18ul, 1812433253u> > (rng=...)
at /usr/include/boost/uuid/seed_rng.hpp:302
#6 0x000055555584e227 in boost::uuids::basic_random_generator<boost::random::mersenne_twister_engine<unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615u, 11ul, 4294967295u, 7ul, 2636928640u, 15ul, 4022730752u, 18ul, 1812433253u> >::basic_random_generator (
this=0x7ffff3e2e4a0) at /usr/include/boost/uuid/random_generator.hpp:46
#7 0x000055555584c887 in nmos::make_id[abi:cxx11]() () at /home/sable/bt/git/nmos-cpp/Development/nmos/id.cpp:14
#8 0x000055555585f02e in nmos::experimental::details::json_from_message (message=..., cursor=...) at /home/sable/bt/git/nmos-cpp/Development/nmos/logging_api.cpp:415
#9 0x0000555555854017 in nmos::experimental::insert_log_event (events=..., message=...) at /home/sable/bt/git/nmos-cpp/Development/nmos/logging_api.cpp:446
#10 0x00005555556e4ffb in (anonymous namespace)::main_gate::service (this=0x7fffffffe150, message=...) at /home/sable/bt/git/nmos-cpp/Development/nmos-cpp-registry/main_gate.h:74
#11 0x00005555556e4eac in (anonymous namespace)::main_gate::service_function::operator() (this=0x555555dcdac0, message=...) at /home/sable/bt/git/nmos-cpp/Development/nmos-cpp-registry/main_gate.h:59
#12 0x00005555556e864c in util::message_service<slog::async_log_message>::run<(anonymous namespace)::main_gate::service_function> (this=0x7fffffffe178, fn=...) at /home/sable/bt/git/nmos-cpp/Development/slog/all_in_one.h:3794
#13 0x00005555556e7f4a in slog::async_log_service<(anonymous namespace)::main_gate::service_function, slog::async_log_message>::<lambda()>::operator()(void) const (__closure=0x555555dcdab8) at /home/sable/bt/git/nmos-cpp/Development/slog/all_in_one.h:3875
Ah, so I think this does depend on platform and Boost version, because in all of my environments a different, much more performant random generator is used. I'll think about how best to make a static generator.
That diagnosis is very much appreciated, Bill!
Boost.Uuid changed the default random_generator at Boost 1.67.0. Before that, it seeded a Mersenne Twister PRNG using SHA-1, which is expensive. Therefore, since ids are generated for all log events, add nmos::id_generator to enable the PRNG to be cached. (The random_generator is not guaranteed to be thread-safe, so a static instance isn't an option.)
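A minimal stdlib-only sketch of the caching idea (the real nmos::id_generator wraps boost::uuids::random_generator; this stand-in just formats random bits in the 8-4-4-4-12 UUID layout and makes no claim about the actual implementation):

```cpp
#include <random>
#include <sstream>
#include <string>

// Sketch of the caching idea behind nmos::id_generator: the generator is
// seeded once at construction and reused for every id. Not thread-safe,
// so each thread should own its own instance (hence no static).
class id_generator
{
public:
    id_generator() : gen(std::random_device{}()) {}

    std::string operator()()
    {
        std::ostringstream os;
        os << std::hex;
        const int group_sizes[] = { 8, 4, 4, 4, 12 };
        for (int g = 0; g < 5; ++g)
        {
            if (g != 0) os << '-';
            for (int i = 0; i < group_sizes[g]; ++i)
            {
                os << dist(gen); // one hex digit at a time, for simplicity
            }
        }
        return os.str();
    }

private:
    std::mt19937 gen;
    std::uniform_int_distribution<int> dist{ 0, 15 };
};
```

The caller constructs one id_generator and calls it repeatedly, so the expensive seeding happens once per instance rather than once per id.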
There are no doubt other opportunities for performance improvement, but let's open new issues to cover those when identified.
That sounds like an invitation. :imp:
gcc whines about id_generator::operator() in your fix:
nmos/id.cpp:36:6: warning: extra ';' [-Wpedantic]
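For context (the exact line of id.cpp isn't quoted here), this -Wpedantic warning typically comes from a stray semicolon after a function definition's closing brace; a minimal illustration with hypothetical names:

```cpp
#include <string>

struct id_generator
{
    std::string operator()();
};

// The "extra ';'" warning points at a stray semicolon after an
// out-of-class function definition, e.g.:
//
//   std::string id_generator::operator()() { return "id"; };   // <- extra ';'
//
// The fix is simply to drop the trailing semicolon:
std::string id_generator::operator()() { return "id"; }
```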
We're open source, so it's definitely an invitation for something. 😈
OK, that helped: running immediate activations back-to-back on my node now results in the registry "only" consuming about half a CPU. The big winner now is std::__detail::_BracketMatcher<std::__cxx11::regex_traits
I figured regexes might well be next. We use them for route matching and in validation, and there could be some easy wins. Please do open a new issue, thanks, Bill.
Hi, Gareth,
I'm using nmos-cpp-registry to test my changes to nmos-cpp-node and am seeing some excess CPU consumption by the registry. My basic set-up is to run the AMWA nmos-testing python script on one machine, the node on the target platform, and the registry on a third machine running Ubuntu 18.04 that happens to be on a different subnet. There is also an instance of Riedel's NMOS Explorer running on the registry machine.
Due to the huge number of Receivers on the target and the carelessness with which I update the Receivers when something changes, the target is sending large numbers of registry updates. As a result, nmos-cpp-registry ends up consuming a little more than a whole CPU of the machine it's running on, and the registry updates continue long after nmos-testing has finished its tests and rendered its results to the web browser.
I monitored the program's behavior using "perf top -p", which you can see below. The one thing that really strikes me is how much time it is spending in SHA-1 processing in boost::uuids. I don't know what that's for, but maybe there's a caching opportunity there to reduce the UUID computation cost.