sony / nmos-cpp

An NMOS (Networked Media Open Specifications) Registry and Node in C++ (IS-04, IS-05)
Apache License 2.0
141 stars 80 forks source link

nmos-cpp-registry performance problems? #46

Closed billt-hlit closed 5 years ago

billt-hlit commented 5 years ago

Hi, Gareth,

I'm using nmos-cpp-registry to test my changes to nmos-cpp-node and am seeing some excess CPU consumption by the registry. My basic set-up is to run the AMWA nmos-testing python script on one machine, the node on the target platform, and the registry on a third machine running Ubuntu 18.04 that happens to be on a different subnet. There is also an instance of Riedel's NMOS Explorer running on the registry machine.

Due to the huge number of Receivers on the target and the carelessness with which I update the Receivers when something changes, the target is sending large numbers of registry updates. As a result, nmos-cpp-registry ends up consuming a little more than a whole CPU of the machine it's running on, and the registry updates continue long after nmos-testing has finished its tests and rendered its results to the web browser.

I monitored the program's behavior using "perf top -p" which you can see below. The one thing that really strikes me is how much time it is spending in SHA-1 processing in boost::uuids. I don't know what that's for, but maybe there's a caching opportunity there to reduce the UUID computation.

   5.05%  nmos-cpp-registry          [.] _ZN5boost5uuids6detail4sha113process_blockEv
   3.63%  nmos-cpp-registry          [.] _ZZNKSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_applyEcSt17integral_constantIbLb0EEENKUlvE_clEv
   2.83%  libc-2.27.so               [.] cfree@GLIBC_2.2.5
   2.63%  libstdc++.so.6.0.25        [.] __dynamic_cast
   2.26%  libstdc++.so.6.0.25        [.] _ZNK10__cxxabiv121__vmi_class_type_info12__do_dyncastElNS_17__class_type_info10__sub_kindEPKS1_PKvS4_S6_RNS1_16__dyncast_resultE
   2.07%  nmos-cpp-registry          [.] _ZN5boost5uuids6detail11left_rotateEjm
   1.67%  libstdc++.so.6.0.25        [.] _ZNKSt7__cxx117collateIcE12do_transformEPKcS3_
   1.66%  libc-2.27.so               [.] malloc
   1.24%  libc-2.27.so               [.] __GI___strcmp_ssse3
   1.23%  nmos-cpp-registry          [.] _ZN5boost5uuids6detail4sha112process_byteEh
   1.15%  nmos-cpp-registry          [.] _ZN5boost5uuids6detail4sha117process_byte_implEh
   1.13%  nmos-cpp-registry          [.] _ZNSt6vectorIcSaIcEE4dataEv
   1.07%  [kernel]                   [k] thread_group_cputime
   1.00%  nmos-cpp-registry          [.] _ZSt8distanceIPKcENSt15iterator_traitsIT_E15difference_typeES3_S3_
   0.98%  nmos-cpp-registry          [.] _ZNKSt7__cxx1112regex_traitsIcE17transform_primaryIPKcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_SA_
   0.91%  nmos-cpp-registry          [.] _ZN9__gnu_cxxneIPKSt4pairIccESt6vectorIS2_SaIS2_EEEEbRKNS_17__normal_iteratorIT_T0_EESD_
   0.88%  nmos-cpp-registry          [.] _ZNKSt7__cxx1112regex_traitsIcE9transformIPcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_S9_
   0.88%  nmos-cpp-registry          [.] _ZN9__gnu_cxx17__normal_iteratorIPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS6_SaIS6_EEEC2ERKS8_
   0.84%  nmos-cpp-registry          [.] _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag
   0.78%  libstdc++.so.6.0.25        [.] _ZNK10__cxxabiv120__si_class_type_info12__do_dyncastElNS_17__class_type_info10__sub_kindEPKS1_PKvS4_S6_RNS1_16__dyncast_resultE
   0.78%  nmos-cpp-registry          [.] _ZN9__gnu_cxxmiIPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS6_SaIS6_EEEENS_17__normal_iteratorIT_T0_E15difference_typeERKSF_SI_
   0.76%  nmos-cpp-registry          [.] _ZSt13binary_searchIN9__gnu_cxx17__normal_iteratorIPKcSt6vectorIcSaIcEEEEcEbT_S8_RKT0_
   0.76%  nmos-cpp-registry          [.] _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPcEEvT_S7_St20forward_iterator_tag
   0.73%  nmos-cpp-registry          [.] _ZNSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE13_M_make_cacheESt17integral_constantIbLb1EE
   0.72%  nmos-cpp-registry          [.] _ZNKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE3endEv
   0.71%  nmos-cpp-registry          [.] _ZNKSt6vectorIcSaIcEE4sizeEv
   0.69%  nmos-cpp-registry          [.] _ZNKSt6vectorIcSaIcEE11_M_data_ptrIcEEPT_S4_
   0.66%  nmos-cpp-registry          [.] _ZNSt6vectorIcSaIcEE19_M_range_initializeIPKcEEvT_S5_St20forward_iterator_tag
   0.63%  nmos-cpp-registry          [.] _ZNSt6bitsetILm256EE9referenceaSEb
   0.63%  nmos-cpp-registry          [.] _ZSt13__lower_boundIN9__gnu_cxx17__normal_iteratorIPKcSt6vectorIcSaIcEEEEcNS0_5__ops14_Iter_less_valEET_SA_SA_RKT0_T1_
   0.63%  libc-2.27.so               [.] __strxfrm_l
   0.60%  nmos-cpp-registry          [.] _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS7_SaIS7_EEEENS0_5__ops16_Iter_equals_va
   0.59%  nmos-cpp-registry          [.] _ZN9__gnu_cxx17__normal_iteratorIPKcSt6vectorIcSaIcEEEC2ERKS2_
   0.58%  nmos-cpp-registry          [.] _ZN9__gnu_cxx17__normal_iteratorIPKSt4pairIccESt6vectorIS2_SaIS2_EEEC2ERKS4_
   0.58%  nmos-cpp-registry          [.] _ZNSt12_Vector_baseIcSaIcEED2Ev
   0.58%  nmos-cpp-registry          [.] _ZNKSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_applyEcSt17integral_constantIbLb0EE
   0.58%  nmos-cpp-registry          [.] _ZNK9__gnu_cxx17__normal_iteratorIPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt6vectorIS6_SaIS6_EEE4baseEv
   0.54%  nmos-cpp-registry          [.] _ZNKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE5beginEv
   0.53%  libc-2.27.so               [.] __strlen_avx2
garethsb commented 5 years ago

The one thing that really strikes me is how much time it is spending in SHA-1 processing in boost::uuids.

That really does stand out. At the moment I'm nonplussed, since I don't expect the registry to even call those functions after it's got going - and indeed, in the first environment I've checked, it doesn't seem to.

Bill, please would you remind me which Boost version you're using?

billt-hlit commented 5 years ago

This build is on stock Ubuntu (rather than the cross-compile we use for nmos-cpp-node), so it's boost 1.65.

billt-hlit commented 5 years ago

Well, despite having things I'm supposed be working on, I investigated this further. It turns out that having the NMOS Explorer running is an important part of the puzzle. I set a breakpoint on the busiest function and traced the problem up to nmos::make_id in id.cpp. That function creates a random number generator every time it runs. Seeding the random number is very expensive. Better to create a long-lived boost::uuids::random_generator (in the caller, probably) and reuse it.

Relevant bits of stack:

#0  boost::uuids::detail::sha1::process_block (this=0x7ffff3e2e2e0, bytes_begin=0x7ffff3e2e350, bytes_end=0x7ffff3e2e364) at /usr/include/boost/uuid/sha1.hpp:121
#1  0x000055555584ce3a in boost::uuids::detail::sha1::process_bytes (this=0x7ffff3e2e2e0, buffer=0x7ffff3e2e350, byte_count=20) at /usr/include/boost/uuid/sha1.hpp:131
#2  0x000055555584d812 in boost::uuids::detail::seed_rng::sha1_random_digest_ (this=0x7ffff3e2e3f0) at /usr/include/boost/uuid/seed_rng.hpp:171
#3  0x000055555584d73a in boost::uuids::detail::seed_rng::operator() (this=0x7ffff3e2e3f0) at /usr/include/boost/uuid/seed_rng.hpp:139
#4  0x000055555584ef86 in boost::uuids::detail::generator_iterator<boost::uuids::detail::seed_rng>::generator_iterator (this=0x7ffff3e2e3d0, g=0x7ffff3e2e3f0) at /usr/include/boost/uuid/seed_rng.hpp:273
#5  0x000055555584e8eb in boost::uuids::detail::seed<boost::random::mersenne_twister_engine<unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615u, 11ul, 4294967295u, 7ul, 2636928640u, 15ul, 4022730752u, 18ul, 1812433253u> > (rng=...)
    at /usr/include/boost/uuid/seed_rng.hpp:302
#6  0x000055555584e227 in boost::uuids::basic_random_generator<boost::random::mersenne_twister_engine<unsigned int, 32ul, 624ul, 397ul, 31ul, 2567483615u, 11ul, 4294967295u, 7ul, 2636928640u, 15ul, 4022730752u, 18ul, 1812433253u> >::basic_random_generator (
    this=0x7ffff3e2e4a0) at /usr/include/boost/uuid/random_generator.hpp:46
#7  0x000055555584c887 in nmos::make_id[abi:cxx11]() () at /home/sable/bt/git/nmos-cpp/Development/nmos/id.cpp:14
#8  0x000055555585f02e in nmos::experimental::details::json_from_message (message=..., cursor=...) at /home/sable/bt/git/nmos-cpp/Development/nmos/logging_api.cpp:415
#9  0x0000555555854017 in nmos::experimental::insert_log_event (events=..., message=...) at /home/sable/bt/git/nmos-cpp/Development/nmos/logging_api.cpp:446
#10 0x00005555556e4ffb in (anonymous namespace)::main_gate::service (this=0x7fffffffe150, message=...) at /home/sable/bt/git/nmos-cpp/Development/nmos-cpp-registry/main_gate.h:74
#11 0x00005555556e4eac in (anonymous namespace)::main_gate::service_function::operator() (this=0x555555dcdac0, message=...) at /home/sable/bt/git/nmos-cpp/Development/nmos-cpp-registry/main_gate.h:59
#12 0x00005555556e864c in util::message_service<slog::async_log_message>::run<(anonymous namespace)::main_gate::service_function> (this=0x7fffffffe178, fn=...) at /home/sable/bt/git/nmos-cpp/Development/slog/all_in_one.h:3794
#13 0x00005555556e7f4a in slog::async_log_service<(anonymous namespace)::main_gate::service_function, slog::async_log_message>::<lambda()>::operator()(void) const (__closure=0x555555dcdab8) at /home/sable/bt/git/nmos-cpp/Development/slog/all_in_one.h:3875
garethsb commented 5 years ago

Ah, so I think this does depend on platform and Boost version because in all of my environments a different much more performant random generator is used. I'll think about how best to make a static generator.

That diagnosis is very much appreciated, Bill!

garethsb commented 5 years ago

Boost.Uuid changed the default random_generator at Boost 1.67.0. Before that it used to seed a Mersenne Twister PRNG using SHA-1. Expensive.

Therefore, since ids are generated for all log events, add nmos::id_generator to enable the PRNG to be cached. (The random_generator is not guaranteed thread-safe so a static instance isn't an option.)

There are no doubt other opportunities for performance improvement, but let's open new issues to cover those when identified.

billt-hlit commented 5 years ago

There are no doubt other opportunities for performance improvement, but let's open new issues to cover those when identified.

That sounds like an invitation. :imp:

gcc whines about id_generator::operator() in your fix: nmos/id.cpp:36:6: warning: extra ';' [-Wpedantic]

garethsb commented 5 years ago

We're open source, so it's definitely an invitation for something. 😈

billt-hlit commented 5 years ago

OK, that helped -- running immediate activations back-to-back on my node now results in the registry "only" consuming about half a CPU. The big winner now is std::__detail::_BracketMatcher<std::__cxx11::regex_traits, false, false>::_M_apply, which is called, very indirectly, by regexes used in the json_validator, as well as by the URL router. Want me to open a new ticket with the details? Given that the json validator is apparently recompiling its regexp at run time, it might be profitable to compile them once and reuse them.

garethsb commented 5 years ago

I figured regexes might well be next. We use them for route matching and in validation, and there could be some easy wins. Please do open a new issue, thanks, Bill.