jgraley / inferno-cpp2v

2 stars 0 forks source link

Profiling fast stuff NLQ #222

Closed jgraley closed 3 years ago

jgraley commented 3 years ago

Without fast NLQ (with is next comment)

Have to come in at DQ level - DQs include conejcture solve as well as regen pass. Anyway:

-----------------------------------------------
                0.00    0.17   78213/11249172     SR::AgentCommonDomainExtender::ExpandNormalDomain(std::unordered_set<SR::XLink, std::hash<SR::XLink>, std::equal_to<SR::XLink>, std::allocator<SR::XLink> > const&) [114]
                0.01    5.66 2679038/11249172     SR::AndRuleEngine::DecidedCompare(SR::LocatedLink) <cycle 4> [17]
                0.04   17.93 8491921/11249172     SR::AgentCommon::ResumeNormalLinkedQuery(SR::Conjecture&, SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [12]
[7]     72.2    0.05   23.76 11249172         SR::AgentCommon::RunDecidedQuery(SR::DecidedQueryAgentInterface&, SR::XLink) const [7]
                0.51   21.09 7334870/7334870     SR::StandardAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [9]
                0.04    0.80 2055734/2055734     SR::SearchContainerAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [37]
                0.01    0.53   72794/72794       SR::TransformOfAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [49]
                0.07    0.43 1007082/1007082     SR::StarAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [53]
                0.08    0.00 11249172/11798029     SR::DecidedQuery::CompleteDecisionsWithEmpty() [179]
                0.04    0.01 11243082/1437095147     std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
                0.01    0.03  137473/137473      SR::MatchAllAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [258]
                0.02    0.01   84281/84281       SR::MatchAnyAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [266]
                0.00    0.03  465884/465884      SR::OverlayAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [277]
                0.03    0.00    1752/1752        SR::PatternLink::GetPatternPtr() const [286]
                0.01    0.00 11249172/35977503     SR::XLink::operator==(SR::XLink const&) const [242]
                0.00    0.01    1096/1096        SR::PointerIsAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [492]
                0.00    0.01   76805/76805       SR::NotMatchAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [507]
                0.00    0.00    4713/4713        NestedAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [843]
                0.00    0.00    6090/11249172     SR::DecidedQuery::Reset() [104]
                0.00    0.00    1089/1089        SR::GreenGrassAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [1133]
                0.00    0.00    1138/1138        SR::SlaveAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [1134]
                0.00    0.00    1752/51323069     SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [39]
                0.00    0.00    1752/23514196     SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [81]
                0.00    0.00     123/123         IdentifierByNameAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [1564]
-----------------------------------------------

StandardAgent::DQ() is way higher than SearchContainerAgent::DQ() (which implements stuff node). About 4 times as many hits, but takes much longer to run. Both hit rates are expected to be high due to the repeated conjecture searches undertaken during regen (on top of main solver) and the ratio of 4:1 is fine (Stuff is linear but big on the whole, as big subtrees are walked, but StandardAgent can raise its container size to the power of the number of decisions it registers) so this seems OK.

This is why any performance benefit was being masked (I haven't made a fast standard agent yet).

Looking at StandardAgent, the spit is even-ish between sequence and collection

-----------------------------------------------
                0.51   21.09 7334870/7334870     SR::AgentCommon::RunDecidedQuery(SR::DecidedQueryAgentInterface&, SR::XLink) const [7]
[9]     65.5    0.51   21.09 7334870         SR::StandardAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [9]
                1.59   11.97 6767843/6767843     SR::StandardAgent::DecidedQuerySequence(SR::XLink, SR::DecidedQueryAgentInterface&, SequenceInterface*, SequenceInterface&) const [13]
                0.73    5.72 6832086/6832086     SR::StandardAgent::DecidedQueryCollection(SR::XLink, SR::DecidedQueryAgentInterface&, CollectionInterface*, CollectionInterface&) const [15]
...

Looking at sequence (collection is much the same):

-----------------------------------------------
                1.59   11.97 6767843/6767843     SR::StandardAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [9]
[13]    41.1    1.59   11.97 6767843         SR::StandardAgent::DecidedQuerySequence(SR::XLink, SR::DecidedQueryAgentInterface&, SequenceInterface*, SequenceInterface&) const [13]
                2.22    0.48 655435840/1437095147     std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
                0.34    0.76 20297877/20297877     SR::SubSequenceRange::SubSequenceRange(TreePtr<Node>, ContainerInterface::iterator&, ContainerInterface::iterator&) [28]
                0.25    0.61 128321198/142309975     ContainerInterface::iterator::operator!=(ContainerInterface::iterator const&) const [33]
                0.44    0.36 195449110/266599020     ContainerInterface::iterator::operator*() const [30]
                0.31    0.49 81013910/135075514     TreePtr<CPPTree::Statement>::operator TreePtr<Node>() const [24]
                0.16    0.58 13531029/15679721     SR::DecidedQuery::RegisterDecision(ContainerInterface::iterator, ContainerInterface::iterator, bool, std::shared_ptr<ContainerInterface>) [36]
                0.66    0.00 229567050/306026339     Sequential<CPPTree::Statement>::iterator::Clone() const [35]
                0.36    0.29 121468346/192338147     ContainerInterface::iterator::operator++() [32]
                0.13    0.50 40515739/51323069     SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [39]
                0.09    0.43 40515739/100847650     SR::XLink::GetChildX() const [25]
                0.22    0.23 94562507/306954218     ContainerInterface::iterator::operator==(ContainerInterface::iterator const&) const [23]
                0.18    0.18 20297877/27303247     SR::XLink::CreateDistinct(TreePtr<Node> const&) [55]
                0.11    0.16 74429714/75249771     std::__shared_count<(__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) [86]
                0.13    0.14 20297877/27184974     SR::DecidedQuery::RegisterAbnormalLink(SR::PatternLink, SR::XLink) [72]
                0.12    0.14 20217862/23514196     SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [81]
                0.25    0.00 209284829/209605179     virtual thunk to Sequential<CPPTree::Statement>::end() [93]
                0.12    0.00 33829683/33829683     std::__shared_ptr<ContainerInterface::iterator_interface, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<ContainerInterface::iterator_interface, (__gnu_cxx::_Lock_policy)2> const&) [147]
                0.09    0.02 20217862/50241112     SR::XLink::XLink(std::shared_ptr<Node const>, TreePtrInterface const*, void*) [89]
                0.08    0.00 20217862/20217862     ContainerInterface::iterator::EnsureUnique() [181]
                0.06    0.02 6767843/29266444     Tracer::Descend::~Descend() [76]
                0.05    0.00 53672937/129707250     TreePtr<CPPTree::Statement>::operator bool() const [144]
                0.05    0.00 40516883/40516883     std::shared_ptr<SR::StarAgent> std::dynamic_pointer_cast<SR::StarAgent, Node>(std::shared_ptr<Node> const&) [235]
                0.04    0.00 20215548/20215548     ContainerCommon<std::__cxx11::list<TreePtr<CPPTree::Statement>, std::allocator<TreePtr<CPPTree::Statement> > > >::iterator::operator--() [245]
                0.04    0.00 13521408/13633616     ContainerCommon<std::__cxx11::list<TreePtr<CPPTree::Statement>, std::allocator<TreePtr<CPPTree::Statement> > > >::size() const [246]
                0.03    0.00 20282221/20478131     virtual thunk to Sequential<CPPTree::Statement>::begin() [281]
                0.03    0.00 20297877/27601808     std::__weak_ptr<Node, (__gnu_cxx::_Lock_policy)2>::_M_assign(Node*, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) [257]
                0.02    0.00 6767843/58534479     void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [126]
                0.00    0.01 6767843/29266444     Tracer::Descend::Descend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [222]
                0.00    0.00   18712/8873380     TreePtr<CPPTree::Expression>::operator TreePtr<Node>() const [156]
                0.00    0.00   21090/1628194     virtual thunk to Sequential<CPPTree::Expression>::begin() [472]
                0.00    0.00   14060/941742      ContainerCommon<std::__cxx11::list<TreePtr<CPPTree::Expression>, std::allocator<TreePtr<CPPTree::Expression> > > >::size() const [546]
                0.00    0.00   89714/4050342     Sequential<CPPTree::Expression>::iterator::Clone() const [4430]
                0.00    0.00   68624/1813317     virtual thunk to Sequential<CPPTree::Expression>::end() [4443]
                0.00    0.00    9950/1519973     TreePtr<CPPTree::Expression>::operator bool() const [4450]
                0.00    0.00    2314/2314        ContainerCommon<std::__cxx11::list<TreePtr<CPPTree::Expression>, std::allocator<TreePtr<CPPTree::Expression> > > >::iterator::operator--() [4882]
-----------------------------------------------

Many internal operations have far more than the 6.7 Mhits that this DQ does., eg 127Mhits for ContainerInterface::iterator::operator!=. This clarifies that StandardAgent::DQ() will do n iterations (walking the collection once) each time it runs. Hence there's one more degree of complexity order inside StandardAgent::DQ, roughly speaking, and I hadn't been keeping that in mind.

The Stuff node's DQ() breakdown:

-----------------------------------------------
                0.04    0.80 2055734/2055734     SR::AgentCommon::RunDecidedQuery(SR::DecidedQueryAgentInterface&, SR::XLink) const [7]
[37]     2.5    0.04    0.80 2055734         SR::SearchContainerAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [37]
                0.01    0.16 1963350/1963350     SR::StuffAgent::GetContainerInterface(SR::XLink) const [119]
                0.02    0.09 2027938/15679721     SR::DecidedQuery::RegisterDecision(ContainerInterface::iterator, ContainerInterface::iterator, bool, std::shared_ptr<ContainerInterface>) [36]
                0.00    0.10 1963350/1963350     SR::StuffAgent::GetXLinkFromIterator(SR::XLink, ContainerInterface::iterator) const [162]
                0.08    0.02 24418644/1437095147     std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
                0.02    0.04 3926700/5585745     Walk_iterator::Clone() const [173]
                0.01    0.04 1963350/1963350     SR::StuffAgent::DecidedQueryRestrictions(SR::DecidedQueryAgentInterface&, ContainerInterface::iterator, SR::XLink) const [234]
                0.02    0.03 2055734/11249172     SR::DecidedQuery::Reset() [104]
                0.01    0.02 2027938/51323069     SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [39]
                0.01    0.02 2055734/2055734     ContainerInterface::empty() [297]
                0.00    0.02 2055734/100847650     SR::XLink::GetChildX() const [25]
                0.01    0.01 2027938/23514196     SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [81]
                0.02    0.01 2055734/29266444     Tracer::Descend::~Descend() [76]
                0.00    0.01 2055734/17850110     Matcher::CheckLocalMatch(Matcher const*) const [208]
                0.00    0.01   92384/92384       SR::AnyNodeAgent::GetContainerInterface(SR::XLink) const [520]
                0.01    0.00 2055734/58534479     void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [126]
                0.00    0.00 1963350/3932363     ContainerFromIterator<Walk_iterator, TreePtr<Node>, Filter*, Filter*>::begin() [430]
                0.00    0.00 2055734/29266444     Tracer::Descend::Descend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [222]
                0.00    0.00   64588/64588       SR::AnyNodeAgent::GetXLinkFromIterator(SR::XLink, ContainerInterface::iterator) const [612]
                0.00    0.00   64588/2857668     ContainerFromIterator<FlattenNode_iterator, TreePtr<Node>, int, int>::begin() [225]
                0.00    0.00  129176/23553044     FlattenNode_iterator::Clone() const [164]
                0.00    0.00   64588/19173948     ContainerFromIterator<FlattenNode_iterator, TreePtr<Node>, int, int>::end() [471]
                0.00    0.00 1963350/3932363     ContainerFromIterator<Walk_iterator, TreePtr<Node>, Filter*, Filter*>::end() [4432]
                0.00    0.00   64588/64588       SR::SearchContainerAgent::DecidedQueryRestrictions(SR::DecidedQueryAgentInterface&, ContainerInterface::iterator, SR::XLink) const [4574]
-----------------------------------------------

No elevated hit counts, everything hangs around 2Mhits.

jgraley commented 3 years ago

With fast NLQ:

The DQ()s again. Comparing carefully with the slow one above, we notice that DQs from DCs is the same (2679038 hits) but DQs from NLQs have dropped by about the same as SearchContainerAgent::DQ has dropped, suggesting that almost all of the remaining are coming from DC as expected (fast only works in NLQ path: DC path is to be replaced by CSP solver). the discrepancy probably due to AnyNode, which is a SearchContainerAgent but has not fast version yet and so still does DQs in NLQ.

-----------------------------------------------
                0.00    0.16   78213/9896781     SR::AgentCommonDomainExtender::ExpandNormalDomain(std::unordered_set<SR::XLink, std::hash<SR::XLink>, std::equal_to<SR::XLink>, std::allocator<SR::XLink> > const&) [108]
                0.02    5.41 2679038/9896781     SR::AndRuleEngine::DecidedCompare(SR::LocatedLink) <cycle 4> [16]
                0.06   14.42 7139530/9896781     SR::AgentCommon::ResumeNormalLinkedQuery(SR::Conjecture&, SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [12]
[7]     75.6    0.08   19.99 9896781         SR::AgentCommon::RunDecidedQuery(SR::DecidedQueryAgentInterface&, SR::XLink) const [7]
                0.33   18.03 7334870/7334870     SR::StandardAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [8]
                0.01    0.65   72794/72794       SR::TransformOfAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [37]
                0.05    0.33 1007082/1007082     SR::StarAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [67]
                0.04    0.32  703343/703343      SR::SearchContainerAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [71]
                0.04    0.00    1752/1752        SR::PatternLink::GetPatternPtr() const [223]
                0.04    0.00 9896781/10445638     SR::DecidedQuery::CompleteDecisionsWithEmpty() [217]
                0.01    0.02  465884/465884      SR::OverlayAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [239]
                0.03    0.01 9890691/1398178442     std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
                0.00    0.03  137473/137473      SR::MatchAllAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [263]
                0.01    0.01   84281/84281       SR::MatchAnyAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [323]
                0.00    0.02   76805/76805       SR::NotMatchAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [331]
                0.01    0.00 9896781/33435457     SR::XLink::operator==(SR::XLink const&) const [206]
                0.00    0.01    1096/1096        SR::PointerIsAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [438]
                0.00    0.00    4713/4713        NestedAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [733]
                0.00    0.00    1089/1089        SR::GreenGrassAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [941]
                0.00    0.00    1138/1138        SR::SlaveAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [954]
                0.00    0.00    6090/9896781     SR::DecidedQuery::Reset() [133]
                0.00    0.00    1752/22172134     SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [72]
                0.00    0.00    1752/49866479     SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [36]
                0.00    0.00     123/123         IdentifierByNameAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [1534]
-----------------------------------------------

In both fast and slow cases, the real work is done by lambdas returned from StartNLQ() and then invoked directly in regeneration. See here:

-----------------------------------------------
                0.00   16.66  772657/772657      SR::AndRuleEngine::RegenerationPassAgent(SR::Agent*, SR::XLink, std::map<SR::Agent*, SR::XLink, std::less<SR::Agent*>, std::allocator<std::pair<SR::Agent* const, SR::XLink> > > const&) <cycle 5> [9]
[10]    62.8    0.00   16.66  772657         std::function<std::shared_ptr<SR::DecidedQuery> ()>::operator()() const [10]
                0.01   16.65  762243/762243      std::_Function_handler<std::shared_ptr<SR::DecidedQuery> (), SR::AgentCommon::SlowStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) [11]
                0.00    0.00   10414/10414       std::_Function_handler<std::shared_ptr<SR::DecidedQuery> (), SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) [4655]
-----------------------------------------------

The slow lambda is doing the work of StandardAgent as well as the other agents. The work of Stuff is split between StuffAgent::FastStartNormalLinkedQuery() and its lambda. Lambda is taking an immeasurably small amount of CPU time: 0.00, and StuffAgent::FastStartNormalLinkedQuery()is taking 0.01 (can be compared to about two thirds of the 0.80 figure in slow version).

Note on hit rates: overall 0.77Mhits on regen is about right. There are effectively 2 nested conjecture walks:

  1. retry on bad abnormal links (outer, coded in regen, rarely more than 1 iteration)
  2. looking for good normal links (inner, coded in the lambda, many iterations for Stuff and StandardAgent) The higher numbers for DC/DQs come from 2.

At 10Khits, the fast lambda is barely being called at all. That suggests that the above conjecture walk was generating 200 DQs per Stuff regen, so at least by hit-rate on a O(1) function, we're 200 times faster.

The fast algo's lambda is too small to register. StuffAgent::FastStartNormalLinkedQuery():

-----------------------------------------------
                0.00    0.01   10329/10329       SR::AgentCommon::StartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [99]
[337]    0.1    0.00    0.01   10329         SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [337]
                0.01    0.00   29256/29256       std::_Hashtable<SR::XLink, std::pair<SR::XLink const, SR::XLink>, std::allocator<std::pair<SR::XLink const, SR::XLink> >, std::__detail::_Select1st, std::equal_to<SR::XLink>, std::hash<SR::XLink>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::count(SR::XLink const&) const [354]
                0.00    0.00   10329/93138       std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<SR::SubSequence*>(SR::SubSequence*) [391]
                0.00    0.00   10329/548857      SR::AgentCommon::CreateDecidedQuery() const [278]
                0.00    0.00  124021/1398178442     std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
                0.00    0.00   29256/29256       std::__detail::_Map_base<SR::XLink, std::pair<SR::XLink const, SR::XLink>, std::allocator<std::pair<SR::XLink const, SR::XLink> >, std::__detail::_Select1st, std::equal_to<SR::XLink>, std::hash<SR::XLink>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::at(SR::XLink const&) const [757]
                0.00    0.00   10329/93138       SR::SubSequence::SubSequence() [571]
                0.00    0.00   10329/27914053     Tracer::Descend::~Descend() [56]
                0.00    0.00   10329/22172134     SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [72]
                0.00    0.00   10780/49866479     SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [36]
                0.00    0.00   10329/12663       SR::PatternLink::GetPattern() const [881]
                0.00    0.00   10329/98101496     SR::XLink::GetChildX() const [26]
                0.00    0.00   39585/33435457     SR::XLink::operator==(SR::XLink const&) const [206]
                0.00    0.00     451/115575      SR::DecidedQuery::RegisterMultiplicityLink(SR::PatternLink, SR::XLink) [352]
                0.00    0.00   10329/6244610     SR::LocatedLink::operator SR::PatternLink() const [300]
                0.00    0.00   10329/55829697     void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [117]
                0.00    0.00   10329/16497719     Matcher::CheckLocalMatch(Matcher const*) const [211]
                0.00    0.00   21109/19582227     std::__shared_ptr<TreePtrInterface const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<TreePtrInterface const, (__gnu_cxx::_Lock_policy)2> const&) [296]
                0.00    0.00   10329/27507938     std::__weak_ptr<Node, (__gnu_cxx::_Lock_policy)2>::_M_assign(Node*, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) [208]
                0.00    0.00   10329/67300993     SR::XLink::XLink(SR::LocatedLink const&) [127]
                0.00    0.00   10329/27914053     Tracer::Descend::Descend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [218]
                0.00    0.00     451/27199499     SR::XLink::CreateDistinct(TreePtr<Node> const&) [68]
                0.00    0.00     451/82693       TreePtr<SR::SubSequence>::operator TreePtr<Node>() const [798]
                0.00    0.00   10329/10329       std::function<std::shared_ptr<SR::DecidedQuery> ()>::function<SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}, void, void>(SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}) [4656]
-----------------------------------------------

We can see evidence of a loop doing a small number of iterations (20Khits, 30Khits in places). This is actually looping from terminus to base (or root) and in general is an O(logN) loop.

It's worth noting the tiny proportion of total hits on the various operations within this function, for example XLink::operator== (1 in about 800).

It's possible/likely that Hashtable at() and count() are taking up the most CPU with their hash conditioning algorithms. It may be worth trying to merge those calls.

jgraley commented 3 years ago

Conclusion: