Closed jgraley closed 3 years ago
With fast NLQ:
The DQ()s again. Comparing carefully with the slow one above, we notice that DQs from DCs is the same (2679038 hits) but DQs from NLQs have dropped by about the same as SearchContainerAgent::DQ
has dropped, suggesting that almost all of the remaining are coming from DC as expected (fast only works in NLQ path: DC path is to be replaced by CSP solver). the discrepancy probably due to AnyNode
, which is a SearchContainerAgent
but has not fast version yet and so still does DQs in NLQ.
-----------------------------------------------
0.00 0.16 78213/9896781 SR::AgentCommonDomainExtender::ExpandNormalDomain(std::unordered_set<SR::XLink, std::hash<SR::XLink>, std::equal_to<SR::XLink>, std::allocator<SR::XLink> > const&) [108]
0.02 5.41 2679038/9896781 SR::AndRuleEngine::DecidedCompare(SR::LocatedLink) <cycle 4> [16]
0.06 14.42 7139530/9896781 SR::AgentCommon::ResumeNormalLinkedQuery(SR::Conjecture&, SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [12]
[7] 75.6 0.08 19.99 9896781 SR::AgentCommon::RunDecidedQuery(SR::DecidedQueryAgentInterface&, SR::XLink) const [7]
0.33 18.03 7334870/7334870 SR::StandardAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [8]
0.01 0.65 72794/72794 SR::TransformOfAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [37]
0.05 0.33 1007082/1007082 SR::StarAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [67]
0.04 0.32 703343/703343 SR::SearchContainerAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [71]
0.04 0.00 1752/1752 SR::PatternLink::GetPatternPtr() const [223]
0.04 0.00 9896781/10445638 SR::DecidedQuery::CompleteDecisionsWithEmpty() [217]
0.01 0.02 465884/465884 SR::OverlayAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [239]
0.03 0.01 9890691/1398178442 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
0.00 0.03 137473/137473 SR::MatchAllAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [263]
0.01 0.01 84281/84281 SR::MatchAnyAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [323]
0.00 0.02 76805/76805 SR::NotMatchAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [331]
0.01 0.00 9896781/33435457 SR::XLink::operator==(SR::XLink const&) const [206]
0.00 0.01 1096/1096 SR::PointerIsAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [438]
0.00 0.00 4713/4713 NestedAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [733]
0.00 0.00 1089/1089 SR::GreenGrassAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [941]
0.00 0.00 1138/1138 SR::SlaveAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [954]
0.00 0.00 6090/9896781 SR::DecidedQuery::Reset() [133]
0.00 0.00 1752/22172134 SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [72]
0.00 0.00 1752/49866479 SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [36]
0.00 0.00 123/123 IdentifierByNameAgent::RunDecidedQueryImpl(SR::DecidedQueryAgentInterface&, SR::XLink) const [1534]
-----------------------------------------------
In both fast and slow cases, the real work is done by lambdas returned from StartNLQ() and then invoked directly in regeneration. See here:
-----------------------------------------------
0.00 16.66 772657/772657 SR::AndRuleEngine::RegenerationPassAgent(SR::Agent*, SR::XLink, std::map<SR::Agent*, SR::XLink, std::less<SR::Agent*>, std::allocator<std::pair<SR::Agent* const, SR::XLink> > > const&) <cycle 5> [9]
[10] 62.8 0.00 16.66 772657 std::function<std::shared_ptr<SR::DecidedQuery> ()>::operator()() const [10]
0.01 16.65 762243/762243 std::_Function_handler<std::shared_ptr<SR::DecidedQuery> (), SR::AgentCommon::SlowStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) [11]
0.00 0.00 10414/10414 std::_Function_handler<std::shared_ptr<SR::DecidedQuery> (), SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) [4655]
-----------------------------------------------
The slow lambda is doing the work of StandardAgent as well as the other agents. The work of Stuff is split between StuffAgent::FastStartNormalLinkedQuery()
and its lambda. Lambda is taking an immeasurably small amount of CPU time: 0.00, and StuffAgent::FastStartNormalLinkedQuery()
is taking 0.01 (can be compared to about two thirds of the 0.80 figure in slow version).
Note on hit rates: overall 0.77Mhits on regen is about right. There are effectively 2 nested conjecture walks:
At 10Khits, the fast lambda is barely being called at all. That suggests that the above conjecture walk was generating 200 DQs per Stuff regen, so at least by hit-rate on a O(1) function, we're 200 times faster.
The fast algo's lambda is too small to register. StuffAgent::FastStartNormalLinkedQuery()
:
-----------------------------------------------
0.00 0.01 10329/10329 SR::AgentCommon::StartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [99]
[337] 0.1 0.00 0.01 10329 SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const [337]
0.01 0.00 29256/29256 std::_Hashtable<SR::XLink, std::pair<SR::XLink const, SR::XLink>, std::allocator<std::pair<SR::XLink const, SR::XLink> >, std::__detail::_Select1st, std::equal_to<SR::XLink>, std::hash<SR::XLink>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::count(SR::XLink const&) const [354]
0.00 0.00 10329/93138 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<SR::SubSequence*>(SR::SubSequence*) [391]
0.00 0.00 10329/548857 SR::AgentCommon::CreateDecidedQuery() const [278]
0.00 0.00 124021/1398178442 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() <cycle 1> [20]
0.00 0.00 29256/29256 std::__detail::_Map_base<SR::XLink, std::pair<SR::XLink const, SR::XLink>, std::allocator<std::pair<SR::XLink const, SR::XLink> >, std::__detail::_Select1st, std::equal_to<SR::XLink>, std::hash<SR::XLink>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::at(SR::XLink const&) const [757]
0.00 0.00 10329/93138 SR::SubSequence::SubSequence() [571]
0.00 0.00 10329/27914053 Tracer::Descend::~Descend() [56]
0.00 0.00 10329/22172134 SR::DecidedQuery::RegisterNormalLink(SR::PatternLink, SR::XLink) [72]
0.00 0.00 10780/49866479 SR::PatternLink::PatternLink(SR::Agent const*, TreePtrInterface const*) [36]
0.00 0.00 10329/12663 SR::PatternLink::GetPattern() const [881]
0.00 0.00 10329/98101496 SR::XLink::GetChildX() const [26]
0.00 0.00 39585/33435457 SR::XLink::operator==(SR::XLink const&) const [206]
0.00 0.00 451/115575 SR::DecidedQuery::RegisterMultiplicityLink(SR::PatternLink, SR::XLink) [352]
0.00 0.00 10329/6244610 SR::LocatedLink::operator SR::PatternLink() const [300]
0.00 0.00 10329/55829697 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) [117]
0.00 0.00 10329/16497719 Matcher::CheckLocalMatch(Matcher const*) const [211]
0.00 0.00 21109/19582227 std::__shared_ptr<TreePtrInterface const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<TreePtrInterface const, (__gnu_cxx::_Lock_policy)2> const&) [296]
0.00 0.00 10329/27507938 std::__weak_ptr<Node, (__gnu_cxx::_Lock_policy)2>::_M_assign(Node*, std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) [208]
0.00 0.00 10329/67300993 SR::XLink::XLink(SR::LocatedLink const&) [127]
0.00 0.00 10329/27914053 Tracer::Descend::Descend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [218]
0.00 0.00 451/27199499 SR::XLink::CreateDistinct(TreePtr<Node> const&) [68]
0.00 0.00 451/82693 TreePtr<SR::SubSequence>::operator TreePtr<Node>() const [798]
0.00 0.00 10329/10329 std::function<std::shared_ptr<SR::DecidedQuery> ()>::function<SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}, void, void>(SR::StuffAgent::FastStartNormalLinkedQuery(SR::XLink, std::__cxx11::list<SR::LocatedLink, std::allocator<SR::LocatedLink> > const&, SR::TheKnowledge const*) const::{lambda()#1}) [4656]
-----------------------------------------------
We can see evidence of a loop doing a small number of iterations (20Khits, 30Khits in places). This is actually looping from terminus to base (or root) and in general is an O(logN) loop.
It's worth noting the tiny proportion of total hits on the various operations within this function, for example XLink::operator==
(1 in about 800).
It's possible/likely that Hashtable at() and count() are taking up the most CPU with their hash conditioning algorithms. It may be worth trying to merge those calls.
Without fast NLQ (with is next comment)
Have to come in at DQ level - DQs include conejcture solve as well as regen pass. Anyway:
StandardAgent::DQ()
is way higher thanSearchContainerAgent::DQ()
(which implements stuff node). About 4 times as many hits, but takes much longer to run. Both hit rates are expected to be high due to the repeated conjecture searches undertaken during regen (on top of main solver) and the ratio of 4:1 is fine (Stuff is linear but big on the whole, as big subtrees are walked, but StandardAgent can raise its container size to the power of the number of decisions it registers) so this seems OK.This is why any performance benefit was being masked (I haven't made a fast standard agent yet).
Looking at
StandardAgent
, the spit is even-ish between sequence and collectionLooking at sequence (collection is much the same):
Many internal operations have far more than the 6.7 Mhits that this DQ does., eg 127Mhits for
ContainerInterface::iterator::operator!=
. This clarifies thatStandardAgent::DQ()
will don
iterations (walking the collection once) each time it runs. Hence there's one more degree of complexity order inside StandardAgent::DQ, roughly speaking, and I hadn't been keeping that in mind.The Stuff node's DQ() breakdown:
No elevated hit counts, everything hangs around 2Mhits.