utk-se / WorldSyntaxTree

Language-agnostic parsing of World of Code repositories
Other
20 stars 0 forks source link

Decomposition of tree matching for single-root match checking parallelization #15

Open robobenklein opened 3 years ago

robobenklein commented 3 years ago

Entirely just a thought exercise in parallel query matching, probably not reasonable to implement for both performance and "because someone else has already done it better"

Break down the match rule into a sequence of MATCHes and RULEs

Ex. Finding functions with some number of return statements:

  1. get list of function nodes
  2. append each child of the function node to a queue with RULE accept() if node.type == return else q.put(node.children) will descend
  3. if no children and not accept(), simply not re-adding to queue will be a reject
  4. all queries should be decomposable into a sequence of singular operations which either accept(), add partial matches back into the queue, or do nothing if they don't match

thus even with only one root node to start a search for a query parallelization can be used

the computational problem is to find the correct ordering of the RULEs that the query can be broken down into

recalling cypher-like (n1:WSTNode {type: function})<-[*:parent]-(n2:WSTNode {type: return})<-[*:parent]-(n3:WSTNode {type:int}) can be broken down into:

  1. initial filter to nodes which are type:function
  2. find all descendants from n1
  3. if n2 matches, append to a result chain object
t = q.get()
switch len(t):
  *rule:
    if rule matches and is not last: add (t + n) to q
    if rule matches and is last: accept(t + n)

example rules:

  1. n is a function type
  2. n is a return type and is descendant of t[0]
  3. n is a int type and is descendant of t[1] accept t, len(t) = 3