runtimeverification / pyk

Python tools for the K Framework
BSD 3-Clause "New" or "Revised" License
13 stars 2 forks source link

Restrict back-edges in KCFGs #378

Closed ehildenb closed 1 year ago

ehildenb commented 1 year ago

A bug in how extract_branches heuristic led to some branches being automatically trivially proved because thhey had trivial 0-step loop-backs with Split edges https://github.com/runtimeverification/pyk/pull/377. We should add restrictions to the generated KCFGs so that this cannot happen (in additioon to the current restriction that each Node can have at most one Successor).

tothtamas28 commented 1 year ago

Non-Cover edges are not allowed to go backwards.

I strongly agree with this. In my opinion, the data structure we use to store symbolic execution state should be a tree w. r. t. the step relation (circularities can be encoded using covers). This restriction makes processing the data structure much easier, while the circular representation can still be derived.

This likely isn't possible as a restriction, because a transition system which had a => b => a must have a back-loop via an Edge

This can be implemented if the identity of a node is not equal to the term it stores. So either the hash of a node has to include the hash of the parent node, or the identity should not be the hash (it can be a unique integer for instance). Using this second solution, the hashes can still be used to e.g. search for nodes that store syntactically equal terms.

ehildenb commented 1 year ago

I have pulled the second (simpler) change out here: https://github.com/runtimeverification/pyk/pull/428/files

I think it is still worth it to investigate the first change though. They are checking things that overlap, but coer different cases. With just the second check, you still can have a loop a => b => a with 1-step rewrites, and with just the second check you still can have a zero-length loop a -split-> b -cover-> a.

Perhaps we can somehow name nodes given their tree traversal order? So if we have:

a -> b
b -> c
b -> d
c -> a
d -> e
e -> a

Then we make nodes:

0: hash(a)
0.0: hash(b)
0.0.0: hash(c)
0.0.1: hash(d)
0.0.0.0: hash(a)
0.0.1.0: hash(e)
0.0.1.0.0: hash(a)

So that the node identifier is telling us where in teh tree it is (every node must be in the tree).

Downside of this approach, is that when you do something like remove-node, then the node identifier of other nodes could cahnge, because an entire subtree could be removed. Upside is that just by looking at the node-identifier, you would know roughly where/how-deep in the tree you are.

Or we can just go with completely unique UUIDs, but also be able to compute this tree traversal order?

Note that we can still store the nodes themselves as dict[str, Node], where the str is a hash of the node (so we're not storing nodes multiple times), but then just store the separate table of the dict[str, str], where the key is the node UUID and the value is the node hash.

tothtamas28 commented 1 year ago

I think we should do the following:

NodeIdLike = int | str

def _resolve(self, id: NodeIdLike) -> Node:
    ...

where id is either an ID, or an alias, or a path expression like "0.0.1.0". Besides this straightforward syntax, we can introduce all sorts of syntax sugar, e.g. for concisely addressing nodes that are deep in a straight line execution path.

ehildenb commented 1 year ago

Sounds great!