Closed ehildenb closed 1 year ago
Non-
Cover
edges are not allowed to go backwards.
I strongly agree with this. In my opinion, the data structure we use to store symbolic execution state should be a tree w. r. t. the step relation (circularities can be encoded using covers). This restriction makes processing the data structure much easier, while the circular representation can still be derived.
This likely isn't possible as a restriction, because a transition system which had
a => b => a
must have a back-loop via anEdge
This can be implemented if the identity of a node is not equal to the term it stores. So either the hash of a node has to include the hash of the parent node, or the identity should not be the hash (it can be a unique integer for instance). Using this second solution, the hashes can still be used to e.g. search for nodes that store syntactically equal terms.
I have pulled the second (simpler) change out here: https://github.com/runtimeverification/pyk/pull/428/files
I think it is still worth it to investigate the first change though. They are checking things that overlap, but coer different cases. With just the second check, you still can have a loop a => b => a
with 1-step rewrites, and with just the second check you still can have a zero-length loop a -split-> b -cover-> a
.
Perhaps we can somehow name nodes given their tree traversal order? So if we have:
a -> b
b -> c
b -> d
c -> a
d -> e
e -> a
Then we make nodes:
0: hash(a)
0.0: hash(b)
0.0.0: hash(c)
0.0.1: hash(d)
0.0.0.0: hash(a)
0.0.1.0: hash(e)
0.0.1.0.0: hash(a)
So that the node identifier is telling us where in teh tree it is (every node must be in the tree).
Downside of this approach, is that when you do something like remove-node
, then the node identifier of other nodes could cahnge, because an entire subtree could be removed. Upside is that just by looking at the node-identifier, you would know roughly where/how-deep in the tree you are.
Or we can just go with completely unique UUIDs, but also be able to compute this tree traversal order?
Note that we can still store the nodes themselves as dict[str, Node]
, where the str
is a hash of the node (so we're not storing nodes multiple times), but then just store the separate table of the dict[str, str]
, where the key is the node UUID and the value is the node hash.
I think we should do the following:
int
s as node IDs.NodeIdLike = int | str
def _resolve(self, id: NodeIdLike) -> Node:
...
where id
is either an ID, or an alias, or a path expression like "0.0.1.0"
. Besides this straightforward syntax, we can introduce all sorts of syntax sugar, e.g. for concisely addressing nodes that are deep in a straight line execution path.
Sounds great!
A bug in how
extract_branches
heuristic led to some branches being automatically trivially proved because thhey had trivial 0-step loop-backs withSplit
edges https://github.com/runtimeverification/pyk/pull/377. We should add restrictions to the generated KCFGs so that this cannot happen (in additioon to the current restriction that eachNode
can have at most oneSuccessor
).Cover
edges are not allowed to go backwards. This likely isn't possible as a restriction, because a transition system which hada => b => a
must have a back-loop via anEdge
. So perhaps another restriction is:a --> a
with aCover
or aSplit
, there must be symbolic execution involved. Another example where restricting zero-length loops would have stopped a bug earlier with a better message: https://github.com/runtimeverification/pyk/issues/398