I believe I found a bug in the implementation of e-matching in basic.ml. I have found instances where e-matching does not match all valid expressions in the e-graph, and thus rewrites are not correctly performed. I created the failing tests on my fork here.
Test 1: I created an egraph with the expressions (g 1) and (g 2), and then merged these two. Then, I rebuilt the egraph and called EGraph.ematch on the query (g ?a), expecting two matches, but only getting one. Note that there are two matches, as expected, if we do not merge (g 1) and (g 2).
Test 2: To ensure that this was not caused by inadvertently breaking any internal invariants of the egraph, I created another test that only uses the public interface of the egraph. Here, I again created an egraph with (g 1) and (g 2) and merged them, and then I applied the rewrite (g ?a) => (h ?a). This should create the expressions (h 1) and (h 2), which are both equivalent to (g 1) and (g 2). However, due to the bug previously shown, I think (h 2) is actually not created. I then designed a cost function such that (h 2) would be the cheapest term to extract, but (h 1) was returned instead. I wrote an equivalent test in egg and confirmed that extracting (h 2) should be the intended behavior.
The fix seems to be straightforward. The issue is that for a given enode/eclass, there can be
multiple valid substitutions for the same pattern (for instance, for the query (g ?a) in the first test case, both a=1 and a=2 are valid substitutions). Thus, the helper functions enode_matches and match_in should return a list/iter of substitutions rather than an option.
This patch passes all existing tests, as well as a few inline tests I added that previously failed due to ematch not finding all the substitutions. Note that the e-matching algorithm in generic.ml seems largely the same, and probably has to be fixed as well.
Oh! Thanks so much for the detailed bug report and looking into it and the excellent patch!! This looks good to me, I'll merge it in!!! Thanks again!!!
I believe I found a bug in the implementation of e-matching in
basic.ml
. I have found instances where e-matching does not match all valid expressions in the e-graph, and thus rewrites are not correctly performed. I created the failing tests on my fork here.Test 1: I created an egraph with the expressions
(g 1)
and(g 2)
, and then merged these two. Then, I rebuilt the egraph and calledEGraph.ematch
on the query(g ?a)
, expecting two matches, but only getting one. Note that there are two matches, as expected, if we do not merge(g 1)
and(g 2)
.Test 2: To ensure that this was not caused by inadvertently breaking any internal invariants of the egraph, I created another test that only uses the public interface of the egraph. Here, I again created an egraph with
(g 1)
and(g 2)
and merged them, and then I applied the rewrite(g ?a) => (h ?a)
. This should create the expressions(h 1)
and(h 2)
, which are both equivalent to(g 1)
and(g 2)
. However, due to the bug previously shown, I think(h 2)
is actually not created. I then designed a cost function such that(h 2)
would be the cheapest term to extract, but(h 1)
was returned instead. I wrote an equivalent test in egg and confirmed that extracting(h 2)
should be the intended behavior.The fix seems to be straightforward. The issue is that for a given enode/eclass, there can be multiple valid substitutions for the same pattern (for instance, for the query
(g ?a)
in the first test case, botha=1
anda=2
are valid substitutions). Thus, the helper functionsenode_matches
andmatch_in
should return a list/iter of substitutions rather than an option.This patch passes all existing tests, as well as a few inline tests I added that previously failed due to
ematch
not finding all the substitutions. Note that the e-matching algorithm ingeneric.ml
seems largely the same, and probably has to be fixed as well.