Open tmoux opened 9 months ago
Oh, thanks for the bug report and the detailed debugging log!! Yepp, this does seem like a bug.
w.r.t the class_members field, you're right about the natural encoding being storing the parent enodes for each eclass. It's been a while since I looked at the code, I don't quite recall the encoding ego uses, but I remember that some of the deviations from the paper were motivated by looking at egg's source code.
Hm... this needs some more experimentation.
I tried to determine whether the generic version had the same bug. This analogous unit test in test_prop.ml
passes:
(fun () -> let graph = EGraph.init () in
let c1 = EGraph.add_node graph (L.of_sexp [%s x]) in
let c2 = EGraph.add_node graph (L.of_sexp [%s y]) in
let _ = EGraph.add_node graph (L.of_sexp [%s not x]) in
let _ = EGraph.add_node graph (L.of_sexp [%s not y]) in
EGraph.merge graph c1 c2;
EGraph.rebuild graph;
let q = qf [%s not "?a"] in
let matches = EGraph.find_matches (EGraph.freeze graph) q |> Iter.length in
Alcotest.(check int) "1 match" 1 matches)
which seems to indicate that the bug may be limited to the basic version. From my cursory look, the implementation of rebuilding in the generic version is slightly different, and that might explain why the two implementations have different behavior in this case.
It seems generic.ml
stores parent enodes in the eclass similar to egg:
https://github.com/verse-lab/ego/blob/1d9a39a9064fddccaaee5282cc8a695bbed80955/lib/generic.ml#L24-L29
oooh, yepp, that could be it!! Good find. I think I wrote the initial basic version from scratch, and then iterated on the design to improve the performance. For the generic one, I ported over the design from egg in rust, but didn't backport those changes to the basic one. Hence the bug.
Gotcha, the generic version is a more well tested than the basic one, so I can see how the bug might've slipped through. I think I'll try to implement a fix when I have time.
I believe I have found a bug related to egraph merging/rebuilding. It occurs in this test in
basic.ml
:Here, we merge
1
and2
, sof 1 = f 2
by congruence. Thus the queryf ?a
should return exactly one match. However, two matches are found instead. This behavior is the same before and after the patch I previously made, and I believe its cause is unrelated to the e-matching algorithm itself.After some investigating, I determined that initially, there are four e-classes, which I will call c1, c2, f1, f2, which correspond to the expressions
1
,2
,f 1
, andf 2
, respectively. Since1
and2
are merged, the e-classesc1
andc2
are merged, sof1
andf2
should be merged as well. However, they are not merged, and so bothf1
andf2
are treated as e-classes we should search, erroneously adding another match. This can be verified by checking thatf1
andf2
do not have the same canonical e-class id after rebuilding.I don't yet fully understand the problem, but it seems that the rebuilding is not propagating to the parent e-classes. In particular, I'm a bit confused about the e-graph's
class_members
field, which purportedly maps e-classes to its children e-nodes. From my understanding (see egg paper sec. 3.1), maintaining the congruence invariant requires upward merging, which would mean storing the parent e-nodes for each e-class instead. But I could be misunderstanding it.