verse-lab / ego

EGraphs in OCaml
https://verse-lab.github.io/ego/ego/index.html
GNU General Public License v3.0
60 stars 6 forks source link

Issue with merging/rebuilding? #8

Open tmoux opened 9 months ago

tmoux commented 9 months ago

I believe I have found a bug related to egraph merging/rebuilding. It occurs in this test in basic.ml:

let%test "test" =
  let g = EGraph.init () in
  let c1 = EGraph.add_sexp g [%s 1] in
  let c2 = EGraph.add_sexp g [%s 2] in
  let f1 = EGraph.add_sexp g [%s (f 1)] in
  let f2 = EGraph.add_sexp g [%s (f 2)] in
  EGraph.merge g c1 c2;
  EGraph.rebuild g;
  let query = Query.of_sexp [%s f "?a"] in
  let matches = EGraph.ematch g (EGraph.eclasses g) query |> Iter.length in
  Alcotest.(check int) "check num matches" 1 matches

Here, we merge 1 and 2, so f 1 = f 2 by congruence. Thus the query f ?a should return exactly one match. However, two matches are found instead. This behavior is the same before and after the patch I previously made, and I believe its cause is unrelated to the e-matching algorithm itself.

After some investigating, I determined that initially, there are four e-classes, which I will call c1, c2, f1, f2, which correspond to the expressions 1, 2, f 1, and f 2, respectively. Since 1 and 2 are merged, the e-classes c1 and c2 are merged, so f1 and f2 should be merged as well. However, they are not merged, and so both f1 and f2 are treated as e-classes we should search, erroneously adding another match. This can be verified by checking that f1 and f2 do not have the same canonical e-class id after rebuilding.

I don't yet fully understand the problem, but it seems that the rebuilding is not propagating to the parent e-classes. In particular, I'm a bit confused about the e-graph's class_members field, which purportedly maps e-classes to its children e-nodes. From my understanding (see egg paper sec. 3.1), maintaining the congruence invariant requires upward merging, which would mean storing the parent e-nodes for each e-class instead. But I could be misunderstanding it.

kiranandcode commented 9 months ago

Oh, thanks for the bug report and the detailed debugging log!! Yepp, this does seem like a bug.

w.r.t the class_members field, you're right about the natural encoding being storing the parent enodes for each eclass. It's been a while since I looked at the code, I don't quite recall the encoding ego uses, but I remember that some of the deviations from the paper were motivated by looking at egg's source code.

Hm... this needs some more experimentation.

tmoux commented 9 months ago

I tried to determine whether the generic version had the same bug. This analogous unit test in test_prop.ml passes:

(fun () -> let graph = EGraph.init () in
let c1 = EGraph.add_node graph (L.of_sexp [%s x]) in
let c2 = EGraph.add_node graph (L.of_sexp [%s y]) in
let _ = EGraph.add_node graph (L.of_sexp [%s not x]) in
let _ = EGraph.add_node graph (L.of_sexp [%s not y]) in
EGraph.merge graph c1 c2;
EGraph.rebuild graph;
let q = qf [%s not "?a"] in
let matches = EGraph.find_matches (EGraph.freeze graph) q |> Iter.length in
Alcotest.(check int) "1 match" 1 matches)

which seems to indicate that the bug may be limited to the basic version. From my cursory look, the implementation of rebuilding in the generic version is slightly different, and that might explain why the two implementations have different behavior in this case.

tmoux commented 9 months ago

It seems generic.ml stores parent enodes in the eclass similar to egg: https://github.com/verse-lab/ego/blob/1d9a39a9064fddccaaee5282cc8a695bbed80955/lib/generic.ml#L24-L29

https://github.com/verse-lab/ego/blob/1d9a39a9064fddccaaee5282cc8a695bbed80955/lib/generic.ml#L308-L311

kiranandcode commented 9 months ago

oooh, yepp, that could be it!! Good find. I think I wrote the initial basic version from scratch, and then iterated on the design to improve the performance. For the generic one, I ported over the design from egg in rust, but didn't backport those changes to the basic one. Hence the bug.

tmoux commented 9 months ago

Gotcha, the generic version is a more well tested than the basic one, so I can see how the bug might've slipped through. I think I'll try to implement a fix when I have time.