thizanne / ocaml-hamt

Implementation of the Hash Array Mapped Trie data structure in OCaml
MIT License
13 stars 0 forks source link

Some keys cannot be found after `union` #41

Open MartyO256 opened 1 year ago

MartyO256 commented 1 year ago

Expected Behavior

If HAMT m is built by taking the union of two HAMTs m1 and m2, then we should be able to find a binding in m for each key in m1 and m2.

For instance, the following two functions should be equivalent:

let add_all_by_add m1 m2 =
  Hamt.String.fold
    (fun key entry accumulator ->
      Hamt.String.add key entry accumulator)
    m2 m1

let add_all_by_union m1 m2 =
  Hamt.String.union
    (fun _identifier _e1 e2 -> e2)
    m1 m2

Current Behavior

Some entries cannot be found using Hamt.String.find_opt or Hamt.String.mem when we use add_all_by_union. The bindings are correctly present when using Hamt.String.bindings.

This suggests that HAMTs obtained using union are not necessarily built in the way find_opt expects.

Steps to Reproduce

In the following example, we're building the HAMT with bindings [("tp", ()); ("suc", ()); ("something", ()); ("expCtx", ())] as the union of the HAMTs with bindings [("tp", ()); ("suc", ())] and [("something", ()); ("expCtx", ())] respectively. While Hamt.String.bindings hamt = [("something", ()); ("suc", ()); ("tp", ()); ("expCtx", ())] does contain all the bindings, none of the keys can be found with Hamt.String.mem.

let of_list keys = Hamt.String.of_seq (List.to_seq keys)

let add_all_by_union' m2 m1 =
  Hamt.String.union
    (fun _identifier _e1 e2 (* Never called in this example *) -> e2)
    m1 m2

let hamt =
  Hamt.String.empty
  |> add_all_by_union' (of_list [ ("tp", ()); ("suc", ()) ])
  |> add_all_by_union' (of_list [ ("something", ()); ("expCtx", ()) ])

let () =
  let hamt_as_list = List.of_seq (Hamt.String.to_seq hamt) in
  let mem_assoc =
    List.map (fun (key, ()) -> (key, Hamt.String.mem key hamt)) hamt_as_list
  in
  Format.fprintf Format.std_formatter "[@[<v 2>@,%a@]@,]@."
    (Format.pp_print_list
       ~pp_sep:(fun ppf () -> Format.fprintf ppf ";@ ")
       (fun ppf (key, hit) ->
         Format.fprintf ppf "@[(\"%s\",@ %s)@]" key
           (if hit then "true" else "false")))
    mem_assoc

This prints out:

[
  ("something", false);
  ("suc", false);
  ("tp", false);
  ("expCtx", false)
]

Environment

OCaml 4.14.1 Hamt revision 2d5e5364a12e7b9bb61ad5dfaf6062353d942e26

thizanne commented 1 year ago

Thanks for the nice, detailed and reproducible report! Unfortunately there is very few (people.time) allocated to working on hamt at the time, so I'm not sure when we'll be able to fix it. I've been looking briefly at the code but wasn't able to find the obvious mistake.

If someone wants to reproduce and propose a patch (that I'll happily accept): all four items on the add_all_by_union' are needed to trigger the issue. Which probably points to a bug on merging BitmapIndexedNodes.