janestreet / base

Standard library for OCaml
MIT License
848 stars 124 forks source link

Map.update changes the "physical" key in certain circumstances #147

Closed leflings closed 1 year ago

leflings commented 1 year ago

Base version: 0.15.1

Reproduced here: https://github.com/leflings/ocaml_base_map_key_issue (dune exec -- test/map_key_bug.exe -e)

The following might all be intended behaviour, in which case you can disregard this issue, but the behaviour atleast surprised me.

I'm new to OCaml and was recently trying to solve a task where I had to count how many times unique words occur in a string.

I wanted to do this in a case insenstive manner and ended up utilizing String.Caseless and Map for this.

Given the following function

open Base

(* string list -> (string * int) list *)
let count_words words =
  let map = Map.empty (module String.Caseless) in
  List.fold words ~init:map ~f:(fun acc e ->
    Map.update acc e ~f:(function
      | None -> 1
      | Some n -> n + 1))
  |> Map.to_alist ~key_order:`Increasing
;;

My expectation is, that the first occurence of a word, would stick around as the key. But in certain cases, it will get swapped during an invocation of Map.update

Given the above function, I've written the following tests (with Alcotest)

let () =
  let test case_name input expected =
    Alcotest.test_case case_name `Quick (fun () ->
      Alcotest.(check (list (pair string int))) "equal lists" expected (count_words input))
  in
  Alcotest.run
    "Tests"
    [ ( "map key test"
      , [ test "simple" [ "Foo" ] [ "Foo", 1 ]
        ; test "coalescing different cases" [ "Foo"; "foo" ] [ "Foo", 2 ]
        ; test "keys in increasing order" [ "Foo"; "bar" ] [ "bar", 1; "Foo", 1 ]
        ; test "preserves keys" [ "FOO"; "bar"; "foo" ] [ "bar", 1; "FOO", 2 ]
        ; test
            "second occurence always changes key"
            [ "FOO"; "bar"; "foo"; "BAR" ]
            [ "BAR", 2; "foo", 2 ]
        ] )
    ]
;;

The 4th and 5th tests fails with the following output (abbreviated for clarity):

Testing `Tests'.

...FF

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [FAIL]        map key test          3   preserves keys.                                                                              │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ASSERT equal lists
FAIL equal lists

   Expected: `[("bar", 1); ("FOO", 2)]'
   Received: `[("bar", 1); ("foo", 2)]'

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [FAIL]        map key test          4   second occurence always changes key.                                                         │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ASSERT equal lists
FAIL equal lists

   Expected: `[("BAR", 2); ("foo", 2)]'
   Received: `[("bar", 2); ("foo", 2)]'
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
2 failures! in 0.005s. 5 tests run.

Inspecting https://github.com/janestreet/base/blob/v0.15/src/map.ml#L714-L748 tells me that this behaviour only happens when updating a key that is in a Node and not in a Leaf (as we see in last test case). The offending line seems to be 738, where the node is reconstructed with the key given as an argument, rather than they existing key v found in the Node.

It seems Map.change would exhibit the same behaviour.

ceastlund commented 1 year ago

I don't believe Map makes any guarantee about which key is kept if two equivalent keys are supplied. So I the current behavior is correct, and changing it to keep the prior key would also be correct. I don't see any particular need to change it -- if "BAR" and "bar" are equivalent as keys, it shouldn't really matter which one the map keeps.