ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org
Other
5.19k stars 1.06k forks source link

Matching: refactor to clarify the argument-passing structure #13084

Closed gasche closed 1 month ago

gasche commented 1 month ago

This refactoring PR is born out of a discussion with @ncik-roberts in https://github.com/ocaml/ocaml/pull/12555#discussion_r1552437250 . "Arguments" in pattern-matching compilation are expressions that access parts of the scrutinee. They can contain complex expressions, in particular field projects of the form x.f, but never x.f.g -- the projection necessarily operates on a variable, or sometimes a tuple of variables in weird corner cases of do_for_multiple_match. There is a specific moment in pattern-matching compilation that turns complex expressions into variables, and then the rest of the pipeline assumes that it gets a variable, and this is important for correctness, but not obvious to see in the code at all.

The present PR uses finer-grained types to statically encode this structure of arguments, continuing on a tradition started with @trefis a few years ago to clarify the pattern-matching compiler. There is no change to the behavior, and the diff consists almost entirely of very minor simplifications, but it becomes much easier to reason about this transition from arbitrary expressions to simple arguments in the code -- and I will rely on this property in a follow-up PR to fix The Pattern-Matching Bug.

My favorite parts of the diff are the following:

-                args =
-                  ( match args with
-                  | _ :: r -> r
-                  | _ -> assert false
-                  );
+                args = args.rest;
 and compile_match_simplified ~scopes repr partial ctx
-    (m : Simple.clause pattern_matching) =
-  match m with
-  | { cases = []; args = [] } -> comp_exit ctx m
-  | { args = ((Lvar v as arg), str) :: argl } ->
-      bind_match_arg str v arg (
-        let args = (arg, Alias) :: argl in
-        let m = { m with args } in
-        let first_match, rem = split_and_precompile_simplified m in
-        combine_handlers ~scopes repr partial ctx first_match rem
-      )
-  | _ -> assert false
+    (m : (simple_args, Simple.clause) pattern_matching) =
+  let first_match, rem = split_and_precompile_simplified m in
+  combine_handlers ~scopes repr partial ctx first_match rem
 and do_compile_matching ~scopes repr partial ctx pmh =
   match pmh with
   | Pm pm -> (
-      let arg =
-        match pm.args with
-        | (first_arg, _) :: _ -> first_arg
-        | _ ->
-            (* We arrive in do_compile_matching from:
-               - compile_matching
-               - recursive call on PmVars
-               The first one explicitly checks that [args] is nonempty, the
-               second one is only generated when the inner pm first looks at
-               a variable (i.e. there is something to look at).
-            *)
-            assert false
-      in
+      let arg = arg_of_simple_arg (fst pm.args.first) in

The worst part of the diff is in do_for_multiple_match (a small hack due to toplevel_handler not being polymorphic over the argument structure), but it's basically fine.

A second commit also simplifies some functions that would take both a dedicated ~arg argument (representing the first argument known to be present) and an args vector that is now known to be simple; there is no need to track ~arg specifically as it can now be recovered from args.first.

(cc @trefis and @lthls who are on fire with pattern-matching refactoring reviewing)

gasche commented 1 month ago

Thanks both! Merging.