ocaml-ppx / ppx_deriving

Type-driven code generation for OCaml
MIT License
466 stars 89 forks source link

Deriving show for GADTs #290

Open keigoi opened 1 month ago

keigoi commented 1 month ago

This PR introduces [@@deriving show] for a subset of GADTs.

Code might look a bit ugly for now, and there are some rooms to be improved. So, please discuss, suggest, and/or comment on this PR. Thanks!

Summary

Before thinking about the implementation, let me re-cap the "calling convention" for derivers on certain types (which I believe). For polymorphic types like 'a foo, Ppx_deriving generates functions like:

let pp_foo : 'a. (formatter -> 'a -> unit) -> formatter -> 'a foo -> unit = fun poly_a -> ...

See that here is an extra parameter poly_a : formatter -> 'a -> unit for pretty-printing its type parameter 'a, and let's call such a parameter (above poly_a) a sub-instance.

On calling pretty printers for such (non-GADT) types, the caller will supply sub-instances for the argument types. For example, the deriver will generate (pp_foo pp_int) for printing int foo where pp_int : formatter -> int -> unit.

Sub-instances are often unused for GADTs

The observation is that, such sub-instances like poly_a above are usually never invoked for certain GADTs. Instead, type parameters are eventually refined to a ground type and pretty-printed based on that (concrete) type without the need for supplying the sub-instances.

This PR will exploit the fact. Here, we would have a few options for that:

  1. Completely eliminate the parameter from the signature. This seems not working in practice, as the calling converntion for the polymorphic types explained above unconditionally gives extra argument for sub-instances, breaking the compatibility. This is even fatal for deriving instances for recursive types. Instead, I propose another strategy:
  2. Give more "liberal" signature for such functions, like (here, the sub-instance has completely unrelated type 'b):
let pp_foo : 'a 'b. (formatter -> 'b -> unit) -> formatter -> 'a foo -> unit = fun _poly_a -> ...

The current PR follows this strategy. By relying on this signature generation policy, a larger class of GADTs will (hopefully) be handled without introducing compatibility issues (see below).

Here are what this PR extends Ppx_deriving (show), but I suppose the same strategy would be effective for other derivers as well:

1. Generating "degenerate" pretty-printers (instances)

One of obstacles for handling GADTs in the current Ppx_deriving (show) is that it cannot generate valid code for type variables in the refined types and for existentials. For example, the Pair below has type variables 'x and 'y for which we have no way to generate pretty-printers (instances), and the deriver will just put nonexistent functions poly_x and poly_y which leads to a compile-time error.

type 'a baz = Pair : 'x baz * 'y baz -> ('x * 'y) baz | ...

Instead of that, based on the above "eventually grounded" assumption, this PR will generate a never-called, degenerate pretty-printer for both 'x and 'y:

(fun fmt -> (_:[`this_type_is_refined]) -> failwith "impossible")

which raises an exception, but is actually never called as it is guarded by the type [this_type_is_refined] polymorphic variant type which prevents accidental call. The case for Pair in the printer pp_baz will be:

| Pair(a,b) -> 
  fprintf "Pair(";
  fprintf "%a" (pp_baz (fun fmt -> (_:[`this_type_is_refined]) -> failwith "impossible")) a;
  fprintf ", "
  fprintf "%a" (pp_baz (fun fmt -> (_:[`this_type_is_refined]) -> failwith "impossible")) b;
  fprintf ")"

This typechecks thanks to the "more liberal" signature above for GADTs.

2. New option refined_params for declaring the "refining" type parameter

As I described above, we need some way to specify which parameters are "refining" at first. For this purpose, the current code provides the option refined_params=[ <list of parameter positions> ] which can give the positions for refined type parameters to the deriver. For example,

type ('a, 'b) bar
[@@deriving show {refined_params=[0]}]

will produce

val pp_bar : 'a 'b 'c. (formatter -> 'c -> unit) -> (formatter -> 'b -> unit) -> formatter -> ('a, 'b) bar -> unit

See that the first (0-th) argument of pp_bar is parametric over type 'c which is unrelated to the rest of signature.

(As @gasche mentioned in his comment, it would be more preferable if we could just put _ in the type declaration to specify this. However, It is not compatible with the current behaviour which just eliminates the sub-instance argument for the corresponding type parameter e.g. _ foo will lead to pp_foo : formatter -> 'a foo -> unit without argument poly_a.)

3. Semi-automatic inference for refined parameters

In the implementations (.ml), refined parameters are usually apparent from the return type of constructor types. For this, I implemented a quick code to decide which types are actually defined. However, I admit that this convention to use type variable names to specify refined parameters is rather convoluted...

See: https://github.com/keigoi/ppx_deriving/blob/d2ccd5fc983e3fb5f19fa4a2408ae923de8c06a8/src_plugins/show/ppx_deriving_show.ml#L247-L283

4. Wrapping functions with locally abstract type signatures (newtypes)

Apparently we need this to handle pattern-maching on GADTs.

See: https://github.com/keigoi/ppx_deriving/blob/e18c11e4e56349d52dcd0acb5e203d6f555ba262/src_plugins/show/ppx_deriving_show.ml#L394-L399

(I agree that this is better to be done unconditionally (and to replace the current signature), as @gasche had mentioned.)


Related: https://github.com/ocaml-ppx/ppx_deriving/issues/7

gasche commented 1 month ago

This looks like impressive work, but it's also not so clear what it does right now from the code. Some early comments:

keigoi commented 1 month ago

Hello @gasche, thank you for your prompt comments! I filled up the summary now, and any further comments/edits will be highly appreciated. Thanks!