stedolan / ppx_stage

Staged metaprogramming in stock OCaml
MIT License
150 stars 7 forks source link

ppx_stage adds support for staged metaprogramming to OCaml, allowing type-safe generation, splicing and evaluation of bits of OCaml source code. ppx_stage is heavily inspired by MetaOCaml, and can run many of the same programs (albeit with a slightly different syntax for staging). See test/strymonas.ml for a large example.

Install it with:

opam pin add ppx_stage git://github.com/stedolan/ppx_stage.git

After it's installed, you can load it into a standard OCaml toplevel:

#use "topfind";;
#require "ppx_stage.ppx";;

ppx_stage doesn't need a special compiler switch - it's compatible with any recent version of OCaml.

Once it's loaded, you'll be able to use the [%code ...] syntax to construct program fragments:

# let greeting = [%code print_string "Hello!\n"];;
val greeting : unit Ppx_stage.code = {Ppx_stage.compute = <fun>; source = <fun>}

The default output's pretty ugly, so install a better printer before going any further:

# #install_printer Ppx_stage.print;;
# let greeting = [%code print_string "Hello!\n"];;
val greeting : unit Ppx_stage.code = print_string "Hello!"

Note that it hasn't printed Hello! yet: greeting is a value of type unit Ppx_stage.code, which means it's the source code of a program which, when run, returns unit. We can run the program with Ppx_stage.run, and then we'll see the message:

# Ppx_stage.run greeting;;
Hello!
- : unit = ()

You can think of [%code ...] as being like fun () -> ..., producing a value which represents a block of code that has not yet been run. There are two important differences: first, we can access the source code of programs built with [%code ...] (using Ppx_stage.print to pretty-print it, or using Ppx_stage.to_parsetree to get the raw syntax tree). Secondly, we can splice multiple blocks of code together using escapes.

Splicing and escapes

Inside [%code ...] blocks, the syntax [%e ...] lets you splice in a piece of code into the middle of a template. For example:

# let two = [%code 2];;
val two : int Ppx_stage.code = 2
# let three = [%code 1 + [%e two]];;
val three : int Ppx_stage.code = 1 + 2

The escapes [%e ...] (sometimes known as "antiquotations") take a value of type 'a code, and splice it into a surrounding piece of code as an 'a.

The expression in the [%e ...] is run once, while the program is being generated, and doesn't form part of the generated code. This will make more sense with an example:

# let random_number_code () = Ppx_stage.Lift.int (Random.int 100);;

Here, random_number_code () produces source code for a random number, by using Ppx_stage.Lift.int to turn an int into an int code (Unlike MetaOCaml, ppx_stage does not automatically let 'a be turned into 'a code - values from the host program have to be explicitly lifted if they are used in the generated program).

We can use escapes to splice random_number_code () into a bigger program:

# let p = [%code 2 * [%e random_number_code ()]];;
val p : int Ppx_stage.code = 2 * 17

When [%code 2 * [%e random_number_code ()]] was evaluated, OCaml first evaluated random_number_code () (which this time returned [%code 17]), and then spliced that into [%code 2 * [%e ...]] giving [%code 2 * 17]. The call to random_number_code () isn't part of p: the source code of p is the program 2 * 17, and every time it runs it produces the same value:

# Ppx_stage.run p;;
- : int = 34
# Ppx_stage.run p;;
- : int = 34

We can generate a new program by rerunning random_number_code ():

# let q = [%code 2 * [%e random_number_code ()]];;
val q : int Ppx_stage.code = 2 * 85

This second call to random_number_code () returned a different value, but again q returns the same value every time it is run:

# Ppx_stage.run q;;
- : int = 170
# Ppx_stage.run q;;
- : int = 170

Binding

The scopes of variables in stage programs extend into nested escapes and [%code ...] blocks, which is surprisingly useful. Below is a staged version of List.map:

let map f = [%code
  let rec go = function
    | [] -> []
    | x :: xs -> [%e f [%code x]] :: go xs in
  go]

The tricky part here is f [%code x]: the x being passed to f refers to the x that was bound by x :: xs in the enclosing [%code ...] block.

The type of this function is worth a close look:

val map :
  ('a Ppx_stage.code -> 'b Ppx_stage.code) ->
  ('a list -> 'b list) Ppx_stage.code = <fun>

map takes a function from 'a code to 'b code, and returns code for a function from 'a list to 'b list. So, the f that we pass to map is given code for the current element of the list, and returns code for its replacement. We can write such an f using splicing:

let plus1 x = [%code [%e x] + 1]

Then, map plus1 splices plus1 into go, giving this code:

# map plus1;;
- : (int list -> int list) Ppx_stage.code =
let rec go = function | [] -> [] | x::xs -> (x + 1) :: (go xs)  in go

This map isn't the standard List.map function - instead, it's a template that produces a specialised map function, when given the code for processing each element.

This style can be used to write efficient libraries that generate optimised code. For a detailed example, read the paper Stream Fusion, to Completeness (by Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos and Yannis Smaragdakis), or play with their MetaOCaml library or a port of it to ppx_stage.

Code written in this style tends to involve writing many functions like plus1, which map code for a value to code for a result. To make them a bit less syntactically noisy, ppx_stage supports fun%staged as syntactic sugar for the combination of [%code ...] and [%e ...], allowing:

map (fun%staged x -> x + 1)

Typing and hygiene

So far, most of what's been described here could be accomplished with horrible string concatenation trickery. Two aspects of ppx_stage require a bit more, though: typing and hygiene.

First, all [%code ...] and [%e ...] blocks are statically typed: a value of type 'a code is the source code of a program producing 'a, and if the original program passes the OCaml typechecker, then it cannot generate ill-typed code. Instead of modifying the typechecker, this is accomplished by translating each [%code ...] block into a pair of expressions: the first is the body (the ...), unmodified except for [%e ...] escapes, and the second is code that produces a syntax tree for the body given syntax trees for the escapes. This translation is untyped, but by ensuring both half of the pair represent the same code, we know that if the first passes the OCaml typechecker then the second generates type-correct code.

The second issue is hygiene: under certain circumstances, ppx_stage may need to rename variables to prevent undesired shadowing. For instance, suppose we have a function that produces constant functions (that is, functions that ignore their argument):

let const v = [%code fun x -> [%e v]]

Now suppose we use this as [%code fun x -> [%e const [%code x]]]. If we were to just splice strings together, we might end up with:

fun x -> fun x -> x

which is wrong: the variable x at the end should refer to the outer binder, not the one introduced by const. Instead, ppx_stage generates this code:

fun x  -> let x''1 = x  in fun x  -> x''1

which introduces an alias x''1 for x so that it can be referred to even when x is shadowed.

Polymorphism

Because of how ppx_stage implements staging, some of the usual difficulties in staging polymorphic functions are avoided. Code like this works as expected:

# [%code let id x = x in (id 1, id "foo")];;
- : (int * string) Ppx_stage.code =
let id x = x  in ((id 1), (id "foo"))

There are two subtle restrictions on polymorphism. First, variables bound in staged programs have monomorphic types in nested [%code ...] expressions. For instance, this code won't compile:

# fun f -> [%code let id x = x in [%e f [%code (id 1, id "foo")]]];;
Error: This expression has type string but an expression was expected of type
         int

The function id is polymorphic, but the use of id in the nested [%code ...] block must be monomorphic.

Second, since splices are translated to applications, code generated from splices is subject to the (relaxed) value restriction. For example, the following code is given a non-polymorphic type:

# [%code fun x -> [%e [%code x]]];;
- : ('_a -> '_a) Ppx_stage.code = fun x  -> x

Modules and functors

(Support for staged modules and functors is even more experimental than the rest of ppx_stage. Expect breaking changes.)

By default, modules definitions are not in scope in [%code ...] blocks and do not appear in staged programs, so the following gives a type error:

module M : sig
  val x : int
end = struct
  let x = 42
end
let foo y = [%code M.x + [%e y]] (* Error: M.x not in scope *)

To make the definition of M.x visible in the staged program, we need to annotate the module binding, its signature and its definition:

module%code M : sig[@code]
  val x : int
end = struct[@code]
  let x = 42
end
let foo y = [%code M.x + [%e y]] (* works *)

When we print e.g. foo [%code 10], the output will include any staged module definitions that the result depends on:

module M'1 = struct let x = 42  end
let _ = M'1.x + 10 

Here, the module M has been renamed to M'1. In this example, the renaming is not terribly helpful, but in general the renaming is necessary to prevent multiple staged modules with the same name being confused.

For programs using modules to group and namespace related definitions, staging just means adding %code to module bindings and [@code] to structures (and signatures, if present). To write more advanced staged programs (e.g. using functors), you need to understand what the separate annotations do.

A staged module is a module annotated with [@code], which instructs ppx_stage to record the source code as well as the contents of the module, just like [%code ...] does for expressions. Staged modules have staged signatures, which are also written with [@code], so we can write:

module Staged : sig[@code]
  val x : int
end = struct[@code]
  let x = 42
end

We get a type error if only one of the [@code] annotations is present: staged signatures are different from their unstaged counterparts.

A staged module is not automatically made available to staged expressions using [%code ...]. Instead, it must be explicitly exported using the syntax module%code:

module%code M = Staged

Separating the construction and export of a staged module like this is important for writing code using functors, when a functor might export a staged module passed as a parameter. For instance:

module F (A : sig[@code] val x : int end) = struct
  module%code A = A
  let bigger = [%code A.x + 1]
end
module M = F (struct[@code] let x = 42 end)

Note that it is necessary to explicitly export A using module%code: functor arguments are not automatically exported, even if they have staged signatures.

Printing M.bigger produces the following output:

module A'1 = struct let x = 42  end
let _ = A'1.x + 1

As well as definitions, functor applications can be staged with [@code]. For instance, this functor accepts a staged module of signature Map.OrderedType and builds a map with that key type:

module MkMap (Key : Map.OrderedType[@code]) = struct
  module%code Key = Key
  module%code KMap = Map.Make (Key) [@code]
  let singleton k v = [%code KMap.singleton [%e k] [%e v]]
end
module StringMap =
  MkMap (struct[@code] type t = string let compare = compare end)

Printing StringMap.singleton [%code "hello"] [%code 5] gives:

module Key'1 = struct type t = string
                      let compare = compare  end
module KMap'1 = (Map.Make)(Key'1)
let _ = KMap'1.singleton "hello" 5