ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.61k stars 400 forks source link

Support multiple phases of PPX translation #1135

Closed zepalmer closed 2 years ago

zepalmer commented 6 years ago

Motivation

I'm working on a PPX extension that performs a pretty broad code transformation on a module. The preprocessor needs to know things like which variables are bound at certain points of the code; it traverses the AST to determine this. One precondition, however, is that any other extensions which may modify the bound variable set have already been processed.

I've been using dune for this project and have found it to be a hefty improvement over my old hackish Oasis build environment. However, it doesn't seem to be possible for me to enforce any kind of ordering on the PPX extensions. As the Dune documentation clearly states, Dune takes a set of PPX rewriter libraries, compiles them together into a single executable, runs that, and then compiles the result. This is sensible for most cases, but some projects (and rewriters) have somewhat more delicate requirements.

Dune already includes a staged_pps option, but this limits preprocessing to two particular phases and appears to be intended specifically for preprocessors like ppx_import (#193).

Proposal

I'd like to suggest an alternative which is simply an iterative version of the existing process. The syntax would go something like this:

(preprocess
  (phased_pps
    (ppx1 ppx2)
    (ppx3 ppx4)
    (ppx1)
  )
)

Here, each group of PPX libraries is compiled into a single executable. That executable is the run against the code; then, the next executable is constructed. In this example, we would build three executables: one which combines PPX libraries ppx1 and ppx2, one which combines PPX libraries ppx3 and ppx4, and one which just uses PPX library ppx1. The code would then be passed through each executable in turn to produce the final result. (In this case, one of the libraries -- such as ppx4 -- could even generate code which is expected to be processed using another library -- such as ppx1 -- without doing anything special.)

In my scenario, for instance, my PPX rewriter could be specified as a final phase, after common tools like ppx_deriving have generated bindings, so I can correctly ascertain which variables are bound in a given AST node and which are free.

Advantages

The phased_pps mode still uses the PPX library API that Dune currently uses. This preprocessing mode would allow for more sophisticated PPX rewriter configurations and would also support PPX rewriters with more delicate invariants. From an outsider's perspective, it seems like it wouldn't be too difficult to add to Dune -- it would just require iterating over the routine that handles the pps mode -- though I'm not sure what assumptions the existing code makes or how easily abstracted it is.

Disadvantages

Implementation is always work. Projects using phased_pps would compile slower due to the multiple PPX passes (though this is an inevitable cost of multi-pass preprocessing over the AST).

Summary

I'm proposing an alternative preprocessing mode which would do what pps currently does, but multiple times with different configurations. This would deliver a lot of the power of the traditional -ppx flag (as it would allow multiple, ordered transformation passes) but retain the advantages of Dune's current approach (as configuration can specify simultaneously executing some transformations, reducing the number of necessary tree walks).

Thank you!

ghost commented 6 years ago

We have been using ppx rewriters at Jane Street for several years, and one thing we found quite early is that composing ppx rewriters at the level of processes doesn't work well in practice. We developed ppxlib which implements a clean composition semantic for ppx rewriters. It seems that ppxlib is a better place to do what you are trying to do.

zepalmer commented 6 years ago

Thanks for the thoughts! I read over the documentation for ppxlib and, if I understand correctly, it provides a sort of featureful replacement for the PPX driver from Migrate_parsetree.Driver. I'm afraid I'm a bit ignorant of how all of these tools work together, but I looked over the interface for ppxlib and do not see anything that would allow me to enforce an ordering on transformations. The primary entry point appears to be Driver.register_transformation which, similar Migrate_parsetree.Driver.register, allows me to provide some transformation code which will be run on my behalf at some point in the future. Is there some mechanism that I'm missing that allows me to ensure that one transformation is run before another?

As far as I can tell, both drivers want to assume that all of my PPX transformations are commutative; is that correct? The trouble I'm having is that my transformation does not commute with the others in use in this project, so transformation ordering becomes important. If my read of the drivers is correct, it seems the only ways to ensure an order of transformation would be to (1) use multiple drivers, each of which performs a collection of commutative transformations, or (2) write my own driver which composes and executes the PPX transformations in the way that I see fit. I proposed the extension in this issue to take the first approach as I suspect that other projects might benefit from the ability to do so. If there's an option 3, I'm eager to hear about it. :)

Thanks again for the feedback. I learn a lot from searching the issues list of the OCaml GitHub repositories!

ghost commented 6 years ago

In ppxlib the idea is the following: each individual rewriter registers a function to expand a given extension point, and the rewriting is then done in one pass by the Context_free module which operates in a top-down manner: when an extension point is found, it is expanded with the corresponding expander, then the result is rewritten recursively. In particular this allows a ppx rewriter to produce extension points that will be rewritten by another rewriter, without the user having to specify these rewriters in a particular order. I wrote a blog post about this. I imagine that whatever you are doing could be done in this pass as well. However, this will of course not work in combination with ppx rewriters that don't use ppxlib, which might be an issue.

One thing you could do is modify either ppxlib or ocaml-migrate-parsetree to allow registering passes that are guaranteed to be executed before or after all other ppx rewriters. If you want to do it in a way that will allow your extension to be used in conjunction with as many other ppx rewriters as possible, you should modify ocaml-migrate-parsetree.

zepalmer commented 6 years ago

Thanks for the clarification! Top-down rewriting and ppxlib make sense for commutative extensions. Configurations which require explicit ordering of mostly commutative items are problematic in that they are misleading and brittle and, for most basic uses of PPX extensions, I'm happy to see things working the way they are.

It sounds to me that you are suggesting what I call option 2 above: writing my own driver which explicitly performs the PPX transformations in the fashion that I need. There are some downsides to this approach, though.

This is why I was hoping I could make use of the code that Dune already has for building and executing PPX extensions. If Dune included the above stanza, it would keep my configuration in one place, make it easy to add or remove extensions, and would work for complex preprocessing tools other than my own. That said, I understand if this isn't a development priority. :)

ghost commented 6 years ago

I'm suggesting to submit a PR to ocaml-migrate-parsetree to allow registering an extension that will be guaranteed to be executed before all other ones.

It is strictly better to do this change in the ocaml-migrate-parsetree driver rather than in dune, for the following reasons:

Additionally, ocaml-migrate-parsetree is now the common denominator of all modern ppx rewriters and the ones that don't use it yet cannot be used with dune. So there really is nothing to gain by doing this in dune rather than in ocaml-migrate-parsetree.

rgrinberg commented 2 years ago

Jeremie's argument was convincing here.