ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.62k stars 401 forks source link

Idea/Proposal: Promote ppx artifacts. #1002

Closed rgrinberg closed 6 years ago

rgrinberg commented 6 years ago

This idea is basically inspired by the support for checking in generated ppx. On one hand, there's already one mechanism for doing this which depends on driver. Unfortunately, this mechanism has 2 flaws:

I propose that we allow users to promote ppx processed files to .ml code and check it in the source. Compilation in release mode would then do no ppx preprocessing and simply use these generated .ml files. Development mode would work as normal except with an extra diff check to make sure that generated sources are up to date.

cc'ing @jordwalke who has expressed interest in this feature before and @avsm which is tackling ppx related dependency issues.

This proposal is not yet concrete - I haven't made any concrete suggestions on how this would look like to the user. First, I'd like to make sure that people agree that this is a desireable feature.

ghost commented 6 years ago

Regarding the mechanism used by ppx_type_conv, it could actually be integrated with ocamlformat. Then at the same time as formatting your file the generated contents would be updated. That would be nice.

Regarding this feature, one important consideration is that if you release the preprocessed code you basically cannot fix a bug in a ppx.

rgrinberg commented 6 years ago

Regarding the mechanism used by ppx_type_conv, it could actually be integrated with ocamlformat. Then at the same time as formatting your file the generated contents would be updated. That would be nice.

Yeah that would be nice. Although this is just a perf benefit, right? Linters should be composable just by running them in sequences.

Regarding this feature, one important consideration is that if you release the preprocessed code you basically cannot fix a bug in a ppx.

Sure, but this is also an issue with the @@@deriving_inline as well as far as I can tell.

ghost commented 6 years ago

Yeah that would be nice. Although this is just a perf benefit, right? Linters should be composable just by running them in sequences.

Not really, it's more an improvement of the user experience. In one case your build system tells you there is a corrected file and you press a button to promote the correction, while the method I suggest the generated contents would automatically be inserted whenever you press the save button in your editor.

Sure, but this is also an issue with the @@@deriving_inline as well as far as I can tell.

Indeed, but @deriving is the most useful kind of rewriter.

Another issue I just thought of: distributing the generated code is incompatible with ppx rewriters such as ppx_import, since the point is precisely to be independent of the original type definition.

rgrinberg commented 6 years ago

Another issue I just thought of: distributing the generated code is incompatible with ppx rewriters such as ppx_import, since the point is precisely to be independent of the original type definition.

Would you mind clarifying this? My mental model of ppx_import is that it basically copies the type definitions at build time from various cmi's. I don't see why this should break with promotion. Yes, it will mean that if you're relying on something being imported from some external library things won't work as expected. But I think that's fine.

jordwalke commented 6 years ago

There's a couple of thoughts:

  1. You probably don't want to use ocamlfmt to generate the published code. You want all the error locations to refer to the original sources before they were ppx'd so that the line numbers match up. Also, imagine if the input file was Reason syntax.
  2. You can then remove many ppx's from your package dependencies which makes the whole ecosystem scale and avoids superficial package version conflicts.
  3. I believe you could marshal the binary AST into a .ml file and the locations inside of it would refer to the original locations in the original sources - which should also be included with the published package, and all the error messages should link to the right files/lines.

When discussing this tool I've been referring to it as OMP+. The challenge is in publishing binary marshallings of ASTs that will work in every compiler. I could imagine a tool that would - only at publish time - take your sources, apply any preprocessing, then convert it to each version of the compiler's AST using OMP, and then marshaling it to that version of the compiler's object marshalling format.

Not only would this remove the need to run ppx, but also would cut out OMP from the build times as well (which can add up if jumping from version 4.04 -> 4.05 -> 4.06 -> 4.07).

ghost commented 6 years ago

@rgrinberg, AFAIU the main problem ppx_import solves is that you can add a [@@deriving] attribute to a type coming from an external library and stay compatible when this external type change. If you distribute the code generated by ppx_import, then you loose this ability to stay compatible when the type change.

@jordwalke I understand the motivation, but IMO distributing binary artifacts creates more problems that it solves. For instance one serious issue is that it completely forbids debugging build failures with printf.

I agree that ppx still has issues, but there is not much left to do to fix the last remaining ones. ppx_view should solve the last compatibility problems.

Regarding build times, cloud builds are the way to go and we are planning to add support for cloud builds in dune at some point.

rgrinberg commented 6 years ago

then you loose this ability to stay compatible when the type change.

Yeah, I think that is an acceptable and fairly obvious trade off. Btw, I agree that this feature isn't really recommended for the average user. But I kind of dislike how we special case ppx from other code generators/code preprocessors. It should really be possible to promote the artifacts from any of these tools (ocamllex, ppx, cppo, reason, etc) without any extra hassle. I agree the actual use cases for checking in generated ppx is small, but it exists nonetheless.

ghost commented 6 years ago

Well, it's a more complicated problem. Here you need not only to promote the artifacts, you need to promote the artifacts for several versions of the compiler.

ghost commented 6 years ago

Yeah, I think that is an acceptable and fairly obvious trade off.

If you are ready to make this trade-off I don't understand why you'd use ppx_import at all.

rgrinberg commented 6 years ago

You can still use it to minimize boilerplate in cases where include on a module without an .mli isn't quite convenient. Anyways, even if it didn't work with ppx_import at all it would still be useful. ppx_import is quite a rarely used ppx.

jordwalke commented 6 years ago

@jordwalke I understand the motivation, but IMO distributing binary artifacts creates more problems that it solves. For instance one serious issue is that it completely forbids debugging build failures with printf.

With what I have in mind, you would still be able to debug via printf. I'm thinking about "publish time ppx" or "omp+", as an optimization that removes build-time dependencies at publish time and turns them into publish-time dependencies which in turn:

It's always great to improve build performance, but the real win to me is the reduction in package ecosystem complexity. Right now, you can't even have a package that uses Reason v2 with another package that uses Reason v3. There's no need for that conflict - it's totally superficial. We can add opam support for properly managing that, but in the mean time, publish-time-ppx will work soon, with any version of opam.

Here's my full understanding of the challenges to this approach, did I miss anything?

It sounds like you also have some other ideas about potential pitfalls of binary processed artifacts at publish time.

yminsky commented 6 years ago

A couple of observations.

The issue about wanting to use multiple versions of Reason in different packages seems more substantial, but this feels like a hack to achieve that goal. My view is that time would be better spent with a more principled solution.

rgrinberg commented 6 years ago

It doesn't seem terribly important. Yes, it would make compilation faster (though I suspect not by all that much. Are there any performance numbers?), but there are other projects cooking that will have far larger implications for performance, like build artifact caching.

I don't predict this feature to be widely used, but I think the importance here is underestimated. Keep in mind that base basically relies on the exact same mechanism specialized to ppx_type_conv and requires editor support. Doesn't it seem a bit weird that one can use [@@deriving] but not [%sexp] in base? Sure, deriving_inline can probably be generalized to work for all ppx as well, but this approach will work for all preprocessors and will require no maintenance. So while I don't expect base to switch to this mechanism if it was available, but I do see this as a strict improvement over what we currently have to achieve the same goals.

yminsky commented 6 years ago

Fair enough. But the deriving tricks done in Base were a necessary evil for bootstrapping purposes, and I'm not sure we want to encourage yet more such trickery. And it seems to me like the most pressing problems that @jordwalke wants to solved (the inability to use Reasonv2 and Reasonv3 together) should not in general be solved in this way.

It's not that I don't see some value in this kind of feature; just that it seems like it might be a rather sharp-edged feature, and also seems to be of relatively low value.

ghost commented 6 years ago

To me, this discussion highlights two points:

Up to know we have been trying to solve these problems outside of the compiler, and the fact that we end up contemplating such complex solutions is a clear indication that is not good enough. At this point we have learned a lot about ppx rewriters: how they work, how they are used and what is missing. The right way forward is to give an official answer to all the open questions that have so far been answered in various independent ways by the community in the compiler itself, so that we have a solid basis to move forward.

Finally just one note regarding this kind of features: Dune is not here to solve the compiler problems. Compiler problems should be solved in the compiler. Dune is here to abstract all the low-level details and provide a smooth and consistent experience to developpers.

ghost commented 6 years ago

Looking at this proposal again, it could even be implemented entirely outside of Dune: @jordwalke you could automatically edit the jbuild/dune and replace the (preprocess ...) fields by a custom preprocessors that will do exactly what you want. There is no need for specific support in Dune.

Given that and that this proposal still looks like way too much work compared to the benefit to me, I'm closing this PR.