Reason V4 [Stack 1/n #2605] [Allow multiple versions of Reason]

jordwalke commented 4 years ago

Allow multiple versions of Reason Syntax inside of `refmt`

This is diff 1 in a stack of github PRs: Only examine the most recent commit in this PR to understand what this specific PR accomplishes. The whole stack is:

YOU ARE HERE --> Reason V4 [Stacked Diff 1/n #2605] [Allow multiple versions of Reason] …

Reason V4 [Stacked Diff 2/n #2599] [String Template Literals] …

Reason V4 [Stacked Diff 3/n #2614] [Parse Hashtags for polymorphic va… …

Reason V4 [Stacked Diff 4/n #2619] [Make merlin and rtop respect late… …

This is an approach to the Reason parser that lets us start incorporating Reason v4 changes that would otherwise be breaking, without needing breaking versions. This is important for allowing packages that depend on old versions to build alongside dependencies that depend on the new version of Reason. Without a feature like this, we risk breaking the ecosystem. But with this feature, all packages can depend on the specific version of Reason that they individually choose without worrying about if it will conflict with dependencies or dependers.

The approach in this diff is that each file will automatically record the version of Reason that it uses inside an attribute. This is not the only approach or the best/final one. It's just one that will work right now and with any build/editor tooling out there. While it records versions inside of individual files, and can allow each individual file to choose its version of Reason - it is not the goal. The goal is strictly to avoid breaking packages. Eventually we can build a configuration in some .refmt config that configures the specific version for all files in the project. For now, here's how this works. You have a file in a package that uses Reason v 3.7.0.

/**
 * MyModule.re.
 */;
type myType('a) = ('a, 'a);
...

Then you upgrade your Reason package from 3.7.0 to 3.8.0. Nothing breaks - it's a minor version and minor versions can only introduce new features but cannot break older features. But inside your editor, next time you reformat, it knows that you were programming in Reason 3.7.0 previously and it records that fact into a version attribute. It was automatic. You didn't need to do anything.

/**
 * MyModule.re.
 */;
[@reason.version 3.7];
type myType('a) = ('a, 'a);
...

That attribute tells readers and tooling what version of Reason to use when parsing/printing.

But you upgraded to 3.8, so how do you use the new features? Change the version attribute to [@reason.version 3.8] and now you can use the new syntax features released in 3.8 such as angle bracket types.

/**
 * MyModule.re.
 */;
[@reason.version 3.8];
type myType<'a> = ('a, 'a);

If you want to upgrade your whole project, then the same upgrade scripts we always use will work.

This version attribute is not the only way we can infer the version - we can allow project level configuration for example, but this is the approach that is guaranteed to work with any build system or tooling.

Future Feature: Auto-upgrade in editor

Supposed that a hypothetical version 3.9 is released where type variables may be upper cased identifiers instead of 'a. You bump your package.json version of reason to 3.9.0, and then inside of your file change the attribute to [@reason.upgradeFrom 3.8]. Then next time you reformat in your editor it will automatically upgrade from 3.8 to the current version of Reason in your project.

/**
 * MyModule.re.
 */;
[@reason.upgradeFrom 3.8];
type myType<'a> = ('a, 'a);

Reformatted in editor to:

/**
 * MyModule.re.
 */;
[@reason.version 3.9];
type myType<A> = (A, A);

EduardoRFS commented 4 years ago

That could also be integrated in dune so that you don't need to write the header later, maybe refmt can accept an additional argument and or env variable?

jfrolich commented 4 years ago

I think this is quite nifty, but how many times do we expect the syntax to have a breaking change. My concerns are:

The compiler would need to be more complex incorporating all these previous syntaxes (maintenance burden)
There's already quite a few bugs in the compiler (printing a parsetree can result in invalid syntax), making it more complex won't make it more solid

Wouldn't it be better to come up with a good syntax proposal that might take a bit more time, and then have a proper migration script?

Then it's also possible to simplify the compiler perhaps incorporating some of the code of the Bucklescript syntax that helps to have better errors and faster parsing/printing.

This partly of comes from my experience with graphql-ppx where keeping the compiler compatible adds a lot of complexity and maintenance burden.

Would it be a possibility to ship multiple versions of the compiler, and not facilitate everything in the same code base (or is that actually the idea)

jordwalke commented 4 years ago

That could also be integrated in dune so that you don't need to write the header later, maybe refmt can accept an additional argument and or env variable?

Yes, an env var is the easiest way to plumb it through all the tooling, and dune should already support environment variables for build. An build env var could also be set in an esy package.json.

jordwalke commented 4 years ago

There's already quite a few bugs in the compiler (printing a parsetree can result in invalid syntax), making it more complex won't make it more solid

@jfrolich Do you have an example in mind? I've been trying to fix all the ones I've come across so if you have a common one in mind I can take a look at it.

jordwalke commented 4 years ago

The compiler would need to be more complex incorporating all these previous syntaxes (maintenance burden)

That's definitely a valid concern, but either way, we will still want to be able to support two different syntaxes so that we can continually upgrade people without breaking the ecosystem. In the worst case if it gets too burdensome, we could literally just freeze the old parser/printer implementation, fork new ones, and switch depending on a prepassed lex. Most of the changes for Reason v4 are not hugely different, and most of the changes are actually backwards compatible - such as type "generics" syntax t<x, y>.

One thing that we can do to make the implementation of the parser much cleaner despite supporting two separate syntaxes, is to create different tokens depending on what is lexed early on. For example, after we see a version attribute (or a dune env var etc), we lex the token # as POUND ,or as POUND_v4 depending on the version. Then the parsing rules are clearly separated and no artificially reported conflicts occur in the parser. Printing is another matter.

Lupus commented 4 years ago

This actually resembles what dune does. I can specify any dune language version which is <= the version of dune that I have, and to make sure my project builds I can depend on dune >= 1.11 and specify dune language version of 1.11, and everything will work on any subsequent version of dune that I have installed. It's likely more expensive to maintain, but I see no other way for a tool like reason to evolve rapidly without breaking everything. dune evolves rapidly yet maintains very good compatibility.

Lupus commented 4 years ago

Docker is doing the same thing btw, see "Overriding default frontends" in https://docs.docker.com/develop/develop-images/build_enhancements/, special comment at the start of dockerfile can change the syntax flavour being used:

The new syntax features in Dockerfile are available if you override the default frontend. To override the default frontend, set the first line of the Dockerfile as a comment with a specific frontend image:

# syntax = <frontend image>, e.g. # syntax = docker/dockerfile:1.0-experimental

jfrolich commented 4 years ago

@jfrolich Do you have an example in mind? I've been trying to fix all the ones I've come across so if you have a common one in mind I can take a look at it.

With writing the ppx I often take snapshots of AST and use Refmt to make readable snapshots. I often need to edit them to make them work (like adding some braces in some places). I'll try to post some bug reports.

jordwalke commented 4 years ago

@jfrolich Yes, please do report those as issues especially if those same ASTs that format into invalid syntax could be constructed manually in a .re file. (Also: The ocaml syntax printing is often broken, but that's not something related to Reason - Reason just uses the stock ocaml printer. You may not have been referring to this, but thought I'd mention it.).

jfrolich commented 4 years ago

(Also: The ocaml syntax printing is often broken, but that's not something related to Reason - Reason just uses the stock ocaml printer. You may not have been referring to this, but thought I'd mention it.).

Yeah I am not entirely sure if the AST is possible to construct using plain re. It shouldn't be related to OCaml printer as I just use refmt to parse to binary and then to a AST transform and then print it back. Any idea how to best report this, not sure if a binary AST is very informative.

jfrolich commented 4 years ago

I see the tests are now huge bash scripts. Is there any appetite to convert them to rely with snapshot testing? I wrote pretty fast concurrent tests using simple Unix processes and pipes for graphql_ppx and think I can repurpose a lot of that if the PR is welcome.

jordwalke commented 4 years ago

@jfrolich Yes! @davesnx (on discord) is looking into reviving the existing PR that used rely for snapshot testing. It can't come soon enough! These bash scripts are driving me nuts.

jfrolich commented 4 years ago

@jfrolich Yes! @davesnx (on discord) is looking into reviving the existing PR that used rely for snapshot testing. It can't come soon enough! These bash scripts are driving me nuts.

Great! I just saw this PR.

jordwalke commented 4 years ago

Couple of other thoughts I had:

I've found that sometimes people don't want to upgrade a huge codebase at once because it could cause conflicts with what other people are working on (in a large org) so per file opt in is preferred in those cases. (Even if/when we control it via an env var, some people will want the per file control, just as a transition).
With a more concise syntax for attributes, it would look a little nicer:
```
@reason.version.3.8;
```
We can also optionally encode the version in the first floating doc block comment (if present).
```
/**
* My regular doc comment
* @reason.version.3.8
*/;
```

To encourage people to use floating doc comments more we could parse the triple star comments as floating doc comments.

/***
 * This is a floating doc comment
 * @reason.version.3.8
 */

jfrolich commented 4 years ago

I like the floating doc comment because it entices people to write proper module documentation. I also like the triple star for floating comments! The only thing is that it might be easier to parse (for humans and compilers) if the reason version is always on the first line of the comment. (maybe it can even be at the top line after the three dots)

reasonml / reason