ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License
1.64k stars 409 forks source link

Support for menhir's `.messages` files #3284

Open Gbury opened 4 years ago

Gbury commented 4 years ago

Desired Behavior

Have builtin rules for .messages files produced by menhir (cf menhir's manual ). This would include:

Note that given that .messages files contain hand-written messages, they should be part of the source tree, hence the first two rules above (generating fresh .messages files and updating them) would need to promote the generated file (potentially overwriting the current one in the case of an update). However, the .ml file generated from the .messages file needn't be promoted.

Apologies if it already exists, but I couldn't find any reference to it neither in the docs (both for stable and latest), nor in the code (after a search for messages, nothing seemed related to menhir).

Example

I already have implemented manual rules for this in one of my projects, see https://github.com/Gbury/dolmen/blob/master/src/languages/smtlib2/v2.6/dune . These rules work however they are somewhat limited by the almost circular dependency of a .messages file on itself because of the updating process, so the example linked requires some user manual intervention to move a new or updated generated .messages file.

nojb commented 4 years ago

No, this is not yet supported out-of-the-box. This would be a nice addition and not very hard I think, a nice task for someone wanting to get into dune's codebase.

For reference, here are the relevant rules for the .messages file of menhir itself:

https://gitlab.inria.fr/fpottier/menhir/-/blob/master/src/stage2/dune#L86-130

cc @fpottier

rgrinberg commented 4 years ago

I agree with Nicolas. @Gbury if you’re interested in contributing, we could guide you through it

Gbury commented 4 years ago

I'll try and do that this weekend then (one upside of the current situation: I'll have way more free time than usual, ^^). The part that might be more complex is to have rules that differ depending on whether a .messages file already exists in the source_tree.

ghost commented 4 years ago

What is the expected workflow for such files? IIUC, we want the system to generate the file and keep it up to date, and at the same time we want the user to modify it. How is that supposed to work?

Gbury commented 4 years ago

Well that's kind of the problem, actually. What we want I think, is the following:

ghost commented 4 years ago

But if we use the newly generated updated version for the build, does that mean that the version carefully customised by the user will be ignored?

Gbury commented 4 years ago

But if we use the newly generated updated version for the build, does that mean that the version carefully customised by the user will be ignored?

No, as the menhir manual specifies:

The command --update-errors filename is used to update the auto-generated comments in the .messages file filename. It is typically used after a change in the grammar (or in the command line options that affect the construction of the automaton). A new .messages file is produced on the standard output channel. It is identical to filename, except the auto-generated comments, identified by ##, have been removed and re-generated.

However, after re-reading the doc, it's not clear whether the --update-errors flag will 1) add the potentially new errors due to the grammar change, and 2) remove the now unused errors for the new grammar, I guess we'd need to ask @fpottier .

Gbury commented 4 years ago

Ok, so I indeed misunderstood what the --update-errors option does. As explained in this issue, the update-error mechanism only updates the comments so it can be used to update the .messages file in the source tree safely.

However, in order to fully upgrade a .messages file after changing a grammar, the correct thing to do seems to be:

From that I think we could have the following in dune. The menhir stanza can have an optional field specifying an ocaml module name for the .messages file. Let's say that name is Foobar. We would then have the following builtin rules in dune:

ghost commented 4 years ago

I feel like the diff/promotion workflow of dune is very well suited for such a case.

Does the format of the messages file makes it easy to distinguish between the parts that are under the control of the user and the parts that are under the control of menhir?

Gbury commented 4 years ago

Does the format of the messages file makes it easy to distinguish between the parts that are under the control of the user and the parts that are under the control of menhir?

I'd say yes. You can see an example of freshly generated .messages file here . The <symbol> : TOKEN TOKEN TOKEN .... line is generated by menhir to identify the parser state, the ##-commented lines are here to explain more about the error state, and finally the user can insert his/her message where indicated.

ghost commented 4 years ago

Ok, then in this case I propose to setup things as follow: ask menhir generates a messages.corrected that updates the existing messages file and use (diff ...) to diff the two. When the file needs updating, dune will print a diff and the user will be free to promote the correction using dune promote or the dune-promote emacs command.

Gbury commented 1 year ago

Update: I haven't had the time to work on dune for this, but I managed to find an adequate combination of stanzas to achieve a reasonable workflow. The dune file can be found at https://github.com/Gbury/dolmen/blob/52f6e3befb76492bac5e34c915f9dc0ec026a2e2/src/languages/dune.common and a short explanation of how the rules work is in this comment : https://discuss.ocaml.org/t/dune-wish-list-for-2023/11083/60?u=zozozo

One thing that could be done in dune to help with all this is to add a messages field to the menhir stanzas, which would generate stanzas similar to those in my dune files, so that new users do not have to find the correct magic menhir invocations. One tricky thing to take care of is to correctly handle additional flags that users have given to menhir (e.g. in my example, I have functorised parsers, which force me to have tokens defined in a separate file, and thus menhir must be provided with adequate --external-tokens cli options, and the stanzas generating the messages files must also depend on the tokens.mly file).