rust-lang / reference

The Rust Reference
https://doc.rust-lang.org/nightly/reference/
Apache License 2.0

Grammar accuracy #443

Open ehuss opened 6 years ago

ehuss commented 6 years ago

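Currently the grammar in the reference is only a rough reflection of the Rust syntax. Ignoring that it is in an ad-hoc language, it tends to stray for at least a few reasons:

- Specifying the actual syntax accurately can be messy and complex, obscuring the intent.
- Sometimes the grammar includes semantic restrictions that are not technically part of the syntax. This is usually done for clarity, or to avoid extensive prose to explain the semantic restrictions.
- Sometimes it distinguishes things that are conceptually distinct, but not syntactically.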

My question is: Is that OK?

I see the primary audience of the reference as a typical Rust programmer who wants to learn a little more about the language, or to have a better understanding of how to write Rust code. This means that the grammar does not necessarily cater to someone writing a Rust parser or tooling.

Here's an example of something that I was looking at for adding macros to the grammar. The syntax for macro invocations depends on where it is used. The path to the macro can be a Type Path in types, an Expression Path in expressions, or a Simple Path for associated items. However, macro paths cannot (semantically) have generic parameters, so in practice they are always Simple Paths. It would be simpler to document and explain that they are always simple paths, instead of documenting all the different kinds with the caveats about their limitations. Would you agree?
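To make the macro path point concrete, here is a minimal sketch of invocations in item, type, and expression position. The int_ty macro and the sum and bump functions are hypothetical names made up for illustration; only std::thread_local and std::vec come from the standard library. In each position the path is a plain sequence of identifiers with no generic arguments.

```rust
use std::cell::Cell;

// Hypothetical macro that expands to a type, so it can be invoked in type position.
macro_rules! int_ty {
    () => {
        i32
    };
}

// Item position: the macro is named by a multi-segment path without generics.
std::thread_local! {
    static COUNTER: Cell<i32> = Cell::new(0);
}

// Type position (int_ty!) and expression position (std::vec!).
fn sum() -> i32 {
    let values: Vec<int_ty!()> = std::vec![1, 2, 3];
    values.iter().sum()
}

// Increments the thread-local counter and returns its new value.
fn bump() -> i32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    // Something like `std::vec::<i32>![1, 2, 3]` is rejected by the compiler:
    // generic arguments are not allowed in a macro path.
    println!("sum = {}, counter = {}", sum(), bump());
}
```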

One place where this inaccuracy is a problem is describing the grammar accepted by macros. Currently it would be wrong to say the stmt matcher matches what I've defined as a Statement, because the two treat trailing semicolons differently. There are many other examples where the macro matchers would be wrong. Maybe that is OK, and we can defer an accurate description until the wg-grammar is ready. Or just leave the macro matchers intentionally vague (as I think most programmers will understand the intent).
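As a small illustration of the semicolon mismatch, the sketch below uses a hypothetical macro named twice that takes a stmt fragment. The fragment matches the expression statement without its trailing semicolon, and the macro body supplies the semicolons itself.

```rust
// Hypothetical macro: repeats a single statement two times.
macro_rules! twice {
    ($s:stmt) => {
        $s;
        $s;
    };
}

fn demo() -> i32 {
    let mut n = 0;
    // `n += 1` is matched by `$s:stmt` with no trailing semicolon.
    // Writing `twice!(n += 1;)` fails to match, because the `;` is not
    // consumed by the stmt fragment and the rule expects nothing after it.
    twice!(n += 1);
    n
}

fn main() {
    println!("n = {}", demo());
}
```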

Would love to hear what people think!

Havvy commented 6 years ago

> Specifying the actual syntax accurately can be messy and complex, obscuring the intent.

This is the real challenge of documenting for the goals of the reference. We want it to be clear in intent while still describing those messy and complex interactions. And the kludges in the grammar mostly come from trying to do that.

> Sometimes the grammar includes semantic restrictions that are not technically part of the syntax. This is usually done for clarity, or to avoid extensive prose to explain the semantic restrictions.

Sometimes the line between what is grammar and what is semantics is a matter of perspective. Where we draw our lines is ultimately what makes a difference between the compiler grammar, reference grammar, and WG-Grammar grammar. When we think it's more understandable to have it on the grammar side, we should go with it. Unlike the other two grammars, we really don't care about diagnostics and programs that fail to compile with invalid semantics.

> Sometimes it distinguishes things that are conceptually distinct, but not syntactically.

We should consider that a bug in our documentation and file issues on each case. The only case I know of, though, is the difference between an enum variant expression and a struct expression. In that case, I think it'd probably be best to just merge those two pages together. The struct expression includes unions as well, after all.
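A short sketch of why these are hard to separate syntactically: the same path-plus-braced-fields form constructs a struct, a struct-like enum variant, and a union. All type and function names here (Point, Shape, IntOrFloat, build_all) are hypothetical and exist only for illustration.

```rust
struct Point {
    x: i32,
    y: i32,
}

enum Shape {
    Rect { w: i32, h: i32 },
}

#[allow(dead_code)] // the `f` field is never read in this sketch
union IntOrFloat {
    i: u32,
    f: f32,
}

fn build_all() -> (i32, i32, u32) {
    // All three use the form `Path { field: value, ... }`.
    let p = Point { x: 1, y: 2 };
    let s = Shape::Rect { w: 3, h: 4 };
    let u = IntOrFloat { i: 7 };

    // `Shape` has a single variant, so this pattern is irrefutable.
    let Shape::Rect { w, h } = s;
    // Reading a union field requires `unsafe`; `i` is the field that was written.
    let bits = unsafe { u.i };
    (p.x + p.y, w * h, bits)
}

fn main() {
    println!("{:?}", build_all());
}
```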

> Here's an example of something that I was looking at for adding macros to the grammar. The syntax for macro invocations depends on where it is used. The path to the macro can be a Type Path in types, an Expression Path in expressions, or a Simple Path for associated items. However, macro paths cannot (semantically) have generic parameters, so in practice they are always Simple Paths. It would be simpler to document and explain that they are always simple paths, instead of documenting all the different kinds with the caveats about their limitations. Would you agree?

I agree. Though we can also add the salient parts of this paragraph to a note on the macro invocation page.

> Or just leave the macro matchers intentionally vague (as I think most programmers will understand the intent).

We do want to document all the caveats somewhere in the reference. Stating so on the macro matchers page would probably be best.