Policy: Use of Minirust/DSLs to specify Semantics

chorman0773 commented 2 months ago

I'd like for us to have a full discussion about the use of Minirust and other DSLs as a normative part of the specification - how much of the spec should be written in Minirust or another DSL vs. prose, and when it's allowed to be used in place of prose.

The Operational Semantics Team (or members thereof) should be involved in this discussion as applied to the Dynamic Semantics chapter.

mattheww commented 2 months ago

If MiniRust ends up being the authoritative description of the operational semantics, that leaves the question of what English text will accompany it.

At one end of the scale we could be telling users of the spec that they're expected to read through MiniRust's code, and there's a little natural language commentary to guide that and supply a few non-normative notes.

At the other end of the scale there could be a complete English restatement of what MiniRust's code does, with any divergence between them treated as a bug in the spec.

joshlf commented 2 months ago

My 2 cents: In my experience, the English prose used in the standard library documentation and Reference has historically suffered from a lack of agreement about what terms actually mean. [1] Worse, it's often not clear whether an author has intended to use an English word in its colloquial or technical meaning. While it's in principle possible to create a complete glossary, there's no mechanism other than human review to ensure that words are always used consistent with that glossary. By contrast, code specifications are clear with respect to intent. There's no ambiguity about whether a particular term is meant in its colloquial or technical meaning since there is no colloquial meaning.

For that reason, I'd personally prefer that whatever English prose accompanies the MiniRust be non-normative. That doesn't have any bearing on how much prose there is, just that the MiniRust is always the source of truth.

[1] For example, see this thread, in which multiple people intimately familiar with discussions regarding Rust's semantics realize that they don't agree on how to use the terms "live", "active", and "invalidated".

chorman0773 commented 2 months ago

We'd need normative prose to define Minirust in the first place, and anything that we can't use Minirust to define.

joshlf commented 2 months ago

I take it MiniRust is not yet feature-complete in the sense that the entire spec could not be written in terms of MiniRust? Is the intention for MiniRust to eventually get to that point in the future?

chorman0773 commented 2 months ago

The entire spec can't even be written in Minirust - Minirust is a tool for expressing the operations on the abstract machine performed by a fully parameterized Rust program. It doesn't handle, for example, the whether or not an input rust program is well-formed. We also need prose to express the relationship between a given Rust program, and the corresponding Minirust program.

RalfJung commented 2 months ago

I think there's an appeal to the approach of having English text and a formal DSL (a-mir-formality / MiniRust) side-by-side. This takes some pressure off of the English text, meaning we can lean a bit more on the side of readability, while also reducing ambiguity. Divergence between the two should be treated as a spec bug.

There's also various ways to incorporate light-weight DSLs for other parts of the spec, e.g. I think that https://github.com/rust-lang/spec/pull/15 could benefit quite a bit from defining terms like "representation" and "layout" in a more formal way. Even just specifying, e.g., that "representation relation" is a function of type fn(representation: List<AbstractByte>, value: Value) -> bool, e.g. a relation between (byte-level) representations and (abstract) values, already helps a lot. In particular, it forces one to define what AbstractByte and Value are. Even if the concrete representation relations are then described purely in prose, having these key concepts defined more formally is a huge step up from e.g. C and C++ where such basic terms are still the subject of disagreement and debate.

mattheww commented 2 months ago

Is MiniRust ready for serious work on that sort of English text, or does it need to cook a bit more first?

RalfJung commented 2 months ago

Unsized types are still missing, I hope to have a student working on that later this year. They might require some far-reaching changes. Other than that I think the core structure of MiniRust is pretty solid now.

I don't have any idea how to best manage the side-by-side with the English text though. MiniRust is written in "literate programming" style to support mixing it with text, but I am not sure that's the best approach -- in particular, the Ferrocene spec is extremely good at making everything linkable, and I don't know how to do the same with a mix of code and text like we may have here.

mattheww commented 1 month ago

Another part of the question is what to do about documenting the pseudo-Rust that MiniRust is written in.

I think that's not as lucid as it once was (for example, there's now a call to an extract() method in the encode definition for Enums that doesn't seem to be defined or described anywhere).

chorman0773 commented 1 month ago

So, FTR, I'd expect Minirust itself would be defined in prose - we'd then use the Minirust Language to explain the semantics of a Rust program.

RalfJung commented 1 month ago

I think that's not as lucid as it once was (for example, there's now a call to an extract() method in the encode definition for Enums that doesn't seem to be defined or described anywhere).

Good catch! That's a specr-transpile bug. I changed the code a bit to avoid the need for extract.

@chorman0773 I sure hope that Rust will have an authoritative machine-readable spec for its core language, to complement the prose. Prose is just too ambiguous as the sole source of truth, as experience confirms time and time again.

digama0 commented 1 month ago

@chorman0773 I sure hope that Rust will have an authoritative machine-readable spec for its core language, to complement the prose. Prose is just too ambiguous as the sole source of truth, as experience confirms time and time again.

While I agree to an extent, there is a classic bootstrapping issue here: if you use a formal spec language to define the language, then you need another spec to define the formal spec language... You have to either bottom out at prose or hope that the root is "obvious enough as to not need explanation", and I'm not sure we can really get the latter. (You have the same issue with prose specs; usually this is addressed with some kind of initial "term definition" prologue but ultimately still boils down to natural language that the user is expected to know already.)

The fact that specr is so similar to rust is a downside here, as it means that specifying it may not be significantly simpler than specifying rust, so you won't have made progress toward the root of the bootstrap chain.

chorman0773 commented 1 month ago

And at some point, specifying a completely different language which is then used to specify Rust becomes more writing effort than just specifying Rust directly.

RalfJung commented 1 month ago

specr is a small subset of Rust, in particular a subset without any of the nasty questions around evaluation order, drop, or UB. So I think it is fair to say that be expressing our intend in prose and in specr, we have made significant progress towards the bootstrap root.

I'm not saying that should replace the English text that explains what happens, but I think complementing the English prose with something less ambiguous would be extremely valuable.

So, specr doesn't need to be "obvious enough as to not need explanation", it just needs to be "obvious enough as to not need explanation when there's also English text next to the code that says what the code does".

rust-lang / spec

Policy: Use of Minirust/DSLs to specify Semantics #55