rust-lang / reference

The Rust Reference
https://doc.rust-lang.org/nightly/reference/
Apache License 2.0

Add Spec Glossary #1537

Closed chorman0773 closed 4 months ago

chorman0773 commented 4 months ago

This adds some of the terms I used in #1523 to a separate top-level section of the glossary.

I'm not too picky about formatting changes; I just wanted to throw something together to get T-spec eyes on it.

@rustbot label +T-spec +A-glossary

chorman0773 commented 4 months ago

Based on some discussion in RPLCS, I'm likely going to change the term "ill-formed" here and in #1523 to "fails to compile", but leave "shall" untouched. I'm not sure what to do about "Ill-formed - No Diagnostic Required" yet, but I'll make that change as well.

traviscross commented 4 months ago

In our lang-docs call, we discussed this PR. The PR does three things: it defines "ill-formed program", it defines "no diagnostic required", and it defines "shall".

Taking these in order:

"ill-formed program": Our feeling is that we probably don't need to define an "ill-formed program" in this way, and that if we were to, we'd probably call it something else, such as the program being "invalid" or "not accepted".

"no diagnostic required": Our feeling is that we don't need to import this C++ definition. We've generally called this class of thing "undefined behavior" or maybe "compile-time undefined behavior" in Rust.

"shall": Our feeling is that there are two ways to go here. Either we import or normatively reference all of the IETF RFC 2119 words and start writing "SHALL" / "MAY" / "MUST" / etc., or we write more normally, e.g. "the user must ... to avoid undefined behavior", "the compiler will ...", in which case it may be overkill to define the words at all.

Given that, our inclination is to close the PR as a matter of review, but as some of these things were discussed on a recent spec call, we wanted to give others the opportunity to weigh in here first.

cc @rust-lang/spec

chorman0773 commented 4 months ago

> We've generally called this class of thing "undefined behavior" or maybe "compile-time undefined behavior" in Rust.

As a note, the reason I wanted to add the term is that "undefined behavior" is well understood to mean an execution result produced at runtime; that is, it only has an effect on an execution if that execution would eventually reach it*. In particular, it is well understood that for any construct has_ub!() with undefined behaviour, the program if false { has_ub!() } println!("Foo"); is well defined to print "Foo".

To avoid confusion (which may lead to people writing incorrect programs), I feel it's necessary to distinguish between the terms, so that it's clear when a construct is invalid even in unreachable (or even statically unevaluated) code. "Compile-time undefined behaviour" also risks being confused with const-eval undefined behaviour, which I believe is likewise not permitted to produce arbitrary runtime results if the relevant constant is never evaluated (though it is allowed to produce a compile-time diagnostic and fail translation).

*Or any other execution, for the same input, that results from any valid sequence of results of the specr pick calls in the appropriate MiniRust.
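To make the runtime side of this distinction concrete, here is a minimal sketch of the if false { has_ub!() } case. The has_ub! macro is hypothetical; as an assumption for illustration, it expands to a call to std::hint::unreachable_unchecked, whose execution is undefined behavior. Because that call sits on a path no execution reaches, the function is well defined and returns "Foo" (returned rather than printed, so the result can be checked).

```rust
// Hypothetical macro for illustration only: executing its expansion is
// undefined behavior, because `unreachable_unchecked` must never be reached.
macro_rules! has_ub {
    () => {
        unsafe { std::hint::unreachable_unchecked() }
    };
}

// The UB-triggering construct sits in code that no execution reaches,
// so it has no effect on the behavior of this function.
fn foo() -> &'static str {
    if false {
        has_ub!()
    }
    "Foo"
}

fn main() {
    println!("{}", foo());
}
```

Changing false to true would make every execution of foo undefined, whereas a compile error (for example, a type error) in the same position would reject the program no matter what the condition is.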

traviscross commented 4 months ago

Thanks @chorman0773 for that background. Agreed, those are interesting and reasonable questions. In particular, I agree about the potential for "compile-time UB" to be confused with const-eval UB.
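On the const-eval side, a rough sketch of the "never evaluated" point: the associated constant below is evaluated only for the instantiations that are actually used, and its evaluation fails at compile time when N == 0. As an assumption for illustration, it uses a const panic (assert!) as a stand-in for a failing const evaluation rather than const-eval UB itself; Check, NONZERO, and requires_nonzero are hypothetical names.

```rust
// Hypothetical type used only to carry the const parameter `N`.
struct Check<const N: usize>;

impl<const N: usize> Check<N> {
    // Evaluated at compile time, once per `N` that is actually used.
    // For `N == 0` the assertion fails and compilation is rejected with a
    // diagnostic; no runtime behavior is produced.
    const NONZERO: () = assert!(N > 0);
}

fn requires_nonzero<const N: usize>() -> usize {
    // Referencing the constant forces its evaluation for this `N`.
    let () = Check::<N>::NONZERO;
    N
}

fn main() {
    // Only `Check::<3>::NONZERO` is evaluated here, and it succeeds.
    println!("{}", requires_nonzero::<3>());
    // Uncommenting the next line would make compilation fail instead:
    // println!("{}", requires_nonzero::<0>());
}
```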

However, it's not clear to me how the NDR term proposed here addresses the case that you mention:

> In particular, it is well-understood that for any construct has_ub!() with undefined behaviour, the program if false { has_ub!() } println!("Foo"); is well defined to print "Foo".

> To avoid confusion..., I feel it's necessary to have a distinction between the terms so that it's clear when a construct is invalid to have in unreachable (or even statically unevaluated) code.

As defined in this PR, if a program is invalid in the NDR sense, then the behavior of running that program is undefined. The case you mention seems to call for a term that instead denotes that the program source violates some validity rule or property even if the resulting program has defined behavior.

Perhaps we should instead just say directly what rule or property is being violated. E.g., we might say that the expression below is not well typed:

fn main() {
    if false {
        union U { x: f32, y: f64 }
        _ = unsafe { U { x: 42 }.y };
    }
}

chorman0773 commented 4 months ago

> As defined in this PR, if a program is invalid in the NDR sense, then the behavior of running that program is undefined.

We don't call it "undefined behavior" though. It's called out in a deliberately different way.

Well-typed/ill-typed is also a misnomer, because the rules that would make a program ill-formed (let alone the ones that make it ill-formed, no diagnostic required) aren't just type-checking rules, or even semantic validity rules. Would you call

static struct: i32 = 0;

ill-typed?

The point of defining the terms is that we don't have to elaborate the definition every place it's used. The same goes for "shall" (and I don't agree that we need the full RFC 2119 - most language specifications I've read don't use RFC 2119. What would a "SHOULD" or "MAY" even be in Rust?).

traviscross commented 4 months ago

> The point of defining the terms is that we don't have to elaborate the definition every place it's used.

The discussion here -- that we're struggling to converge on the purpose and meaning of these terms -- suggests to me that what's in the PR wouldn't yet fill that role.

traviscross commented 4 months ago

As discussed above, and after discussing this on the spec call today, we're going to close this. If other glossary items come up, we can open separate PRs or issues for those, and that would help in terms of keeping discussion and review focused.

In general, we'd prefer to see the need for a new bit of jargon, in terms of seeing that we've repeated ourselves too often, before we go out of our way to introduce that jargon.