rust-lang / compiler-team

A home for compiler team planning documents, meeting minutes, and other such things.
https://rust-lang.github.io/compiler-team/
Apache License 2.0
383 stars 67 forks source link

Contracts: Experimental attributes and language intrinsics #759

Closed pnkfelix closed 1 week ago

pnkfelix commented 3 weeks ago

Proposal

Contracts: Experimental attributes and language intrinsics

Executive summary: I want to land experimental unstable attributes for behaviorial specification, and intrinsics for enabling better tools to validate and verify those specifications.

Objective

I want to enable Rust, as a language, to be to state behaviorial specifications in formal code-like language.

Example (adapted from real code within libcore):

#![feature(rustc_contracts)]

#[derive(Clone, Debug, PartialEq, Eq)]
#[rustc_contracts::invariant(for safety: self.start <= self.end)]
pub(crate) struct IndexRange {
    start: usize,
    end: usize,
}

impl IndexRange {
    #[rustc_contracts::requires(for safety: start <= end)]
    pub const unsafe fn new_unchecked(start: usize, end: usize) -> Self {
        IndexRange { start, end }
    }

    pub const fn len(&self) -> usize {
        // (verification assumes invariant to prove underflow impossible here.)
        unsafe { unchecked_sub(self.end, self.start) }
    }

    #[rustc_contracts::requires(for safety: self.len() > 0)]
    #[rustc_contracts::ensures(
        for safety: |output| output == old(self.start),
        for correctness: |_output| self.len() == old(self.len()) - 1)]
    unsafe fn next_unchecked(&mut self) -> usize {
        let value = self.start;
        // verification tools are expected to perform reasoning like so:
        // operational safety: self.len() > 0 ==> self.start < self.end
        //     ==> value < self.end ==> value + 1 cannot overflow. QED.
        // invariant maintained: value < self.end ==> value + 1 <= self.end. QED.
        self.start = unsafe { unchecked_add(value, 1) }
        value
    }
}

(Note: the above example may not be the ideal contract to use for IndexRange; it might be the case that we would want stronger safety or correctness post-conditions. It is just meant to illustrate the basic idea.)

With such contract forms in place:

I am calling these formal specifcations "contracts". I envision contracts as a Rust-like predicate language for expressing method pre-conditions, method post-conditions, and type invariants.

I want the contract language to be sufficently precise such that:

  1. verification tools can use them for constructing formal proofs of properties related to safety, panic-freedom, and functional correctness, and
  2. validation tools can use them for generating test inputs and evaluating the correctness of outputs.

Status Quo

Here's how I see the status quo today, for the Rust project itself and for the majority of Rust developers:

then, at best, each such condition/invariant is specified via documentation and/or comments, and (potentially) dynamically-checked via assert or debug_assert.

But we can do better than that status quo.

I want the Rust standard library eventually to specify its APIs formally, and I want verification and validation tools to be able to extract those formal specifications.

What I want to add

Making pre-conditions, post-conditions, and invariants formally stated as logical predicates (that are encoded distinctly from documentation and assertions) will enable other tools to extract such logical predicates in order to generate tests and attempt static verification.

Here's what I want to add as unstable experimental language extensions:

For example, I want to be able to toggle a switch in miri that will make it attempt to test claims made by contracts.

My example above has leveraged an attribute-based surface syntax. It is possible that we will find that other non-attribute construct are more appropriate for some situations. (For example, I have seen suggestions that type invariants would be better off framed as an instance of a trait that a type implements.)

Distinct forms

I am not yet certain how many kinds of contract special forms we might need.

At the very least, I anticipate it being useful to distinguish between safety-related contracts (which one would attach to instances ofunsafe fn) vs correctness-related contracts (which one might attach to any kind of method).

For unsafe methods, one can write safety preconditions, illustrated above with #[rustc_contracts::requires(for safety: ...)]. Such preconditions should capture the same conditions that the project documents today via comments like // SAFETY.

The intention of #[rustc_contracts::requires(for safety: ...)] is to state all preconditions for the unsafe fn that, if violated, may yield Undefined Behavior that may result either during its own execution or after it returns(either in the current code base, or in future versions of it).

Both safe and unsafe methods can also have correctness criteria, distinct from safety criteria. These are stronger guarantees that may have stricter preconditions.

Contracts may also express post-conditions (i.e. assurances provided by the method specification), annotated with #[rustc_contracts::ensures(...)]. Like preconditions, post-conditions can be categorized according to whether they are focused on safety criteria, or full correctness.

Safety postconditions are predicates on the return value and overall system state that hold as the method exits, assuming the safety preconditions (and overall language and library runtime invariants) have not been violated; these do not assume any correctness preconditions.

Correctness postconditions are similar predicates, but they can assume all correctness preconditions.

Finally, invariants are predicates that can be attached to types. In particular, safety predicates are expected to established when control flows from an unsafe to safe context, and can only be assumed in safe contexts. Unsafe methods on a type must explicitly state which parts of the types' safety predicate can be assumed as part of its safety pre-conditions. (We leave the generalization of this, "correctness invariants", as an unresolved question.)

This distinction allows specifications and verification tools to focus, when needed, on safety criteria separately from correctness criteria. (It also provides more freedom in what semantics tooling might assign to contracts: in particular, I can easily imagine custom tooling that would abort a program upon encountering a safety violation, but merely log a correctness violation and allow computation to proceed. In a similar vein, I can also imagine custom tooling that would delay evaluating a correctness precondition until it saw evidence of a correctness post-condition failing.)

Note: I do not consider the design described here sufficient for meeting all of our specification needs. For example, there are numerous unsafe methods that impose obligations on the caller on how the return value from the method is used, or other "frame conditions", that cannot be locally checked solely during the execution of the unsafe method. We leave finding a satisfactory resolution to this as an open question (see "safety post-obligations" in the "Unresolved questions" below).

My overall intention with these annotations is is two-fold:

  1. rustc, or rust dynamic validation tools, can opt into injecting checks that evaluate these pre-conditions and post-conditions, and signal violations of safety or correctness accordingly, and
  2. static verification tools can use contracts to modularize verification tasks: instead of developing a proof about the program as a whole, you prove that a method obeys its contract, and then callers to that method use that contract as an abstraction in their own correctness proofs.

A richer example

Consider binary search. Rust's binary search routine is safe, regardless of what (well-typed) input it is provided. But the overall correctness of binary search requires that the input sequence is itself sorted.

#[feature(rustc_contracts)]

impl<T> [T] {
    #[rustc_contracts::requires(for correctness:
        for_all!(|i: usize, j: usize| {
            implies!( i <= j && j < self.len() ==> self[i] <= self[j] )
        }, such_as: i in 0..=2, j in i..=(i+2) ))]
    #[rustc_contracts::ensures(
        for correctness: |output| output_is_correct(output, x, self))]
    pub fn binary_search(&self, x: &T) -> Result<usize, usize> where T: Ord {
        ...
    }
}

fn output_is_correct(output: Result<usize, usize>, x: &T, s: &[T]) -> bool {
    match output {
        Ok(idx) => self.get(idx) == Some(x),
        Err(idx) => suitable_for_inserting(idx, x, s),
    }
}

fn suitable_for_inserting(idx: usize, x: &T, s: &[T]) -> bool {
    0 <= idx &&
    idx <= s.len() &&
    s.get(idx).map_or(true, |pred| pred <= x) &&
    s.get(idx+1).map_or(true, |succ| x <= succ))
}

(Again, I am not claiming these to be ideal contracts for verification purposes; they merely illustrate the ideas behind the interface.)

As shown here, I expect the contract predicate language to have some way to express logical quantifiers like "for all values i of type T". (I also expect there to be variations on this theme, such as "for all values of a given type in range start..limit", "there exists some value x of type T that satisfies predicate P", and so on.)

However, I also want to ensure that contracts are able to used even outside of static verification tooling. So, I expect contract-specific forms like for_all to be realized as macros or intrinisics that will offer ways to provide "hints" to testing machinery; that is the purpose of the such_as: ... in the above example use of for_all!: the such_as: ... is intended to be ignored by formal verification tooling, but used by other tooling as the basis for generating test inputs to seed the dynamic checking.

Unresolved questions

Here are some examples of questions that we leave open for now, to be resolved during on going experimentation with the constructs. There are many many more that are not listed here.

Syntax bikesheds galore: I am not thrilled by the prospect of writing #[rustc_contracts::requires(for safety: ...)] all over the place. For example, an earlier draft of this MCP had used #[safety::requires(...)], which I find far more palatable, but may pose serious hazards for compatibility with existing code bases. I am leaving identification of "a good syntax" as an open problem.

Static vs dynamic semantics: My idealized contract system would allow contracts to inform both static verification and dynamic validation tools. Some contracts like forall! do not have an "obviously right" dynamic interpretation.

Safety post-obligations: The safety criteria for some unsafe methods is stated as a constraint on how the caller uses the return value from a method. For example: str::as_bytes_mut says "The caller must ensure that the content of the slice is valid UTF-8 before the borrow ends and the underlying str is used." This cannot be expressed as a mere safety::requires form as envisioned above. (My best guess is that such contracts require a richer structure, such as an entiretly hypothetical #[safety::at_lifetime_end(|output| str::from_utf8(output).is_ok())], which could only be checked in the future, right before the &mut u8 borrow expires and the reference is again capable of being treated as a &mut str.)

Correctness invariants: As mentioned above, there is probably utility in being able to attach invariants to a type that are used for proving functional correctness. But it is not as clear where to establish the points where correctness invariants must be checked. It may make more sense here to use something like refinement types, where explicit method calls (potentially in ghost code) would (re)establish such invariants.

Purity: Do contracts need to be pure (i.e. have no non-local side-effects)? Some verification tools will require this.

Panic: How should a panicking expression within a contract be treated? Some of the above code was written with slice's get method rather than the index operator, in order to sidestep this question. But rather than disallowing panic entirely (i.e. treating it as a improperly written contract), it might be more natural to interpret a panic within a contract expression as a contract failure.

Related work

There are several projects that seek to bring state-of-the-art verification and automated reasoning technolgies to Rust. See for example Aeneas, Creusot, Kani, Prusti, and Verus.

There are also validation tools like Proptest and Quickcheck that allow developers to write invidual tests that cover large families of behavior.

This proposal is largely a result of my discussions with some of the developers of the static verification tooling (as well as my own investigations of such tooling).

You can see a presentation I did on this subject at: http://pnkfx.org/presentations/contracts2-rw2024.pdf, and the project proposal I put forth earlier this year at : https://rust-lang.github.io/rust-project-goals/2024h2/Contracts-and-invariants.html

At this point I want to stop attempting to acquire outside interest in the task, and instead start writing unstable code in Nightly Rust. Thus this MCP. :)

Mentors or Reviewers

If you have a reviewer or mentor in mind for this work, mention them here. You can put your own name here if you are planning to mentor the work.

I expect interest from various verification tooling people for this work, including @celinval and @Nadrieril

Process

The main points of the Major Change Process are as follows:

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

rustbot commented 3 weeks ago

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

Concerns or objections to the proposal should be discussed on Zulip and formally registered here by adding a comment with the following syntax:

 @rustbot concern reason-for-concern 
 <description of the concern> 

Concerns can be lifted with:

 @rustbot resolve reason-for-concern 

See documentation at https://forge.rust-lang.org

cc @rust-lang/compiler @rust-lang/compiler-contributors

oli-obk commented 3 weeks ago

@rustbot second

pnkfelix commented 2 weeks ago

A colleague reminded me of these past meetings with relevant content: https://github.com/rust-lang/lang-team/issues/181 , which links to a hackmd.

I think that hackmd's content was snapshotted at that time and transcribed to lang team design meeting minutes

apiraino commented 1 week ago

@rustbot label -final-comment-period +major-change-accepted