CBMC unwinds forever on s2n-quic with the MIR linker enabled

zhassan-aws commented 2 years ago

One of the harnesses in s2n-quic (round_trip) that was previously solved fairly quickly now unwinds forever.

Steps to reproduce

git clone https://github.com/aws/s2n-quic
cd s2n-quic/quic/s2n-quic-core
git checkout 80e5a554fea9f5ca5ed9befa8012170eaecfdd62
cargo kani --tests --harness round_trip --unwind 5 --enable-unstable --mir-linker

CBMC keeps unwinding for more than 2 minutes and never gets to the solving stage:

Unwinding loop _ZN4core6option15Option$LT$T$GT$3map17h910530024d5c6d0bE.0 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs line 932 column 5 function std::option::Option::<usize>::map::<usize, [closure@core::slice::memchr::memchr_general_case::{closure#2}]> thread 0
Unwinding loop _ZN4core6option15Option$LT$T$GT$3map17h910530024d5c6d0bE.1 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs line 932 column 5 function std::option::Option::<usize>::map::<usize, [closure@core::slice::memchr::memchr_general_case::{closure#2}]> thread 0
Unwinding loop _ZN4core6option15Option$LT$T$GT$3map17h910530024d5c6d0bE.0 iteration 2 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs line 932 column 5 function std::option::Option::<usize>::map::<usize, [closure@core::slice::memchr::memchr_general_case::{closure#2}]> thread 0
Unwinding loop _RINvNvXs1_NtNtCs4IvNUtTnEFD_4core5slice5indexINtNtNtBc_3ops5range5RangejEINtB8_10SliceIndexSpE13get_unchecked7runtimehECs1AVzK06CBl6_13s2n_quic_core.0 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs line 295 column 13 function <std::ops::Range<usize> as std::slice::SliceIndex<[T]>>::get_unchecked::runtime::<u8> thread 0
aborting path on assume(false) at file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs line 294 column 13 function <std::ops::Range<usize> as std::slice::SliceIndex<[T]>>::get_unchecked::runtime::<u8> thread 0
Unwinding loop _RINvNvXs1_NtNtCs4IvNUtTnEFD_4core5slice5indexINtNtNtBc_3ops5range5RangejEINtB8_10SliceIndexSpE13get_unchecked7runtimehECs1AVzK06CBl6_13s2n_quic_core.0 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs line 295 column 13 function <std::ops::Range<usize> as std::slice::SliceIndex<[T]>>::get_unchecked::runtime::<u8> thread 0
Unwinding loop memcmp.0 iteration 1 file <builtin-library-memcmp> line 25 function memcmp thread 0
Unwinding loop memcmp.0 iteration 2 file <builtin-library-memcmp> line 25 function memcmp thread 0
Not unwinding loop memcmp.0 iteration 3 file <builtin-library-memcmp> line 25 function memcmp thread 0
Unwinding loop _RNvXNtNtCs4IvNUtTnEFD_4core3str7patternNtB2_12CharSearcherNtB2_8Searcher10next_matchCs1AVzK06CBl6_13s2n_quic_core.0 iteration 2 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/str/pattern.rs line 439 column 28 function <std::str::pattern::CharSearcher as std::str::pattern::Searcher>::next_match thread 0
...

Kani Version: af6c93eb74520c9d7eeda3b7e7f28220d617b9cf

If I don't enable the MIR linker, verification completes successfully:

$ cargo clean; cargo kani --tests --harness round_trip --unwind 5

<snip>

SUMMARY:
 ** 0 of 1339 failed (1 unreachable)

VERIFICATION:- SUCCESSFUL
Verification Time: 43.949997s

zhassan-aws commented 2 years ago

With 1eb9643d8e98c97dffbfc3f73dfa02abd68974ba, verification completes successfully on this harness with the MIR linker enabled.

zhassan-aws commented 2 years ago

This behavior is introduced in 5539c0de6e771bd10c2b692cd6d62f0abba98144. In the previous commit (f3b5a9777c535f386949cfabf55db820b73837e0), the run completes in a few seconds with the MIR linker.

celinval commented 2 years ago

Is it possible that the model is just too big for cbmc to handle? Have you tried setting the unwind loop to 1?

zhassan-aws commented 2 years ago

Yes, with an unwind of 1, it finishes quickly, and an unwinding assertion fails:

$ cargo kani --tests --harness round_trip --unwind 1 --enable-unstable --mir-linker
<snip>

SUMMARY:
 ** 1 of 4507 failed (4506 undetermined)
Failed Checks: unwinding assertion loop 1
 File: "/home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/range.rs", line 815, in <(std::collections::Bound<&u8>, std::collections::Bound<&u8>) as std::ops::RangeBounds<u8>>::contains::<u8>

VERIFICATION:- FAILED
[Kani] info: Verification output shows one or more unwinding failures.
[Kani] tip: Consider increasing the unwinding value or disabling `--unwinding-assertions`.
Verification Time: 2.66511s

Any unwind value above 1 causes it to unwind forever.

zhassan-aws commented 2 years ago

Here's an independent test that reproduces the same behavior:

pub enum Error {
    Err1(usize),
    Err2,
}

use std::fmt;
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Err1(len) => write!(f, "err1: {}", len),
            Self::Err2 => write!(f, "err2"),
        }
    }
}

fn encode(x: i32) -> Result<Vec<u8>, ()> {
    if x > 10 {
        Ok(vec![1])
    } else {
        Err(())
    }
}

fn decode_inner(v: &Vec<u8>) -> Result<(), Error> {
    let x = match v.len() {
        1 => Err(Error::Err1(3)),
        2 => Err(Error::Err2),
        _ => Ok(()),
    };
    x
}

fn decode(v: Vec<u8>) -> Result<(), String> {
    decode_inner(&v).map_err(|err| err.to_string())?;
    Ok(())
}

#[kani::proof]
#[kani::unwind(2)]
fn main() {
    let v = encode(kani::any()).unwrap();
    let _ = decode(v);
}

$ kani err_vec.rs --enable-unstable --mir-linker

Unwinding loop _ZN4core10intrinsics23is_aligned_and_not_null17h6eae2af2c254ed47E.1 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2204 column 5 function std::intrinsics::is_aligned_and_not_null::<u8> thread 0
Unwinding loop _ZN4core10intrinsics23is_aligned_and_not_null17h6eae2af2c254ed47E.2 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2204 column 5 function std::intrinsics::is_aligned_and_not_null::<u8> thread 0
Unwinding loop _RINvNvNtCs4IvNUtTnEFD_4core10intrinsics19copy_nonoverlapping7runtimehECs7nqv5Ecv62r_7err_vec.4 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2329 column 13 function std::intrinsics::copy_nonoverlapping::runtime::<u8> thread 0
Not unwinding loop _RINvNvNtCs4IvNUtTnEFD_4core10intrinsics19copy_nonoverlapping7runtimehECs7nqv5Ecv62r_7err_vec.0 iteration 2 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2329 column 13 function std::intrinsics::copy_nonoverlapping::runtime::<u8> thread 0
Unwinding loop _RINvNvNtCs4IvNUtTnEFD_4core10intrinsics19copy_nonoverlapping7runtimehECs7nqv5Ecv62r_7err_vec.1 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2329 column 13 function std::intrinsics::copy_nonoverlapping::runtime::<u8> thread 0
Unwinding loop _RINvNvNtCs4IvNUtTnEFD_4core10intrinsics19copy_nonoverlapping7runtimehECs7nqv5Ecv62r_7err_vec.5 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2329 column 13 function std::intrinsics::copy_nonoverlapping::runtime::<u8> thread 0
aborting path on assume(false) at file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/intrinsics.rs line 2328 column 9 function std::intrinsics::copy_nonoverlapping::runtime::<u8> thread 0
aborting path on assume(false) at file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs line 853 column 15 function std::result::Result::<std::ptr::NonNull<[u8]>, std::alloc::AllocError>::map_err::<std::collections::TryReserveError, [closure@alloc::raw_vec::finish_grow<std::alloc::Global>::{closure#1}]> thread 0
Unwinding loop _ZN4core6result19Result$LT$T$C$E$GT$7map_err17h0793994c3d33a0c8E.0 iteration 1 file /home/ubuntu/.rustup/toolchains/nightly-2022-09-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs line 857 column 5 function std::result::Result::<std::ptr::NonNull<[u8]>, std::alloc::AllocError>::map_err::<std::collections::TryReserveError, [closure@alloc::raw_vec::finish_grow<std::alloc::Global>::{closure#1}]> thread 0
...

zhassan-aws commented 2 years ago

Can also be reproduced without Vec:

pub enum Error {
    Err1(usize),
    Err2,
}

use std::fmt;
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Err1(len) => write!(f, "{}", len),
            Self::Err2 => write!(f, ""),
        }
    }
}

fn encode(x: i32) -> Result<[u8; 1], ()> {
    if x > 10 {
        Ok([1])
    } else {
        Err(())
    }
}

fn decode_inner(v: &[u8]) -> Result<(), Error> {
    let x = match v.len() {
        1 => Err(Error::Err1(3)),
        2 => Err(Error::Err2),
        _ => Ok(()),
    };
    x
}

fn decode(v: [u8; 1]) -> Result<(), String> {
    decode_inner(&v).map_err(|err| err.to_string())?;
    Ok(())
}

#[kani::proof]
#[kani::unwind(2)]
fn main() {
    let v = encode(kani::any()).unwrap();
    let _ = decode(v);
}

tautschnig commented 2 years ago

Does this involve function pointers of any kind? Do you perhaps end up with an extremely broad computation tree?

zhassan-aws commented 2 years ago

Possibly, through the fmt::Display trait. Is there a way to check in the goto binary/symbol table file?

tautschnig commented 2 years ago

If you do goto-instrument --show-goto-functions on the final goto binary you should get to witness this, but also --gen-c would show a large number of function calls.

zhassan-aws commented 2 years ago

Shrinked even further and removed all non-determinism:

pub enum Error {
    Err1(usize),
}

use std::fmt;
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Self::Err1(len) => write!(f, "{}", len),
        }
    }
}

fn decode_inner() -> Result<(), Error> {
    Err(Error::Err1(3))
}

fn decode() -> Result<(), String> {
    decode_inner().map_err(|err| err.to_string())?;
    Ok(())
}

#[kani::proof]
#[kani::unwind(2)]
fn main() {
    let _ = decode();
}

zhassan-aws commented 2 years ago

Here's the output of goto-instrument --show-goto-functions on the shrinked test: goto-functions.txt

celinval commented 2 years ago

The Display trait is implemented everywhere in the std. I'm not surprised that blows up. Maybe --restrict-vtable can help

celinval commented 2 years ago

--restrict-vtable didn't make any difference.

celinval commented 2 years ago

BTW, I'm using the following example:

pub struct Error(usize);

use std::fmt;
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

#[kani::proof]
#[kani::unwind(2)]
fn main() {
    let s = Error(0).to_string();
    assert!(s.len() > 0);
}

celinval commented 2 years ago

@tautschnig Does cbmc have any nicer way to debug loop unwinding such as a graphical representation of the call graph?

zhassan-aws commented 2 years ago

This is the output from goto-instrument --dump-c on the reduced testcase (had to use CBMC latest to include the fix for https://github.com/diffblue/cbmc/issues/7158):

test.txt

zhassan-aws commented 2 years ago

@celinval tracked this down to the formatting code (specifically the write_fmt function called by the write! macro).

@camshaft, a workaround is to replace the fmt code under kani:

diff --git a/common/s2n-codec/src/decoder/mod.rs b/common/s2n-codec/src/decoder/mod.rs
index de45b163..45f8dbe1 100644
--- a/common/s2n-codec/src/decoder/mod.rs
+++ b/common/s2n-codec/src/decoder/mod.rs
@@ -424,14 +424,18 @@ pub enum DecoderError {
 use core::fmt;
 impl fmt::Display for DecoderError {
     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        match self {
-            Self::UnexpectedEof(len) => write!(f, "unexpected eof: {}", len),
-            Self::UnexpectedBytes(len) => write!(f, "unexpected bytes: {}", len),
-            Self::LengthCapacityExceeded => write!(
-                f,
-                "length could not be represented in platform's usize type"
-            ),
-            Self::InvariantViolation(msg) => write!(f, "{}", msg),
+        if cfg!(kani) {
+            Ok(())
+        } else {
+            match self {
+                Self::UnexpectedEof(len) => write!(f, "unexpected eof: {}", len),
+                Self::UnexpectedBytes(len) => write!(f, "unexpected bytes: {}", len),
+                Self::LengthCapacityExceeded => write!(
+                    f,
+                    "length could not be represented in platform's usize type"
+                ),
+                Self::InvariantViolation(msg) => write!(f, "{}", msg),
+            }
         }
     }
 }

This makes the round_trip proof go through quickly.

zhassan-aws commented 2 years ago

Here's another minimal reproducer:

use std::fmt::Write;

#[kani::proof]
#[kani::unwind(2)]
fn main() {
    let mut s = String::new();
    let _ = write!(&mut s, "{}", 5);
}

tautschnig commented 2 years ago

@tautschnig Does cbmc have any nicer way to debug loop unwinding such as a graphical representation of the call graph?

You can use goto-instrument --dot to obtain a diagram (but I guess on this occasion you've managed to track down the problem already?).

zhassan-aws commented 2 years ago

but I guess on this occasion you've managed to track down the problem already?

Not really. We only know it's related to the formatting code, but not sure exactly why the unwinding is not terminating.

zhassan-aws commented 2 years ago

Yet another minimal reproducer:

#[kani::proof]
#[kani::unwind(2)]
fn main() {
    let a = std::fmt::format(format_args!("{}", 5));
}

zhassan-aws commented 2 years ago

I've extracted the relevant code from std, and created a test that pretty much doesn't use any of the formatting code in std, but still reproduces the unbounded unwinding behavior:

pub trait Trait {
    fn f(&self, c: char);
}

struct Foo {}

impl Trait for Foo {
    fn f(&self, c: char) {
        let _ = c.encode_utf8(&mut [0; 4]);
    }
}

pub struct Formatter {}

fn nn(_x: &u8, _f: &Formatter) {}

pub struct ArgumentV1<'a> {
    value: &'a u8,
    formatter: fn(&u8, &Formatter) -> (),
}

impl<'a> ArgumentV1<'a> {
    pub fn new<'b>(x: &u8, f: fn(&u8, &Formatter) -> ()) -> ArgumentV1<'b> {
        unsafe {
            ArgumentV1 {
                formatter: std::mem::transmute(f),
                value: std::mem::transmute(x),
            }
        }
    }
}

pub struct Arguments<'a> {
    args: &'a [ArgumentV1<'a>],
}

impl<'a> Arguments<'a> {
    pub const fn new_v1(args: &'a [ArgumentV1<'a>]) -> Arguments<'a> {
        Arguments { args }
    }
}

pub fn write(_output: &mut dyn Trait, args: Arguments<'_>) {
    let formatter = Formatter {};

    for arg in args.args {
        let _ = (arg.formatter)(arg.value, &formatter);
    }
}

#[cfg_attr(kani, kani::proof, kani::unwind(2))]
fn main() {
    let mut f = Foo {};
    let a = [ArgumentV1::new(&5, nn)];
    let args = Arguments::new_v1(&a);
    write(&mut f, args);
}

zhassan-aws commented 2 years ago

Reduced the above test further to:

pub trait Trait {
    fn f(&self);
}

struct Foo {}

impl Trait for Foo {
    fn f(&self) {
        let _ = 'x'.encode_utf8(&mut [0; 4]);
    }
}

pub struct Formatter {}

fn nn(_x: &u8, _f: &Formatter) {}

pub struct ArgumentV1 {
    formatter: fn(&u8, &Formatter) -> (),
}

impl ArgumentV1 {
    pub fn new() -> ArgumentV1 {
        unsafe {
            ArgumentV1 {
                formatter: std::mem::transmute(nn as fn(_, _)),
            }
        }
    }
}

#[cfg_attr(kani, kani::proof, kani::unwind(2))]
fn main() {
    let f = Foo {};
    let a = [ArgumentV1::new()];

    let _output = &f as &dyn Trait;
    let formatter = Formatter {};

    for arg in &a {
        let _ = (arg.formatter)(&5, &formatter);
    }
}

CBMC stays in the unrolling phase for more than 10 minutes without terminating.

tautschnig commented 2 years ago

Function pointer removal as taking place in goto-instrument will replace the arg.formatter pointer by a number of candidate functions. The set of "candidate" functions is determined by looking at matching (with very coarse over-approximation) signatures. See also https://diffblue.github.io/cbmc/man/goto-instrument.html, search for the documentation of --remove-function-pointers. With --mir-linker, the pool of candidate functions is much larger. The problem then really is the functions called from these candidate functions. In many cases, the constant propagation during symbolic execution may alleviate this problem when the actual value of the function pointer can be precisely determined. It seems that's not the case here, AFAIK owing to the loop iteration (actually whatever iterators do under the hood). It maybe be worth diving further why we can't constant-propagate this here. So you end up with a number of functions being called when it should really just be nn (if I understand the Rust code correctly). You might have to use --restrict-vtable on this example.

zhassan-aws commented 2 years ago

@tautschnig and I managed to reduce it further to:

pub trait Trait {
    fn f(&self);
}

struct Foo {}

impl Trait for Foo {
    fn f(&self) {
        let _ = 'x'.encode_utf8(&mut [0; 4]);
    }
}

pub struct Formatter {}

fn nn(_x: &u8, _f: &Formatter) {}

pub struct ArgumentV1 {
    formatter: fn(&u8, &Formatter) -> (),
}

#[cfg_attr(kani, kani::proof, kani::unwind(2))]
fn main() {
    let f = Foo {};
    let a = [ArgumentV1 { formatter: nn, }];

    let _output = &f as &dyn Trait;
    let formatter = Formatter {};

    let mut iter = a.iter();
    let _ = (iter.next().unwrap().formatter)(&5, &formatter);
}

tautschnig commented 2 years ago

https://github.com/diffblue/cbmc/pull/7243 fixes this particular example as constant propagation can then figure out during symbolic execution that nn is the only possible value for the function pointer.

(Some more information from the analysis of the above example with @zhassan-aws: let _output = &f as &dyn Trait; adds f to the candidate list of functions being considered by CBMC's function pointer removal as f's address is taken at that point. This, in turn, will result in encode_utf8 being symbolically executed, and that function just appears to be large (or itself be calling many other functions). Also, it seemed that it was required to both take a slice and use an iterator to take constant propagation off track.)

zhassan-aws commented 2 years ago

That's awesome! Thanks for investigating and figuring out a fix @tautschnig!

zhassan-aws commented 2 years ago

One thing that is weird though is that encode_utf8 does not seem to be an issue in itself. For example, this small program:

fn foo(c: char, buf: &mut [u8]) {
    c.encode_utf8(buf);
}

#[kani::proof]
fn main() {
    let mut buf = [0; 4];
    foo('x', &mut buf);
    assert_eq!(buf, [120, 0, 0, 0]);
}

is verified in a fraction of a second:

$ kani encode.rs --enable-unstable --mir-linker
<snip>

SUMMARY:
 ** 0 of 91 failed (7 unreachable)

VERIFICATION:- SUCCESSFUL
Verification Time: 0.17814663s

Even if I hide the function behind dyn trait to mimic the other program, it still finishes quickly:

trait T {
    fn foo(&mut self, c: char, buf: &mut [u8]);
}

struct Foo {}

impl T for Foo {
    fn foo(&mut self, c: char, buf: &mut [u8]) {
        c.encode_utf8(buf);
    }
}

fn bar(x: &mut dyn T) {
    let mut buf = [0; 4];
    x.foo('x', &mut buf);
    assert_eq!(buf, [120, 0, 0, 0]);
}

#[kani::proof]
#[kani::unwind(5)]
fn main() {
    let mut f = Foo {};
    bar(&mut f);
}

$ kani encode2.rs --enable-unstable --mir-linker
<snip>

SUMMARY:
 ** 0 of 104 failed (7 unreachable)

VERIFICATION:- SUCCESSFUL
Verification Time: 0.18649799s

So I'm not sure why it's causing issues in the other program.

tautschnig commented 1 year ago

Following up on https://github.com/model-checking/kani/issues/1767#issuecomment-1279423415: it seems that working through the Formatter with perhaps a lack of constant propagation is what takes forever. Logs kindly shared by @zhassan-aws from an older version of CBMC that didn't have the necessary improvements include lines like:

Unwinding recursion <std::fmt::Arguments<'_> as std::fmt::Display>::fmt iteration 1

and almost all the instructions symbolically executed stemming from either lib/rustlib/src/rust/library/core/src/fmt/mod.rs or lib/rustlib/src/rust/library/core/src/fmt/num.rs.

model-checking / kani

CBMC unwinds forever on s2n-quic with the MIR linker enabled #1767