Memory crash during Prusti encoding step

Pointerbender commented 2 years ago

Somewhere between commits bb3cb3fe25 and f8e70f9 a regression was introduced that significantly increased memory usage, leading to a crash during the Prusti encoding step in both debug and release mode. On my computer with 16GB of physical memory and 32GB swap file (Ubuntu), the prusti-driver process ends up consuming ~100% CPU and 10-11GB of physical memory and another ~25GB of virtual memory, utilizing all available memory (and bringing the whole system to a near-freeze because of excessive swapping in the rest of the OS) before exiting the process during the encoding of a predicate.

I did not have a chance yet to deep dive into this during working weekdays, but I'm posting my preliminary findings here already in the hopes that someone might be able to spot where the problem could lie. I'll have some more time on the weekend to look deeper :)

The problem appears in a relatively large (WIP and unpublished) project, which makes debugging harder than usual. My plan of approach to narrow it down is:

Post initial findings (this post).
Try to build a minimal reproduction (I think this might be hard, commenting out random parts of the predicate seem to bring down the resource consumption to acceptable levels and it's not possible to pin-point a certain clause in the predicate that might be responsible).
Try to narrow down the exact commit that introduced the regression, in the hopes that a smaller diff may yield new insights.
If all else fails, publish a draft version of the project for closer inspection (although I hope I can prevent burdening anyone with having to debug the whole thing :) )

Currently I'm on commit f8e70f9037 and the cargo-prusti command I use to verify the project is:

# Release mode
RUST_BACKTRACE=1 PRUSTI_LOG=info PRUSTI_DUMP_VIPER_PROGRAM=true time prusti-dev/target/release/cargo-prusti
# Debug mode
RUST_BACKTRACE=1 PRUSTI_LOG=debug PRUSTI_DUMP_VIPER_PROGRAM=true time prusti-dev/target/debug/cargo-prusti

Its Prusti.toml contents are:

check_overflows = true
check_panics = true
skip_unsupported_features = true
assert_timeout = 60000
optimizations = "inline_constant_functions,delete_unused_predicates,optimize_folding,remove_empty_if,purify_vars,remove_unused_vars,remove_trivial_assertions,clean_cfg" # all minus fix_quantifiers
extra_verifier_args = ["--z3ConfigArgs=\"smt.random_seed=42\""]

In the moment right before crashing, the debug logs show the following:

[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::pure::pure_functions::interpreter] Encode aggregate Closure(DefId(0:437 ~ prusti_cursor[64ea]::database::cursor::prusti_pred_item_cursor_invariant_507743adf4484769a015f1ac07b9b719::{closure#0}), [i8, extern "rust-call" fn((usize,)) -> bool, (&&database::cursor::Cursor<'_>,)]), [move _103]
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&&database::cursor::Cursor<'_>'
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&database::cursor::Cursor<'_>'
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type 'database::cursor::Cursor<'_>'
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::name_interner] Interning of "m_prusti_cursor$$database$$cursor$$Cursor" is "m_Cursor"
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&&database::cursor::Cursor<'_>'
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&database::cursor::Cursor<'_>'
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type 'database::cursor::Cursor<'_>'
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::name_interner] Interning of "m_prusti_cursor$$database$$cursor$$Cursor" is "m_Cursor"
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::foldunfold::requirements] arg snap$<Snapshot(struct$m_Cursor)>(self.closure_0.val_ref.val_ref) is not a place with type ref
[2022-02-08T17:23:40Z DEBUG prusti_viper::encoder::foldunfold::requirements] arg snap$<Snapshot(struct$m_Cursor)>(self.closure_0.val_ref.val_ref) is not a place with type ref
[2022-02-08T17:23:41Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x01)) --> Const(ConstExpr { value: Bool(true), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:23:41Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&database::cursor::Cursor<'_>'
[2022-02-08T17:23:41Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type 'database::cursor::Cursor<'_>'
[2022-02-08T17:23:41Z DEBUG prusti_viper::encoder::name_interner] Interning of "m_prusti_cursor$$database$$cursor$$Cursor" is "m_Cursor"
[2022-02-08T17:23:41Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[database::page_index::PageIndex]'
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '[database::page_index::PageIndex]'
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type predicate '&[database::page_index::PageIndex]'
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _93
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type predicate '[database::page_index::PageIndex]'
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir::types::encoder] Encoding of type 'Slice(database::page_index::PageIndex)' is incomplete
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _94
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[database::page_index::PageIndex; 64]'
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:42Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[usize]'
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '[usize]'
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type predicate '&[usize]'
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _88
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type predicate '[usize]'
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir::types::encoder] Encoding of type 'Slice(usize)' is incomplete
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _89
[2022-02-08T17:23:43Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[usize; 64]'
[2022-02-08T17:23:44Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:44Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:44Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:23:46Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[database::page_index::PageIndex]'
[2022-02-08T17:23:46Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _83
[2022-02-08T17:23:47Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _84
[2022-02-08T17:23:47Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[database::page_index::PageIndex; 64]'
[2022-02-08T17:23:47Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:48Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:48Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:23:52Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[usize]'
[2022-02-08T17:23:52Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _78
[2022-02-08T17:23:53Z DEBUG prusti_viper::encoder::mir_encoder] Get operand ty move _79
[2022-02-08T17:23:53Z DEBUG prusti_viper::encoder::mir::types::encoder] Encode type '&[usize; 64]'
[2022-02-08T17:23:54Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:55Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:23:56Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:24:03Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:24:04Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:24:07Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:24:18Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Unevaluated(Unevaluated { def: WithOptConstParam { did: DefId(0:49 ~ prusti_cursor[64ea]::database::MAX_DEPTH), const_param_did: None }, substs: [], promoted: None }) --> Const(ConstExpr { value: BigInt("64"), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:24:20Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:24:26Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:24:47Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Unevaluated(Unevaluated { def: WithOptConstParam { did: DefId(1:30555 ~ core[ea13]::num::{impl#11}::MIN), const_param_did: None }, substs: [], promoted: None }) --> Const(ConstExpr { value: BigInt("0"), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:24:52Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:25:05Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:25:56Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Unevaluated(Unevaluated { def: WithOptConstParam { did: DefId(0:49 ~ prusti_cursor[64ea]::database::MAX_DEPTH), const_param_did: None }, substs: [], promoted: None }) --> Const(ConstExpr { value: BigInt("64"), position: Position { line: 0, column: 0, id: 0 } })
[2022-02-08T17:26:06Z DEBUG prusti_viper::encoder::mir_encoder] subst ['_#59r]
[2022-02-08T17:26:37Z DEBUG prusti_viper::encoder::encoder] encode_const_expr Value(Scalar(0x00)) --> Const(ConstExpr { value: Bool(false), position: Position { line: 0, column: 0, id: 0 } })
warning: `prusti-cursor` (lib) generated 16 warnings
error: could not compile `prusti-cursor`; 16 warnings emitted

Caused by:
  process didn't exit successfully: `~/git/rust/pointerbender/prusti-dev/target/debug/prusti-rustc rustc --crate-name prusti_cursor --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata -C embed-bitcode=no -C debuginfo=2 --cfg 'feature="prusti-cursor"' -C metadata=ffd5086999031d8e -C extra-filename=-ffd5086999031d8e --out-dir ~/git/rust/prusti-cursor/target/verify/debug/deps -C incremental=~/git/rust/prusti-cursor/target/verify/debug/incremental -L dependency=~/git/rust/prusti-cursor/target/verify/debug/deps --extern bytemuck=~/git/rust/prusti-cursor/target/verify/debug/deps/libbytemuck-ef706b80bffac625.rmeta --extern prusti_contracts=~/git/rust/prusti-cursor/target/verify/debug/deps/libprusti_contracts-7b421dc71f4e133a.rmeta` (exit status: 255)
Command exited with non-zero status 101
224.44user 15.01system 3:59.86elapsed 99%CPU (0avgtext+0avgdata 12692340maxresident)k
638104inputs+3992outputs (3382major+6464643minor)pagefaults 0swaps

The 16 warnings are all from rustc (unused import, unused function, useless comparison, and such), no warnings are generated by Prusti itself. The contents of the log messages don't stand out as immediately suspicious to me, but an interesting detail is the timestamps on the logs: there are long pauses lasting seconds in between log lines, usually the debug logs come in at a fairly fast pace (this is probably a side effect of the system struggling to allocate more memory, but just in case that is a helpful detail for the investigation).

The predicate that Prusti is encoding is shown below. For some context: Cursor is a struct that visits an in-memory B+ tree and cursor_invariant is the predicate used to record the type invariant of Cursor.

macro_rules! is_valid_page_index {
    ($cursor:expr, $page_index:expr) => {
        $page_index.transaction_id() <= $cursor.max_transaction_id
        && $page_index.page_number() < $cursor.pages.len()
        && (($page_index.is_leaf() && $page_index.len() <= $cursor.leaf_order)
            || ($page_index.is_branch()
                && !$page_index.is_empty() // branch pages may not be empty
                && $page_index.len() <= $cursor.branch_order))
    };
}

predicate! {
    pub(super) fn cursor_invariant(cursor: &Cursor) -> bool {
        // the pages invariant must hold:
        cursor.pages.invariant()
        // the tree invariant must hold:
        && cursor.tree.invariant()
        // a branch page must fit at least 2 entries
        && cursor.branch_order >= 2
        && cursor.branch_order <= isize::MAX as usize // TODO: make order bound dependend on entry size
        // a leaf page must fit at least 1 entry
        && cursor.leaf_order >= 1
        && cursor.leaf_order <= isize::MAX as usize
        // a page must fit at least one key and two [PageIndex]:
        && cursor.pages.page_size() >= 2 * PAGE_INDEX_SIZE + cursor.tree.key_size
        // a page must fit at least one key and one value:
        && cursor.pages.page_size() >= cursor.tree.key_size + cursor.tree.value_size
        && cursor.entries.len() == MAX_DEPTH
        && cursor.page_indices.len() == MAX_DEPTH
        // the current index must always be less than the capacity:
        && usize::MIN <= cursor.current && cursor.current < MAX_DEPTH
        // the depth of the path must always be less than the capacity:
        && usize::MIN <= cursor.depth && cursor.depth < MAX_DEPTH
        // the current index must always be less than or equal to the depth:
        && cursor.current <= cursor.depth
        && cursor.current < cursor.entries.len()
        && cursor.current < cursor.page_indices.len()
        && cursor.depth < cursor.entries.len()
        && cursor.depth < cursor.page_indices.len()
        // if the tree is non-empty, then all entries must be strictly less than the corresponding page length:
        && (!cursor.is_empty() ==> forall(|i: usize| (0 <= i && i <= cursor.depth) ==> (
            usize::MIN <= cursor.entries[i]
            && cursor.entries[i] <= isize::MAX as usize
            && cursor.entries[i] < cursor.page_indices[i].len()
        )))
        // otherwise, it must sit at the start of the path at a leaf:
        && (cursor.is_empty() ==> (cursor.current == 0 && cursor.entries[0] == 0 && cursor.depth == 0 && cursor.tree.root.is_leaf()))
        // all page indices in the path must be valid:
        && forall(|i: usize| (0 <= i && i <= cursor.depth) ==> (is_valid_page_index!(cursor, cursor.page_indices[i])))
        // leaf pages are only allowed at `self.depth`:
        // TODO
        // the root page index must be valid:
        && is_valid_page_index!(cursor, cursor.tree.root)
        // root of path must match root of tree:
        && cursor.page_indices[0] == cursor.tree.root
    }
}

It might be entirely possible that the memory issue was also already latently present before the mentioned commits, but that the problem became more exaggerated with all the recent additions to the code base (before this regression, encoding also took a while in release mode - but still well under a minute for ~180 items). However, I'm not sure about this, because I don't have any data yet on memory consumption from before I noticed the regression. This will also be one of the things I will look deeper into this weekend.

If anything stands out from the bug report, or if anyone has a hunch which part of the Prusti code base could contain clues to resolving this, any pointers would be greatly appreciated :)

Thanks!

Pointerbender commented 2 years ago

Note to self: since its not (yet?) possible to pin-point any specific clause of the predicate that is causing the memory problems, the root cause may be related to the number of clauses (e.g. if the whole encoder state is cloned for each clause, there may be some kind of super-linear space complexity going on). I will check for this possibility, too, unless this already rings a bell with anyone :)

fpoli commented 2 years ago

Could you attach the .dot dump of the MIR of cursor_invariant? The definition is quite big and I wonder what its CFG looks like.

fpoli commented 2 years ago

You could also give a try at some memory profiling tool, like heaptrack, just to confirm whether there are gigabytes of vir::Expr instances. That tool should (hopefully) work out of the box with the Prusti binary.

Aurel300 commented 2 years ago

Also, just in case you are not familiar with this: git bisect might make your search through the commits much faster.

Pointerbender commented 2 years ago

Could you attach the .dot dump of the MIR of cursor_invariant? The definition is quite big and I wonder what its CFG looks like.

Sure thing! There are multiple .dot files, but I believe this is the one of interest (target/verify/log/mir/prusti_cursor.database-cursor-prusti_pred_item_cursor_invariant_1cbcabab67d24e6abd150ea50721b9fb.-------.renumber.0.dot):

prusti_cursor database-cursor-prusti_pred_item_cursor_invariant_1cbcabab67d24e6abd150ea50721b9fb ------- renumber 0 dot

Thanks for the pointers to heaptrack and git bisect!

fpoli commented 2 years ago

The relevant comment is here :) https://github.com/viperproject/prusti-dev/blob/f8e70f9037b964498e83f58a3bb9387f60c7a454/prusti-viper/src/encoder/mir/pure/interpreter/mod.rs#L54-L56

The encoding of pure expressions currently generates an expression whose size is proportional to the number of paths in the CFG, which means exponential in the number of branches/clauses. (Rust's && does the short-circuit evaluation). It's this way because we initially didn't want to generate let ... in ... expressions, which we now have, and because we needed to automatically generate unfolding expressions, which is no longer necessary.

One possible fix is to detect in the following snippet if the then_expr and else_expr share a big sub-expression, and move that sub-expression to a let expression around the encoding of the switch terminator, such that shared sub-expressions do not end up being duplicated at each branch.

https://github.com/viperproject/prusti-dev/blob/f8e70f9037b964498e83f58a3bb9387f60c7a454/prusti-viper/src/encoder/mir/pure/interpreter/mod.rs#L397-L401

Another possible fix is to start wrapping every Expr type in Rc, such that cloning expressions becomes much cheaper. I remember that MIRAI did it and they were happy with the result.

A further alternative, as noted in that old comment, is to convert the encoding to a forward pass, using plenty of let expressions instead of the substitution calls. This seems to me the cleanest solution now that we have let expressions and no longer need to generate unfoldings.

Pointerbender commented 2 years ago

Wow, that's a swift find! That saves a lot of debugging, thanks! :smile:

A further alternative, as noted in that old comment, is to convert the encoding to a forward pass, using plenty of let expressions instead of the substitution calls. This seems to me the cleanest solution now that we have let expressions and no longer need to generate unfoldings.

This sounds promising!

fpoli commented 2 years ago

For the moment, splitting big predicates into multiple smaller predicates should help. Like:

predicate! {
    pub(super) fn huge(cursor: &Cursor) -> bool {
        small1(cursor) && small2(cursor) && small3(cursor)
    }
}

Disabling inline_constant_functions should also help reducing the peak memory usage of Prusti.

Pointerbender commented 2 years ago

The predicate-splitting work-around seems to do the trick, thanks @fpoli!

viperproject / prusti-dev

Memory crash during Prusti encoding step #854