Advise on verifying state machines with kani

LizardWizzard commented 6 months ago

Hi! Thank you for developing such a great tool!

This is not a request for particular feature. I tried to find a place where to ask questions on kani and I didnt find either of zulip/discord/gh discussions set up so I'm asking here.

I'm trying to verify a protocol which is represented by a number of instances of one state machine. I believe my problem is that the state of this state machine includes collections such as hashmaps, sets and vectors which make verification using kani quite complicated. I was able to move forward past initial struggles by replacing all hash based structures by their emulation on vectors (most of it is based on vector_map crate from one of your examples). I still get poor verification performance (understandably). Loop unwinding can take really long time (I got up to an hour) and even when it succeeds cmbc fails with out of memory error. Judging by logs most of the unwinding takes place on some vector internals (std alloc, raw vec, std ptr, etc). From what I can see unwinding process doesnt make any benefits from setting -j argument. I tried to tweak unwinding strategy and I couldnt find a sweet spot. Either it fails with an error that harness reached an iteration that was not reached during unwinding or unwinding takes long time with a following cbmc OOM.

What would be the best way to proceed? Is there a way I could replace Vec from std with a stub? In the blog post about stubbing I see only particular functions being replaced and in general I'm not sure what can be the shape of contract that can replace an entire vec. Would be interested in any advice on moving this forward. Thanks in advance!

Worth noting I dont have any relevant unsafe code so if disabling some safety related verification targets can help that would be good. Idea I have now is to replace vectors with fixed size array of Option<T> so it doesnt have to go deep into most of alloc/ptr internals. Not sure if it is a good idea, but thats what I have left

zhassan-aws commented 6 months ago

Hi @LizardWizzard. Thanks for trying out Kani!

Vectors can indeed choke Kani's solver. The general strategies I use to address performance issues are:

Try to reduce the amount of non-determinism (introduced using kani::any()). Perhaps even try to eliminate non-determinism entirely, replacing all kani::any() calls with constants, and see if you can get it to converge. Then start introducing non-determinism back to identify where non-determinism is hurting performance.
Try to reduce the input size. For example, if the harness generates non-deterministic vectors of size 10, see if cutting down the size can help. Similarly, if the inputs are 64-bit wide, see if reducing their widths to 8 bits helps.
Try stubbing. Unfortunately, stubbing types (like Vec) isn't supported, but perhaps check if it's possible to stub other functions in the call graph that may be contributing to the poor performance.

Is it possible to share the code? We may be able to suggest workarounds or identify improvements we can make to Kani/CBMC.

LizardWizzard commented 6 months ago

Thanks for your response @zhassan-aws!

Yep, I'm trying with constant input without any calls to kani::any, though I'm hoping to use kani::any in the future
As far as I can see that shouldnt be applicable with constant input
Is there a way I can see which function generates the most hassle for kani? Something that looks like cargo-llvm-lines would've been ideal. For now the only option I see is to inspect unwinding logs and factor out most popular function. Is there a better way? Maybe I can contribute something in that regard if the scope looks manageable

Is it possible to share the code? We may be able to suggest workarounds or identify improvements we can make to Kani/CBMC.

Yep, I'm currently in progress of moving everything to arrayvec based collections hoping that it'll make things better, once I finish I'll post an update here with a repo.

Sadly full state machine definition is quite large, all code that can be reached by the harness is above 1k loc (with handrolled fixed size array based collections even more), so maybe thats just too big for kani to chew at once, though I'm ok if the process can eventually succeed even if it takes very long time

zhassan-aws commented 6 months ago

Is there a way I can see which function generates the most hassle for kani?

You can pass:

--enable-unstable --cbmc-args --write-solver-stats-to <NAME OF JSON FILE>.json

to dump a statistics log from CBMC. I'm not sure how reliable/helpul it is at the moment though.

LizardWizzard commented 6 months ago

I finished transition to arrayvec based maps and sets, however that only lead to cbmc crash caused by internal invariant violation. Here is the error I've got:

Unwinding loop _RNvMNtNtCsd3BvQT1zweK_17accord_playground8protocol7harnessNtB2_7Harness3run.0 iteration 4 file src/protocol/mod.rs line 159 column 33 function protocol::harness::Harness::run thread 0
Unwinding loop _RNvMNtNtCsd3BvQT1zweK_17accord_playground8protocol7harnessNtB2_7Harness3run.0 iteration 5 file src/protocol/mod.rs line 159 column 33 function protocol::harness::Harness::run thread 0
Unwinding loop _RNvMNtNtCsd3BvQT1zweK_17accord_playground8protocol7harnessNtB2_7Harness3run.0 iteration 6 file src/protocol/mod.rs line 159 column 33 function protocol::harness::Harness::run thread 0
--- begin invariant violation report ---
Invariant check failed
File: /tmp/tmp.uBRU9YPKKn/src/util/simplify_utils.cpp:345 function: bits2expr
Condition: el_size_opt.has_value() && *el_size_opt > 0
Reason: Check return value
Backtrace:
cbmc(+0xa16f7e) [0x55a138216f7e]
cbmc(+0xa17f0a) [0x55a138217f0a]
cbmc(+0x69abf) [0x55a137869abf]
cbmc(+0xac1743) [0x55a1382c1743]
cbmc(+0xabff5f) [0x55a1382bff5f]
cbmc(+0xac0d4a) [0x55a1382c0d4a]
cbmc(+0xa882bd) [0x55a1382882bd]
cbmc(+0xa8702f) [0x55a13828702f]
cbmc(+0xabd0b2) [0x55a1382bd0b2]
cbmc(+0xa85127) [0x55a138285127]
cbmc(+0xa85feb) [0x55a138285feb]
cbmc(+0xa8a69a) [0x55a13828a69a]
cbmc(+0xa8a7b9) [0x55a13828a7b9]
cbmc(+0x6375c9) [0x55a137e375c9]
cbmc(+0x637636) [0x55a137e37636]
cbmc(+0x637636) [0x55a137e37636]
cbmc(+0x637636) [0x55a137e37636]
cbmc(+0x637636) [0x55a137e37636]
cbmc(+0x32dd1d) [0x55a137b2dd1d]
cbmc(+0x3e18ec) [0x55a137be18ec]
cbmc(+0x3e1dd5) [0x55a137be1dd5]
cbmc(+0x3e267a) [0x55a137be267a]
cbmc(+0x3921ea) [0x55a137b921ea]
cbmc(+0x395c6e) [0x55a137b95c6e]
cbmc(+0x3980c2) [0x55a137b980c2]
cbmc(+0x39627e) [0x55a137b9627e]
cbmc(+0x3292fc) [0x55a137b292fc]
cbmc(+0x3c5ff3) [0x55a137bc5ff3]
cbmc(+0x3c6463) [0x55a137bc6463]
cbmc(+0x1ece0e) [0x55a1379ece0e]
cbmc(+0x3c64b4) [0x55a137bc64b4]
cbmc(+0x3c6755) [0x55a137bc6755]
cbmc(+0x3c73dc) [0x55a137bc73dc]
cbmc(+0x1c26d3) [0x55a1379c26d3]
cbmc(+0x1c1c4b) [0x55a1379c1c4b]
cbmc(+0x72c50) [0x55a137872c50]
cbmc(+0x7089f) [0x55a13787089f]
cbmc(+0x67a58) [0x55a137867a58]
cbmc(+0x53c34) [0x55a137853c34]
/lib64/libc.so.6(+0x27b8a) [0x7f1781768b8a]
/lib64/libc.so.6(__libc_start_main+0x8b) [0x7f1781768c4b]
cbmc(+0x696fa) [0x55a1378696fa]

--- end invariant violation report ---
Unwinding loop _RNvMNtNtCsd3BvQT1zweK_17accord_playground8protocol7harnessNtB2_7Harness3run.5 iteration 1 file src/protocol/mod.rs line 230 column 33 function protocol::harness::Harness::run thread 0

CBMC failed with status 134
VERIFICATION:- FAILED

Summary:
Verification failed for - protocol::verification::one_transaction
Complete - 0 successfully verified harnesses, 1 failures, 1 total.

Failure happened after 11 minutes and ~65G memory used by cbmc.

The exact commit this is happened on: https://github.com/LizardWizzard/accord-playground/commit/327cc59bf4109e98b796fbf268c4b4eec926077e

Command I used to run kani:

cargo kani --harness one_transaction --features verification --enable-unstable --cbmc-args --write-solver-stats-to one_transaction_stat.json

Kani version is 0.45.0.

Worth noting that at this point code doesnt pass unit tests with known error. So I'd expect kani to report that.

Cbmc generated a coredump, though it is truncated because heap size was huge at the moment of failure. Is there a way kani can use custom cbmc binary? I can try to build cbmc with debug symbols so backtrace is more helpful

I'd appreciate any help/advice on how to move this forward

zhassan-aws commented 6 months ago

I suspect the crash is resulting from some overflow as indicated by the extremely high memory usage. I doubt CBMC is able to produce any meaningful result with such high memory usage.

Thanks for sending the code. I'll take a look at it and see if anything can be done.

model-checking / kani

Advise on verifying state machines with kani #3016