PR #920 needs some fixes (undefined behavior was overlooked, fixed task simplified the program too much)

sosy-lab / sv-benchmarks

Collection of Verification Tasks (MOVED, please follow the link)

https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks

184 stars 169 forks source link

PR #920 needs some fixes (undefined behavior was overlooked, fixed task simplified the program too much) #1105

Open MartinSpiessl opened 4 years ago

MartinSpiessl commented 4 years ago

As described by me in https://github.com/sosy-lab/sv-benchmarks/pull/920#issuecomment-675283861

I see some problems with this PR:

The corrected version of the program just sets the divisor to 1 instead of the previously non-deterministic value, this is hardly a fix since it does not preserve the original problem complexity ( returning the integer division "A//B" for non-deterministic values of A and B).

The counterexample provided here actually contains an overflow, i.e., undefined behavior. As such, simply setting the verdict to false is not the right thing to do. Instead the value ranges should be limited such that no overflow can occur or appropriate overflow checks should be introduced (GCC has functions for that). The original task can be kept with the overflow verdict set to true and all other verdicts removed.

divyeshunadkat commented 4 years ago

@MartinSpiessl Thank you for the mentioned points.

While the corrected version of the program (hard2.c - that sets the divisor to 1 instead of a previous non-deterministic value) does not preserve the complexity of the original program, several tools are unable to very it! Since it can distinguish between tools, I think we can retain the program (hard2.c), even if it is of lower complexity. I believe that I had tried to add guards before I submitted #920 but couldn't manage to get an appropriate correct version of the program before the competition deadline, hence this route. I would also like to vote for keeping the original program (hard.c) with a reachability verdict set to false as well (apart from setting overflow verdict to true as requested), again since it helps distinguish tools.

In a separate pull request we can : update hard.yml and add the overflow verdict set to true (apart from retaining the reachability verdict set to false). try to add a new benchmark with appropriate guards against the overflows with the reachability verdict set to true and overflow verdict set to false.

MartinSpiessl commented 4 years ago

Since it can distinguish between tools, I think we can retain the program (hard2.c), even if it is of lower complexity

Yeah sure, there is no harm in retaining a program!

I would also like to vote for keeping the original program (hard.c) with a reachability verdict set to false as well (apart from setting overflow verdict to true as requested)

This is not a good idea, since we cannot reason about reachability behavior of a program with undefined behavior. But we could fix the overflow in hard.c (e.g. using unsigned variables does immediately get rid of undefined behavior!) and keep that one with the reachability label set to false. The original hard.c gets overflow verdict set to true and all other properties removed. Then in addition a version of hard.c would be nice where the overflows are fixed and the program complexity is restored.

divyeshunadkat commented 4 years ago

@MartinSpiessl The commit b3fb86c in PR #1155 fixes the overflow, as well as wrapping around of the values of program variables. Hence in commit 7b81d2a I have changed the reachability verdict to true, thus restoring the program complexity such that the assertions in the program hold. We can close this issue as well as the corresponding PR once reviewers agree to the changes.

I believe that fix can be applied to most other benchmarks in nla-digibench (#1166 ) as well as other benchmarks with overflow (#1159 ).