[CN] Semantics of `unchanged` is confusing (eg. local vars, changed vars)

septract commented 1 month ago

(EDIT 2024-7-30: see below for some more commentary and thoughts about how we could make unchanged better)

CN requires that loop invariants state which variables are unchanged, but this doesn't work for local variables. Eg it seems like this should work:

// Write to a local variable outside a loop, then use the value inside the loop. 
// Currently does not work. 

void loop_9 ()
{
  int test = 10; 
  for (int i = 0; i < 2; i++ ) 
  /*@ 
    inv 
      {test} unchanged; 
  @*/
  { 
    if (test != 10 ) {
      assert(0); 
    }
  }
}

... but it gives the following error message:

$ cn verify loop_9.c
loop_9.c:10:8: error: Cannot resolve the value of &test in evaluation scope 'start'. Is the ownership for accessing &test missing?
      {test} unchanged; 
       ^

Going by the error message, perhaps unchanged is asserting that the value of the variable is the same as it was at function initialization? In that case, test has not been declared at this point.

The following variant that constrains the value of test does pass CN:

void loop_9_alt ()
{
  int test = 10; 
  for (int i = 0; i < 2; i++ ) 
  /*@ 
    inv 
      test == 10i32; 
  @*/
  { 
    if (test != 10 ) {
      assert(0); 
    }
  }
}

CN version: git-b3d7e0463 [2024-07-26 17:41:55 -0700]

septract commented 1 month ago

This also does not verify:

void loop_10 (int test)
{
  test = 10; 

  for (int i = 0; i < 2; i++ ) 
  /*@ 
    inv 
      {test} unchanged; 
  @*/
  { 
    ; 
  }

  if (test != 10 ) {
    assert(0); 
  }
}

Error:

$ cn verify inprogress/loop_10.c
[1/1]: loop_10
inprogress/loop_10.c:6:3: error: Unprovable constraint
  for (int i = 0; i < 2; i++ ) 
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
Constraint from inprogress/loop_10.c:9:7:
      {test} unchanged; 
      ^~~~~~~~~~~~~~~~~

peterohanley commented 1 month ago

related, CN seems confused about the available resources:

void loop_9 ()
{
  int test = 10;
  for (int i = 0; i < 2; i++ )
  /*@
    inv
      {*&test} unchanged;
  @*/
  {
    if (test != 10 ) {
      assert(0);
    }
  }
}

% cn verify for_loop.c
for_loop.c:8:8: error: Cannot dereference '&test' in current scope. Is the necessary ownership missing?
      {*&test} unchanged;

Available resources
Owned<signed int>(&test)(O_test0)
...

septract commented 1 month ago

I've been debugging examples with our friends at BAE, and I'm increasingly convinced that unchanged as it currently exists in CN is extremely confusing.

What I (and I think they) incorrectly assumed {x} unchanged in a loop invariant meant:

x does not change in one iteration of this loop

What I now think {x} unchanged actually means:

x has the same value at this program point as when the function was called

This has some confusing consequences:

You can't write unchanged for local variables (they weren't declared at call-time)
You can't modify a variable, and then write unchanged for a later / inner loop.

We also don't have an easy way to say that some variable x is the same as it was at some previous program point, unless it's the same as it was at the call-site.

Suggested resolution(s)

Here are some ideas for mitigating this problem:

Change the keyword to something less confusing. It's just very natural to assume unchanged in a loop iteration refers to this loop iteration.
Better, in my view: expose the notion of "x at the current program point", then let the user write x_requires == x_current (or whatever)
Automatically detect unchanged variables and remove unchanged completely

Additionally, we should allow the user to write "x is unchanged in this loop iteration" via some syntax - {x} loop_unchanged. Again, ideally this would be just a wrapper around some lower-level notion, eg. x_prev == x_current.

lwli11 commented 1 month ago

Yes, accurate summary of our confusion! It also seems 'unchanged' would be most useful in the pre-postcondition.

cp526 commented 1 month ago

I'd separately been thinking about some of these. We should indeed revisit the frontend specification language to address these problems. To summarise again, there's at least three problems:

There is not a good mechanism for specification to refer to the state of program values in previous program positions. We have {...}@start and {...}@end, but that is insufficient.
unchanged is confusing, as described above.
The fact that, unless stated otherwise, variables mentioned in the postcondition or loop invariant refer to the current, possibly-mutated state can lead to confusing errors, because the specification is weaker than intended, as @septract pointed out previously.

A combination of the following is needed:

I agree, the basic mechanism to talk about mutation should be to allow users to refer to old states of a variable. Which old states of a variable should be considered to be in scope for specifications? The pre-state should be in scope for loop invariants and postconditions (as it is now); loop invariants should probably be able to refer to the state before the first loop iteration (for instance to say that the loop doesn't change it). On the other hand, the value of variable in the previous loop iteration should not be in scope (in the first loop iteration there isn't a previous one). There's at least two ways to do this:
- CN specifications are already required to introduce names for most Owned variables, so those are names for explicit "snapshots" of a variable that can be used in specifications. One could use those, perhaps with a way of letting users introduce additional snapshots, for instance using inline asserts for ownership, as previously suggested by Dhruv and then Santiago in #260.
- We could borrow the old notation from Frama-C's (welltested) specification language. In any case we should survey theirs and other tools before we decide on a scheme.
Would implementing whatever scheme we choose require changing Core/Mucore to maintain the loop structure of the original C program rather than compile these to GOTO?

Beyond user-written specifications, CN error reports also have to be extended to keep track of the state of program variables at previous program points, and these should be displayed consistently with what syntax we adopt in specifications.
As @septract previously suggested, we should probably make CN consider local variables unchanged by default, with a keyword for permitting mutation or some analsysis detecting this automatically. This would avoid the confusion of problem (3) above and could be done purely as a change to CN's frontend.
The unchanged keyword is currently used almost exclusively for two purposes: asserting that mutable local variables are unchanged, and read-only use of global variables. Probably CN function specifications should support a keyword maintains, for ownership that is returned unchanged: instead of the common specification pattern requires take v1 = Pred(p); ensures take v2 = Pred(p); v1 == v2 one would use maintainsv = Pred(p)as a shorthand. For the case of globals, instead ofaccesses g; ...; ensures {g} unchanged, one could then use the shorthandmaintains take v = Owned(&g), which is (a) shorter, (b) gives a name to the value that can be used in error reports, and (c) can be used uniformly for global and local variables (unlikeaccesses`) and works uniformly for Owned and other predicates.
Taking (2) and (3) together, those should account for almost all current uses of unchanged, and we might as well just retire unchanged (as you suggest, AIUI) and avoid its confusing ambiguity in loops. How much of (1) would we need if we had (2) and (3)?

rems-project / cerberus

[CN] Semantics of `unchanged` is confusing (eg. local vars, changed vars) #443