Type checker: public class instance variables cannot be accessed from expressions

nlmave commented 9 years ago

The following specification gives a type error while imho it should be allowed

class A
instance variables
  public x : nat
end A

class B
types
  sa = seq of A
  inv sai == forall i in set inds sai & sai(i).x < 10
functions
  getX: A -> nat
  getX(pA) == pA.x
operations
  getOpX: A ==> nat
  getOpX(pA) == return pA.x
end B

The above example produces two errors (in the invariant and in the getX function). Tested on Overture 2.2.6 and VDMTools (which does NOT produce these errors).

nlmave commented 9 years ago

Incidentally, the error is produced in both VDM-10 and classic.

nickbattle commented 9 years ago

I've only looked at this briefly...

The expression in the type invariant is the body of a function (inv_sa: seq of A +> bool), and the getX case is clearly a function. You cannot access state values from a function in VDMJ. If this was possible (in general), the invariant decision for a value (even a constant) might change as the state of the system changed, or a function might return different values depending on the state of the system rather than just its arguments.

For example, you might add a seq of three objects to a variable of type "sa" and that might be fine if their state values were all <10. But then later you could change the value of one of the instance variables to break the invariant. So are we saying that any value of type "sa" is implicitly variable - so even a constant of that type can change value, and we re-compute the invariant as its values change?

I agree the example looks like it "ought to work", but we need to be careful to understand the ramifications of simply enabling this. If you change A to be a record type with a field called "x", then it is all fine. But record values are copied by value and their fields are not state,

If this is a new aspect of RM#27, now would be a good time to explore it.

nlmave commented 9 years ago

Indeed this is related to RM https://github.com/overturetool/language/issues/27; and the original intent of the small case study was actually to explore the evaluation of invariants over sequences of class instances. PGL pointed me in this direction but I failed to explore it as I already ran into the access issue. Apart from the deviation from VDMTools, I believe that the VDMJ interpretation of "not allowed access to state variables" is far too strict. IMHO it is clear that the intent of the public instance variable in class A is to have a record-like semantics (with a kind of (implicitly pure) getter operation). And yes, the invariant should IMHO be checked if the values of the class instance variable changes (as it would in case type A is a record type) - you would expect similar behavior here! As accessing a public instance variable in a class instance is pure, there is no reason IMHO why it should be restricted inside a function body.

pglvdm commented 9 years ago

I fully agree with Marcel here, but in general I prefer NOT using public for instance variables unless strongly needed (as in a system class) so that is why it is related to RM#27 but note that fixing this bug does not remove the need for RM#27.

nickbattle commented 9 years ago

Can I just make sure we're clear about the difference here between object references as "records", and "mk_R" records. The former are references and so the "record" they refer to can be updated by anyone holding that reference; the latter are pure values, so you cannot update them, you can only generate a new value based on the old one. I know a state variable of a record type can be updated, but you cannot get a reference to that state value and put it somewhere else.

So the problem is that an object reference value can leak into the system (ie. an alias for the same data) and be used to modify the value independently. This is not possible with record values because the value cannot be aliased.

The following spec executes without error on VDMTools, returning broken data, even though the tweak operation has undermined the invariant on the "data:T". This is because there has been no assignment to the "data" variable as such - the data was corrupted via the tweaked reference.

class A
instance variables
    public x:nat := 0
end A

class B
types
    T = seq of A
    inv t == forall a in set elems t & a.x < 10;

instance variables
    data : T := [ new A(), new A() ];

operations
    public op: () ==> seq of A
    op() ==
    (
        let objref = data(1) in
            tweak(objref, 123);

        return data
    );

    private tweak: A * nat ==> ()
    tweak(a, n) == a.x := n;

end B

Initializing specification ... done
>> p new B().op()
[ objref4(A):
    <  + A`x = 123 >,
  objref5(A):
    <  + A`x = 0 > ]

nlmave commented 9 years ago

I don't understand your point on immutability of records, as shown below (which gives a run-time error (on a type invariant violation, as you would expect)) when I modify the record in the sequence:

class B
types
    A :: x : nat;

    as = seq of A
    inv asi == forall i,j in set inds asi & i<j => asi(i).x < asi(j).x

instance variables
    y : as := [mk_A(1), mk_A(2)]

operations
    public modifY: () ==> ()
    modifY () == y(1).x := 10
end B

W.r.t. your point on object reference leakage, even though the reference is copied, they still refer to the same object. So I guess we are talking 1:1 in the mk_R case and 1:m in the object reference case (m>=1). This probably requires some kind of listener solution in the interpreter, which is triggered as soon as the object becomes "dirty" because of state manipulation through one of the object references to that object.

nickbattle commented 9 years ago

When you say "y(1).x := 10" the LHS state designator unpicks the location of the field "x" of the appropriate element in the "y" sequence, and that value is updated with the value 10. The only way to refer to that field is via the "y" sequence. It is not possible to create an alias for either the record or the field and update the value "via the back door"; you must use the "y" name. This is different with objects.

Perhaps I'm confusing things by talking about immutable values and aliasing in the same sentence, though they are related. Records are passed by value, without aliases, and (outside of state) they are immutable; objects are passed by reference (creating aliases) and they are always mutable.

Type and state invariants are implemented by listeners, currently. It might be possible to propagate the invariant listener to all of an object's fields and its fields' fields and so on whenever an object is added to a type that has an invariant. But without analysing the fields used by an invariant expression (which could call functions, and is generally impossible?) it is difficult to contain this propagation. By contrast, if a record value is "copied in" to an invariant type rather than being there by reference, the listener propagation is naturally limited.

nickbattle commented 9 years ago

I suspect that a harder problem than when to propagate invariant checks is when to remove them again - such as trimming the sequence in the example above.

When an object reference is added to the scope of a value with an invariant, we might be able to add listeners to its fields (recursively) to trigger the invariant calculation when they change. But we would also need to know when an object reference leaves such a scope (and of course it could be in the type's scope multiple times, and subfield references could legitimately be in the scope already). If we get the cleanup wrong, the interpreter will be executing more and more pointless invariant checks, since the objects that provoked it are no longer within the scope of the test (or we will accidentally remove the listener, when in fact it ought to remain since the reference is being used elsewhere for an invariant of the same type).

Again, with records that are copied by value, this is not an issue. When a value is removed, it disappears. Of course, just because something is difficult doesn't mean it's wrong. But I suspect this is the root of why VDMTools does not (seem to) do this.

nlmave commented 9 years ago

Well I never claimed it would be easy to implement :-). Nevertheless this sounds like reference counting and garbage collecting techniques to me.

nickbattle commented 9 years ago

Yes agreed, this is all about reference counting and garbage. Garbage collection (or lack of it) is another area of VDM++ that is poorly defined.

I would not attempt to implement a reference counting "invariant tracker" lightly. Its complexity is so great that it would threaten to destabilize the tool for little benefit. It may be that VDMTools came to the same conclusion.

It may be possible to allow public field access from functions with less impact, though we do have to be careful about referential transparency. We can argue that "passed" objects aren't really state, though it might be best to disallow public static field access from functions (ie. where the return value then does not depend on the arguments). We should also think about function evaluation being atomic, since an object's fields could change value between being passed and being used, if there are multiple threads (breaking referential transparency again).

Field access only via "pure" operations would be preferable?

Still not convinced this is a great idea.

nlmave commented 9 years ago

semantically speaking there is (IMHO) no difference between accessing a field directly or through a getter operation that is defined pure. they are (or should be) one and the same.

the current debate is too much about the implementation while it should be driven by the specifiers' need to create invariants over groups of class instances (like in the simple example I provided). Currently, these are hard if not impossible to specify.

nickbattle commented 9 years ago

Absolutely agree that we should be trying to support what a specifier needs. What I'm trying to avoid is us bending the language in such a way that we forever lose such (non-trivial) properties as referentially transparent functions or analytically tractable invariants, just because it would be "cool" to be able to do . When I hear you two enthusiastically suggesting that functions should be able to access state without mentioning these things, I get nervous.

I don't really know how VDM++ became the way it is, but the result does look as though it was designed by people who were more concerned with OO features than formal integrity, and we should resist adding anything that makes that worse!

nlmave commented 9 years ago

We agree on the need to reconsider the VDM++ semantics as we already concluded several workshops ago in Newcastle, but unfortunately we have not made progress. Your concerns are valid in the generic context but the issue I raised is visibility of public instance variables in invariants (and those constructs only) which should be evaluated atomically anyway and hence always guarantee ref integrity. Op 26 jun. 2015 18:51 schreef Nick Battle notifications@github.com:Absolutely agree that we should be trying to support what a specifier needs. What I'm trying to avoid is us bending the language in such a way that we forever lose such (non-trivial) properties as referentially transparent functions or analytically tractable invariants, just because it would be "cool" to be able to do . When I hear you two enthusiastically suggesting that functions should be able to access state without mentioning these things, I get nervous.

I don't really know how VDM++ became the way it is, but the result does look as though it was designed by people who were more concerned with OO features than formal integrity, and we should resist adding anything that makes that worse!

—Reply to this email directly or view it on GitHub.

nickbattle commented 9 years ago

So you don't want public field access to be permitted in functions generally, just in type invariant expressions?

Invariants are evaluated atomically, but I think referential integrity is also a matter of whether the invariant always gives the same answer for the same candidate type value. If we allow it to calculate using arbitrary state data, that would be violated, and another lump of the proof theory would go out the window. So perhaps you mean that an invariant function can only access (nested) public fields of its argument (the value being tested for membership) rather than allowing arbitrary public field access (such as public static fields)?

That would give the type checker a headache, but it's possible I think. It blurs the notion of whether instance variables are "state". I'm more concerned about whether this undermines anything else, though I can't see anything at the moment.

VDMTools allows public static fields to influence type invariants, but this seems wrong to me. Updates to the static from elsewhere don't cause invariant violations, as we've seen with regular object updates, probably for the reasons discussed (ie. tracking references is too hard).

nickbattle commented 9 years ago

So, assuming this is a general feature of functions and not a "special" for type invariants, I think we have the following proposal:

Functions can access public instance variables of object references passed to them as arguments (and recursively, public instance variables of those variables, if they are object references). Presumably the usual access rules apply, so a function in class C could also access private instance variables of object references of type C, and so on for protected variables in a hierarchy.
Functions still cannot access state data directly, either from public static instance variables, or "self" (since they have no self).
Function execution is always instantaneous (in RT terms) and atomic (no thread swaps), so that objects' variables cannot change during the evaluation.
This seems compatible with the RM#27 proposal to allow pure operation calls.

And separately, updates to object instance variables via aliases should cause a re-evaluation of any type invariants that have those objects in scope (and that depend on the object's fields). This would be the intention of the language semantics, but the tools are unlikely to implement this as it is very complex.

How does that sound?

peterwvj commented 9 years ago

I am happy with what you propose, Nick - I think it sounds reasonable to allow functions to access instance variables of arguments (recursively). Essentially this should have the same effect as invoking pure operations so I believe this is fine.

nickbattle commented 9 years ago

This is now implemented in ncb/development. It should probably be regarded as a test version before merging into trunk, but the test specification given in this thread seems to behave as expected.

So as discussed, this now allows functions (in general - not just type invariants, but pre/postconditions and general functions) to access instance variables of arguments passed to them. Functions still cannot access arbitrary state (ie. public static data) unless an object of the type was passed to the function. Function evaluation has now become atomic (instantaneous), to avoid state values changing on other threads while the evaluation proceeds.

The re-evaluation of type invariants after arbitrary object changes via aliases is not implemented (neither is it in VDMTools, or the equivalent in JML).

nickbattle commented 9 years ago

There is a problem: object values are not triggering invariants when their fields are changed. This is because they've never had to do so in the past and I didn't change it! I've made some tentative changes in VDMJ, which I'm trying out. It seems to work, but this is creeping slowly towards the really complicated case where we have reference counting and so on. So this will have to be a "partial" solution for now (top level instance variable changes will be noticed, but not changes to sub-sub-fields of embedded objects). When it looks stable, I'll port the tweak over to Overture.

nickbattle commented 9 years ago

Tweak applied. I think it fixes the issue without breaking anything else.

overturetool / overture

Type checker: public class instance variables cannot be accessed from expressions #454