overturetool / language

Overture Language Board issue tracking
2 stars 0 forks source link

"Pure" operations called in functions #27

Closed joey-coleman closed 8 years ago

joey-coleman commented 9 years ago

The following bug was originally reported on Sourceforge by ldcouto, 2014-07-28 06:14:14.783000:

1 Identification of the Originator Luis Diogo Couto

2 Target of the request: defining the affected components of the language Introduce "pure" operations and allow them to be called inside functions. Pure operations do not affect the state of any object.

3 Motivation for the request When using an object as a function parameter, it is often necessary to query the internal state of the object. A typical example is when the object is used as an operation parameter and its internal state needs to be inspected in the pre condition.

The only way to do that at the moment is to make the relevant instance variables public, which breaks encapsulation.

4 Description of the request, including: (a) description of the modification: Allow pure operations to be called inside functions. A pure operation does not affect the state of its object nor of any other. It is referentially transparent.

Pure operations can only read instance variables and call functions and other pure operations.

A new keyword "pure" can be introduced to identify these operations.

Another option is to use the externals clause. An operation that contains ext rd x is pure. Currently such an operation can call any operations on x so the semantics of ext rd would be altered.

(b) benefits from the modification: When using an object as a function parameter it would be possible to check (or get) data that depends on the internal state of the object. This could be done without exposing the internal state of the object, which is an important principle of OO design.

This would be particularly helpful if we want to use objects in pre/post conditions and invariants, which are signature features of VDM. Pure operations would be particularly suited to implement the checks that these conditions rely on.

(c) possible side effects: If the new keyword is introduced, current models that use it as a name will no longer be correct. For what it's worth, none of the overture examples use the word pure as a name.

If the semantics of ext rd are altered, models that use it may no longer be correct. 11 examples use ext rd, though I have not checked the bodies of those operations.

5 Source code and technical documentation where applicable Documentation: new keyword or updated semantics of ext rd need to explained in the language manual.

Source code: the new keyword would require changes to the parser. Both versions would require changes to the type checker.

6 A test suite for validation of the modification in the executable models. N/A

joey-coleman commented 9 years ago

Comment by nick_battle, 2014-07-28 17:37:19.626000:

Some initial thoughts...

Would this proposal ever lead to a situation where a function could be called with the same arguments, but return different results (for example, by calling a pure getter of some state value)?

If so, do you regard the "value" of the references passed in to comprise the set of all state values reachable by calling its pure operations (so that if you call a function with the same "complete object value", you would get the same result)?

Does this change imply function evaluation is instantaneous? If not, an object's value can be changed by another thread mid-evaluation, and so it is not a constant value throughout the function. Purity could be extended to mean "mutex with all operations that can change the state", but this is not specified.

If this is to include mutex constraints on the meaning of purity, does that mean public (or public static) field access is always impure (since they can be accessed from operations that cannot be mutex with the pure operation)?

joey-coleman commented 9 years ago

Comment by ldcouto, 2014-07-28 19:33:18.783000:

Regarding the first question: what exactly do you mean by same arguments? If you pass the "same" object (same internal state) then yes you should always get the same results. I suppose this leads us down a rabbit whole of encapsulation and object values...

But yes, in general the value of a reference should be all the state values that can be gotten with pure operations. Is this problematic in terms of deep cloning for the interpreter?

To my mind, function evaluation is atomic/instantaneous. What is the current interpretation in VDM? Is there a document I can refer?

By allowing pure operations to be invoked inside functions they would also need to be atomic. And yes, that would preclude the usage of public static fields.

nickbattle commented 9 years ago

It might be the case that we can define what an "object value" is, and then get repeatable function results, but we have to be clear about how much of the environment is included.

I'm not so concerned about what is difficult for the tools. We need to be clear about the semantics of the language firstly. It may be that tools struggle to implement that, or perhaps can only interpret models in certain constrained cases (we already have this in some places, and VDMJ will give warnings if you write something that it cannot interpret).

I'm not sure whether function evaluation in VDM++ (or VDM-SL come to that) is strictly instantaneous. Since functions cannot read state at the moment, it makes no difference - you will get the same answer eventually, even if other threads have been scheduled in the mean time. With VDM-RT, functions do currently "take time" unless they are invoked from invariants or pre/postconditions. These uses are instantaneous (or at least "atomic" - meaning there is no RT time movement and no thread scheduling). I think your pure operations proposal has to imply instantaneous functions unless you also require mutex constraints of some sort to guarantee that no other thread changes the state that the pure operations read.

PGL has already suggested that instantaneous function invocation might be a solution to the problem of object access from functions (this came up in the discussion of object patterns). In effect, it is saying that the "value" passed to the function is the entire environment accessible from the object reference passed as an argument. If the environment is the same, the function result will be the same. This has some appeal, but we have not thought through all the consequences.

The solution to all this should also allow us to define the signature and semantics of the pre_op and post_op functions for VDM++. Currently these are undefined, because we do not know how to pass the "old" and "new" states of the system to the functions. In VDM-SL this is easy, by comparison, and the entire state record is defined as a type.

Incidentally, did you intend your pure operations to be callable from functions in VDM-SL?

ldcouto commented 9 years ago

Hey Nick,

Your point about semantics ahead of tools is well made. Though I do feel that in language design, it is sometimes good to make some concessions to tooling issues if they make it significantly easier (or even possible) for the tools to do their job.

I think instantaneous functions make sense. They don't affect the state in any way and they are just an expression really. Is expression evaluation also not atomic?

My main goal for this RM is to allow pre/post conditions and invariants to make assertions on objects. I really do think this is important. Pure operations would perhaps allow smaller values to be passed to the function since we will control which parts of the accessible environment to send. We can also place additional restrictions on pure ops, if you feel they are appropriate (perhaps disallowing certain kinds of statements, such as duration).

On pre and post_op, I agree. But there is also a related issue of inheritance. If I use x.pre_op(), am I referring specifically to the op in the declared class of xor one of its subclasses (which, at the moment are allowed to redefine pre/posts however they want)? These things get extra-thorny for static checking such as the POG.

I don't think the pure operations are needed in SL. There's no objects and you can already pass whatever parts of the state you want as parameters to a function.

Also, congratulations on moving to GitHub. Welcome to civilised society :)

kgpierce commented 9 years ago

The LB officially moved this RM into the consideration phase. LB members have an action to join the discussion here.

tomooda commented 9 years ago

Luis, I think your idea to allow pre/post expressions to call "side-effects free" operations is useful. Purity might not be exactly what you need. If you only need to tag side-effects free operations, const, from C++ vocabulary, might be the one. It does not require the same result given the same object ref. It just says the operation doesn't assign to any variables.

ldcouto commented 9 years ago

Does const allow other non-const operations to be called?

If it doesn't, I suppose it works as well as pure for this purpose.

tomooda commented 9 years ago

In C++, you can't call non-const functions in a const function besides there is an awful trick to cast a non-const function to const one. (yes, it's dangerous!) The point is that you don't have to re-define equality and inequality between object references. Instead, your accessor operations promise that they don't change anything. I think most part of your definition of "pure" can be reused for definition of "const". How about this?

  • An operation may be declared as const only if (1) the operation does NOT contain any assignment statement, and (2) the operation does NOT call any function that is not declared as const.
  • Pre/post conditions and invariants can't call operations except those declared as const.

There is a drawback. I'm not supportive to allowing functions to call const operations because most "meaningful" const operations would be impure.

ldcouto commented 9 years ago

Looks good to me.

nlmave commented 9 years ago

All,

I find the rationale for this RM rather weak; basically the argument is being made that access to state components is sometimes needed (in pre- and postconditions) but that breaks information hiding. This is exactly why it is a bad idea in the first place: the designer on purpose does not want to provide access to the state, in order to be able to change it at some stage without outsiders to rely on the internal structure. The solution is to add getter-operations to provide that access.

With respect to "pure"-operations; two answers imho:

I am reluctant to support this RM if a better rationale (with concrete examples why it is required) is not provided.

Regards, Marcel

ldcouto commented 9 years ago

Hi Marcel,

The main rationale behind this RM is not operation purity in itself. It is to allow some form of operation to be called inside functions. We should not go back to allowing all operations so this seems like a decent compromise.

Unless there is already a way to wirte pre/post conditions that talk about object parameters. If so, that would be great but I don't know it. Do you know of one?

The suggestions you made all have some kind of problem, I think:

One other advantage of using an aproach like pure operation is that the designer of the class would implement the operations so they can control exactly how much, if any, of the state is revealed. They can, for instance, implement a boolean operation that does the check and fully hides the state (imagine a stack class with a canPush() operation instead of a getter for the stack size.).

Do you see any value in being able to reason about objects in pre/post conditions? Perhaps that's a better place to start.

JohnFitzgerald commented 9 years ago

Hi Luis I’m getting these messages, so assume I’m allowed to comment.

I’d agree that it’s better to start from a notion of what one wishes to do and the desired semantics, rather than focussing at this stage on a specific mechanism.

When you ask about a need to “reason” about objects in pre/post conditions, I assume you really mean a need to assert predicates over objects in terms of their declared interfaces (i.e. their published operations). The canPush example sounds like precondition quotation (i.e. allowing a call to the Boolean function pre-Push taking a whole stack object as a parameter).

I don’t think “pure” is the right word to use, by the way. There’s nothing implicitly “impure” about a classical operation that can change the state. So, if we come round to a special kind of operation to allow sophisticated assertions over objects, it ought to be called something else. Let’s shelve for the moment what it ought to be called.

John

tomooda commented 9 years ago

Luis, Besides "const" that I already mentioned, how about "rd" if you want less impact on the language spec, for example, public rd canPush : () ==> bool Because rd is already a reserved word, it wouldn't ruin any currently-valid specification. One possible drawback is that multiple meanings over "rd" might be confusing.

ldcouto commented 9 years ago

Hi John,

I'm not in the LB not but I think you and everyone are welcome here.

So, the idea behind these operations, whatever we call them, is to allow us to write predicates over objects. Imagine, something like the model below (again using a stack).

class Stack

values
max = 5;

instance variables
s : seq of int :=[];
inv len s <= max;

operations
public CanPush : () ==> bool
CanPush () == return len s < max;

public Push : int ==> ()
Push (i) == s := [i]^s;

public CanPop : () ==> bool
CanPop () == return s <> [];

public Pop : () ==> int
Pop() == let h = hd s
        in (s := tl s;
             return h);

end Stack

class UseStack
instance variables 
myStack : Stack := new Stack();

operations
public Push1 : int ==> ()
Push1 (i) == myStack.Push((i))
pre myStack.CanPush();

public Push2 : int * Stack ==> Stack
Push2 (i,s) == (s.Push(i); return s)
pre s.CanPush();

end UseStack

This kind of model is legal in classic but not in VDM10. To me, this kind of functionality is quite useful. And since the operations do not affect the state (that is the crucial point) then they should be allowed in some form. Maybe this notion of a "const" or read-only operation that Tomohiro suggested is more adequate than pure.

The kinds of things I think are useful to be able to do with this are:

nlmave commented 9 years ago

Luis, in order to avoid further confusion, please provide a full example that explains the context of your RM. Regards Marcel

----- Reply message ----- Van: "Luis Diogo Couto" notifications@github.com Aan: "overturetool/language" language@noreply.github.com CC: "Marcel Verhoef" Marcel.Verhoef@xs4all.nl Onderwerp: [language] "Pure" operations called in functions (#27) Datum: za, aug. 9, 2014 14:30 Hi Marcel,

The main rationale behind this RM is not operation purity in itself. It is to

allow some form of operation to be called inside functions. We should not go

back to allowing all operations so this seems like a decent compromise.

Unless there is already a way to wirte pre/post conditions that talk

about object parameters. If so, that would be great but I don't know it.

Do you know of one?

The suggestions you made all have some kind of problem, I think:

getter operations can't be called in pre/post conditions ext clauses don't prevent other operations from being called exporting state explicitly break encapsulaton One other advantage of using an aproach like pure operation is that the

designer of the class would implement the operations so they can control

exactly how much, if any, of the state is revealed. They can, for instance,

implement a boolean operation that does the check and fully hides

the state (imagine a stack class with a canPush() operation instead of

a getter for the stack size.).

Do you see any value in being able to reason about objects in pre/post

conditions? Perhaps that's a better place to start.

— Reply to this email directly or view it on GitHub.

pglvdm commented 9 years ago

Luis is on vacation at the moment so I will respond instead. The Stack+UseStack example provided above IS a full example. It illustrate the key point we are trying to make: In the current version of VDM-10 (VDM++) you cannot protect the proof obligations you will get because of potential mistakes in your pre-conditions BECAUSE you are no longer allowed to call operations! Not even ones that simply inspect the state of objects to determine if it is ok to call the operation. IMHO this is definitely something that we need to change because essentially this means that if you use an OO style of defining anything you cannot write the necessary pre and post-conditions, and if we cannot do that I don't see the point!

ldcouto commented 9 years ago

Yes, I was on vacation. Thanks Peter.

As Peter said, this issue predominantly affects POs. I'm open to the notion that there may be a better way to achieve this than read-only operations but do we all agree that the issue needs to be addressed?

(I'm back now.)

nlmave commented 9 years ago

Sorry for not replying sooner; it seems that progress on this topic stalled as I was unavailable at the last scheduled LB meeting. I had a short chat with PGL the other day and I now grasp the rationale of the submitted RM and I agree we need to find a solution for this as progress on POs is hampered. It seems that we "lost" functionality moving to VDM-10. In other words: I propose to accept this RM, assign a LB member to guide this RM and move to the Discussion stage (as described in http://wiki.overturetool.org/index.php/Community_Process). For what I am concerned, we do not have to wait for the next LB meeting in order to take this step. KP, as our LB-chair, can you please organize this? Thanks! Marcel

kgpierce commented 9 years ago

The LB have decided to "accept subject to revision" and have asked the originator to revise the RM to make a clear proposal for a language/semantics change, taking into account the discussions above.

nickbattle commented 9 years ago

It may help to compare the revised RM with what JML does with its "pure" methods. See the links below:

http://www.eecs.ucf.edu/~leavens/JML/jmlrefman/jmlrefman_7.html#SEC60 http://www.eecs.ucf.edu/~leavens/JML/jmlrefman/jmlrefman_6.html#SEC39 http://www.eecs.ucf.edu/~leavens/JML/jmlrefman/jmlrefman_41.html#cp_P (see "pure" entries)

In particular, note that entire classes can be declared pure (equivalent to their members being pure), that there are special rules for pure constructors and that methods overriding pure methods must be pure (though overloads are ok?). The JML definition of pure also includes guaranteed termination and (not) throwing exceptions.

[Edit: JML assertions (pre/post/inv) can only call pure methods!]

ldcouto commented 9 years ago

Hey guys. Here is the revised RM. Thanks to @peterwvj who helped me revise it. Feel free to post any questions. I'll be on vacation next week but I'll check my GitHub email every now and then.

Revised RM

1 Identification of the Originator

Luis Diogo Couto

2 Target of the request: defining the affected components of the language

Introduce pure operations and allow them to be called inside functions. Pure operations must not affect the state of any object.

3 Motivation for the request

When using an object in a pre/post-condition or invariant, it is often necessary to query its internal state. The only way to do this at the moment is to make the relevant instance variables public, which breaks encapsulation. Previously, in VDM classic, it was possible to call any operation inside a function so this was not an issue.

4 Description of the request, including:

(a) description of the modification:

Allow pure operations to be called inside functions.

A pure operation is defined thus:

A few issues need further discussion.

First, the matter of termination. I am not sure that we currently have enough support to prove operation termination, specifically with things like measures for recursive operations and variants for loops. This can be addressed to an extend by restricting the kinds of statements that are allowed in pure operations. For example, we could ban the while loop and operation calls entirely.

Second is the issue of operations being explicitly pure . It is very important that pure operations be statically identifiable as such. That means that pure operations should be somehow marked as pure and it must not be possible to change purity via inheritance. For example public op() == skip is not pure while public pure op() === skip is.

On the matter of syntax, it was suggested that I propose a version with no new syntax. I am personally not in favor of this but I will do it and then try to explain why I feel it's bad.

I see two ways to do this without adding new syntax. The first one is to change the semantics of the rd frame. Any operations that contain ext rd and have no write access to any variable are now considered pure. The obvious issue here is that this may break existing models. There is also another issue that I will discuss a bit further below.

Another possibility to add pure operations is by combining the rd frame with the self keyword. So, an operation with ext rd self would be considered pure. Technically, this adds new syntax though it does not add new keywords. This approach also shares an issue with the previous one: the description of ext rd becomes less clean. If it's used on its own it means purity. If it's used together with ext wr it means read access.

I would rather we have a total and clean distinction between normal and pure operations and adding a pure keyword is the easiest way to do it. It also simplifies the language manual. We simply describe the new keyword in its own section.

Finally, though JML suggests pure constructors and entire pure classes, I confess I do not immediately see an advantage to them. So I personally won't suggest adding that but it can be discussed.

(b) benefits from the modification:

When using an object as a function parameter it would be possible to specify behavior that depends on the internal state of the object without exposing said internal state.

This modification would be particularly helpful if we want to use objects in pre/post conditions and invariants, which are signature features of VDM. Pure operations would be particularly suited to implement the checks that these conditions rely on.

As an example, consider the following VDM model which would be legal in VDM classic but is not legal in VDM 10.

class Stack

values
max = 5;

instance variables
s : seq of int :=[];
inv len s <= max;

operations
public CanPush : () ==> bool
CanPush () == return len s < max;

public Push : int ==> ()
Push (i) == s := [i]^s;

public CanPop : () ==> bool
CanPop () == return s <> [];

public Pop : () ==> int
Pop() == let h = hd s
        in (s := tl s;
             return h);

end Stack

class UseStack
instance variables 
myStack : Stack := new Stack();

operations
public Push1 : int ==> ()
Push1 (i) == myStack.Push((i))
pre myStack.CanPush();

public Push2 : int * Stack ==> Stack
Push2 (i,s) == (s.Push(i); return s)
pre s.CanPush();

end UseStack

(c) possible side effects:

If the new keyword is introduced, current models that use it as a name will no longer be correct. It's worth noting that none of the Overture examples use the word pure as a name.

If the semantics of ext rd are altered, models that use it may no longer be correct. 11 examples use ext rd, though I have not checked the bodies of those operations.

5 Source code and technical documentation where applicable

Documentation: new keyword or updated semantics of ext rd need to be explained in the language manual.

Source code: the new keyword would require changes to the parser. Both versions would require changes to the AST and type checker. Minor changes may be needed in other plug-ins.

6 A test suite for validation of the modification in the executable models.

N/A for now but I would be happy to create one. We need to check for:

nickbattle commented 9 years ago

A few thoughts, in no particular order: We want pure operations to be callable from any function context, not just from pre/post/inv constructs? You say that operations are executed atomically. Does that mean no thread swaps and no time movement? What happens if an operation itself call blocks on a sync guard? Or does purity imply a lack of guards? Functions are implicitly static - they have no self - so any operation call must either be static or be on an object reference that the function discovers, either from a class' static state, or being passed in? Functions are no longer referentially transparent: they cannot be replaced with the value of their body and result in no change to the specification semantics. Is this OK? You didn't mention exception throwing in the definition of purity. Is that deliberate? Do operations called atomically in functions affect history counters? I'm not sure that this counts as a "side effect" in the normal sense, but calling an operation can release the sync guard on another call in another thread, which may be unexpected if counters are used to balance get/set calls. Are constructors always impure? I agree that it would be simplest to explicitly declare operations to be pure. And I would prefer a new keyword - overloading the existing ones seems confusing. I'm sure I'll think of a few more questions in the next few days :-)

ldcouto commented 9 years ago

Here are some immediate answers to the easier questions:

We want pure operations to be callable from any function context, not just from pre/post/inv constructs?

Yes, totally. I think pre/post will be the main use but don't seem any reason to limit it.

You say that operations are executed atomically. Does that mean no thread swaps and no time movement? What happens if an operation itself call blocks on a sync guard? Or does purity imply a lack of guards?

I'm not too experienced with threads but in general I would say no swaps and no time. No blocks and therefore no guards. Unless a function can lead to any of these situations, pure operations should not lead to it either.

Functions are implicitly static - they have no self - so any operation call must either be static or be on an object reference that the function discovers, either from a class' static state, or being passed in?

I think so. If you pass an object as a parameter to a function, you can call pure operations on that object. And of course, static pure operations can be called from anywhere that they are visible.

Functions are no longer referentially transparent: they cannot be replaced with the value of their body and result in no change to the specification semantics. Is this OK?

Are we certain of this? What would change by replacing a function with its value?

You didn't mention exception throwing in the definition of purity. Is that deliberate?

No, that's an oversight. Pure operations should not throw exceptions. And should also probably not contain error statements. Shall I update the RM?

Do operations called atomically in functions affect history counters? I'm not sure that this counts as a "side effect" in the normal sense, but calling an operation can release the sync guard on another call in another thread, which may be unexpected if counters are used to balance get/set calls.

I'm not sure here. I would maybe suggest that pure operations not affect history counters.

Are constructors always impure?

Well, a constructor sort of changes the state of an object (it creates the state). So I would say yeah.

tomooda commented 9 years ago
Functions are no longer referentially transparent: they cannot be replaced with the value of their body and result in no change to the specification semantics. Is this OK?
Are we certain of this? What would change by replacing a function with its value?

Because a "pure" operation may return different values depending on states, a function that calls a "pure" operation may evaluate to different values depending on return values of the "pure" operation, and therefore the function is not referentially transparent. I think allowing functions to call "pure" operations is almost like allowing functions to directly read instance variables.

ldcouto commented 9 years ago

Ok, this may be a very silly but any function would return different results for different parameters.

So, if the object we are calling the operations on is a different parameter, it should yield different results. This isn't the case for static operations so maybe those should be banned?

On 17 October 2014 22:45, Tomohiro Oda notifications@github.com wrote:

Functions are no longer referentially transparent: they cannot be replaced with the value of their body and result in no change to the specification semantics. Is this OK?

Are we certain of this? What would change by replacing a function with its value?

Because a "pure" operation may return different values depending on states, a function that calls a "pure" operation may evaluate to different values depending on return values of the "pure" operation, and therefore the function is not referentially transparent. I think allowing functions to call "pure" operations is almost like allowing functions to directly read instance variables.

— Reply to this email directly or view it on GitHub https://github.com/overturetool/language/issues/27#issuecomment-59573502 .

tomooda commented 9 years ago

I don't think we need referential transparency in assertions, but "side-effect free" is enough. I do appreciate referential transparency of functions because it simplifies data dependency. What are benefits of allowing functions to call pure operations?

Assuming that a function on an object A calls a pure operation on the object A that calls another pure operation on another object B, the object B wouldn't appear in the function application expression. I'm afraid allowing functions to call pure operations would introduce "under the hood" data dependency to functions.

nickbattle commented 9 years ago

OK, I think I agree with most of your answers, Luis. A few more points:

I think Tomo is right about referential transparency. In essence, what we have here is a way for functions to access instance variables, albeit via a GetValue() call. This means that if you try to replace "function(args)" with its return value, you cannot know which value to use, indeed the value of function(args) may be different at different times. I think this completely breaks much of the VDM proof theory that is based on functions (because they're not really functions any more!)

One slightly disturbing consequence is that the "functions" that form type invariants may no longer be invariant in the sense that they no longer necessarily give the same membership answer for the same value! We wanted this behaviour for pre/post, so that we could check state via operations. But for type invariants, it seems to imply that all invariants which call an operation would need to be re-evaluated if the state on which they depend changes.

The problem with history counters is that they are a "side effect" in the sense of affecting the specification's behaviour. At the moment, it is possible to animate a specification with invariants and pre/postconditions disabled, and the behaviour is exactly the same. This is as it should be, I think. These checks are "invisible", and ought not to affect the specification's behaviour. We could say that operations called from functions (ie. pure ops) would not affect history counters. That might get round the problem, but it's worrying/ugly to have exceptional cases like this.

Another question: do you think pure operations should be back-ported into VDM-SL, or is this just a VDM++ matter? On the one hand it could be "basically the same", and consistency is good. On the other hand, VDM-SL does not have the same problems because state is always local to a module, and you can just refer to it in operation pre/posts. What it comes down to is that the functions for SL pre/post have a well defined way to be passed state records (and therefore remain proper functions), whereas VDM++ does not. Unless you can convince me that it solves a problem in VDM-SL, I would be tempted to leave it out and keep SL "clean".

pglvdm commented 9 years ago

If we agree that pure operations are executed atomically I would like to suggest that one cannot define synchronisation constraints for ii. This has both be advantageous since then it does not need to have its history counters updated (and thus we will have no "side effect") and when they can have no guards they cannot be blocked. I fully agree with Nick that there is no point in allowing the operation called inside functions in VDM-SL. The reason for this RM is IMHO purely caused by the OO extension and the need to be able to look into an object's instance variables when it is used as an argument to a function (in order to be able to define the appropriate pre-condition). Thus I also vote for keeping VDM-SL "clean".

nickbattle commented 9 years ago

I think it might help us think about this area if we remind ourselves that VDM-SL already defines functions for pre_op and post_op. These function bodies can apparently access state data - which real functions cannot. They can even access the old and new state values.

This is done by the entire module state (two copies, for post_op) being passed as record values (ie. copies) to these functions. The tool then implements a thin later at the start of the function which takes the arguments passed, and expands them into local variables so that the function can read their values by name - in effect a large "let" expression prefixes the body of the function.

So accessing state from functions (apparently) is not new. What is new in VDM++ is that there is no definition of the function signatures for pre_op and post_op - we do not know how to pass the old and new state of the system. This is because in VDM++ (unlike a VDM-SL module) the system state comprises an arbitrary graph of objects reachable from "self" and the public static members of all other classes. This is tricky to represent as a function parameter!

Internally, VDMJ and Overture has a kludge. The old/new state arguments are passed as a map of names to values, and at runtime, the names to pass are determined from a static analysis of the body of the function. That's ugly, but it does work for simple variable access. But it does not work for the case we are considering here, where we want to access private members of another object.

I suppose my fundamental reservation about this RM is that it is not trying to solve the problem of "how do we pass state to pre/post_op so that we can reason about it", but rather "how can we get all functions to access state sneakily, so that we don't have to define pre_op/post_op".

I know it's hard (I can see a simple way anyway!), but if we could solve the pre/post_op state parameter passing problem in VDM++, we could avoid a whole lot of (frankly) worrying changes to function semantics that this RM brings.

Remember, VDM-SL already has to pull some tricks to turn a parameter of mk_Sigma into a set of named locals so that the body can access them. They're not accessed directly. Can we not think of some similar tricks such that the function body can write obj.getValue() and obtain the right answer, even though it is not literally calling an operation on the real object, but a copy (maybe not even an object, but a ghost of the same shape, like a pseudo-record that contains the instance variables or something)?

tomooda commented 9 years ago

Nick, your post reminded me of lambda lifting (http://en.wikipedia.org/wiki/Lambda_lifting) that "lifts" local bindings to parameters in functional languages. I don't know if we can do similar in VDM++.

tomooda commented 9 years ago

Regarding to signatures of pre_op/post_op, One naive idea is a special "graph" type that we can't write a literal nor a printable string but other features act like record types. The node type is not a compound type. It's a basic type like the token type. Equality between nodes is defined by conjunction of equalities between field values. A node value can be created by mk_node(). More complicated graphs (including cyclic ones) can be composed using mu expressions. A field in a node can point at another (including itself) node or any value. Nodes in a graph can be traced through field accessors. One special operation "clone" that generates a deep copy of graph from an object, including private members and self (self is kept as an object reference).

One big defect of this approach is violation against the encapsulation principle. The use of the "clone" operation needs a special care. May be better dedicated to tools. I don't think interpreters have to perform the "clone" operation at every pre/post checking. It's an interface for users to manually evaluate the pre/post conditions. I'm not sure that this is practically feasible for tools with actual specifications. I'm also afraid this brings "dynamic typing" nature to VDM.

pglvdm commented 9 years ago

I think that the situation is different than for VDM-SL. Here we only have one state that is treated as a record in case you need access to the (only state). In a VDM++ setting we need access to the state of objects provided as arguments to the function. If we should do something like what Nick suggests we would essentially need to go for the deep clone semantics (expensive from an execution perspective) and then consider the "state" of all the objects provided as arguments as "call by value" records where we give up the information hiding principles and allow direct access to all fields (i.e. instance variables) independent of what access modifiers have been used. I don't think that this would be a sensible route.

nickbattle commented 9 years ago

The big difference with VDM++ is certainly that we're "passing state" (upon which the pre/postcondition depends), but it doesn't seem sensible to throw away referential transparency as a means to solve it. The RM so far has discussed use of pure operations in functions generally (not just pre/post/inv), but I was trying to point out that we already have a function state access solution for VDM-SL in just the special case of pre/post (which is where we were proposing pure operations would be used).

Perhaps the fundamental reason it doesn't work in VDM++ is that we don't have object values (which sound rather like the nodes that Tomo explained). The object state passed to the operation could also be passed to the post_op function as a parameter - but object references are not (currently) value parameters, so that doesn't help.

Incidentally, solving the post_op parameter problem would also enable us to refer to something like obj~.getValue() - the previous value of the object's state. I don't think pure functions help us with that.

I agree that deep copies are a big problem for tools. But if they solve the problem that we have with the language, we can define that this is the way the pre/post_op functions work, but say that the tools are limited in their ability to run simulations. We have other places where this is the case, like the "init" clause of SL state records, only some forms of which are executable.

nickbattle commented 9 years ago

Another thought, though I'm not sure whether this helps.

In the case where we are trying to check whether it is OK to push onto the stack object, we are trying to propagate the precondition of the push operation out to our own "second level" push (that perhaps does other things). So why can't that 2nd level push not call the function obj.pre_Push() in its precondition? (I know Push doesn't have a pre clause in the example, but it could/should).

I know that's a circular argument in a way (we end up having to solve the problem of passing state to pre_Push), but this is the means to do precondition propagation in VDM, isn't it?

tomooda commented 9 years ago

I think it's almost free to derive a pure operation obj.pre_Push() from precondition of obj.Push(). I think the idea of pure operation agrees with the dynamic nature of OO. I'm just wondering what would be essential difference between pure operations and functions if we allow functions to call pure operations.

nickbattle commented 9 years ago

The essential difference would be that pure operations can read state, surely? But I agree they are similar.

I can't remember what VDMJ/Overture implements for VDM++, but I meant that the obj.prePush function is already created by the language (you get pre and post_ functions defined for anything with pre/postconditions). This is certainly the case in VDM-SL. In VDM++ we're not sure of the functions' signature, but I think VDMJ does create functions which are callable. I'll check when I get a minute.

Edit: Yes, VDMJ does create the pre and post functions. The pre function has a "self" object reference for the state, the post function has a "map (seq of char) to ?" as the old state and an object reference as the new state. The map is populated with the old values of every "old" name that the body of the function refers to. This is the ugly solution I was referring to earlier.

But you can certainly call another class' pre_Push function in the precondition of one of your own operations.

nickbattle commented 9 years ago

I've just been discussing these problems with PGL.

We would suggest that since pure operations are primarily intended to enable checks to be made from pre- and postconditions, and since there are some problems with breaking referential transparency elsewhere (at least in the case of type invariants, possibly others), we should limit the calling of pure operations to the bodies of pre/postconditions - as well as including the other constraints, being atomic calls that do not block or influence history counters.

That will let us make progress, we think. It's not ideal, but it's very hard to solve in the current "state passing" way - and besides, that would still require us to break OO encapsulation somehow, to access private state data.

kgpierce commented 9 years ago

I think this latest suggestion to restrict to pre-/post-conditions is a good one. It means we are addressing the smallest part to overcome the current difficulty.

Perhaps I have misunderstood some of the state cloning discussion, but would it be possible to simply allow pre-post conditions to refer to instance variables in other objects directly (as if they were public but read-only). The interpreter could take a snapshot of the values in a record when the pre-post conditions are evaluated similar to the VDM-SL style that Nick mentioned further up. I assume that if the pre-post condition referenced an object instance variable, the interpreter could drill-down to eventually come up with a non-object type.

nickbattle commented 9 years ago

Yes, something like that gets you a long way, Ken. But a pre/post expression might also want to refer to public static state of one or more classes, which are not directly reachable via "self". The state we want to pass to the pre_op and post_op functions is everything that they can reach via self and any public statics (and in the case of the post_op, we want the before and after values).

The kludge I came up with in VDMJ is to pass a map of names to values - the names being anything that the body of the function refers to. But to do that I have to cheat a little and delcare the parameter to be of type "map (seq of char) to ?", since VDMJ supports the "?" type (anything). And this still only gets you "first level" access: if the function refers to obj.iv, then the map contains "obj" |-> an object reference, which isn't a copy. To do this properly, I think the names would have to be arbitrary field access names ("obj1.obj2.obj3...iv"), and those would have to be sucked out of the real state before calling the function.

Besides, even if we do this, we are breaking encapsulation. The original problem was that we want to be able to call operations that test hidden state. If you have to make the state public (or we arrange for special powers for pre/post to ignore access rules) then we are exposing the insides of objects, which "you're not supposed to do" :-)

kgpierce commented 9 years ago

So from a technical perspective I don't see a problem here (I might be mistaken of course). We adapt the existing kludge to OO VDM, building the map of names referred to in the function, with the TC extended to allow pre-post conditions to use arbitrary field access names and static members of classes (e.g. len myStack.s < Stack`max). Would there be an unacceptable overhead during simulation here, or only for very long pre-conditions accessing many very deep values (seems unlikely for most models).

From a "you're not supposed to break encapsulation" viewpoint, I'm more relaxed. We use pre-post conditions to make assertions and guarantees about what we think the model should or shouldn't be doing at any given time, allowing us to check our assumptions. So in this limited context, giving our 'assertion' language a more God-like power seems more justifiable than allowing (even clearly marked) functions to access state, since we need to check our assumptions against something that knows what's actually going on.

Since pre-conditions only guard, they never require something to happen, in this sense the encapsulation breaking doesn't seem dangerous. With checking turned off, this solution should ensure that the execution is the same as with checking turned on. Whether or not it makes sense to guarantee in a post-condition something about values owned by another object is a different question. At least this would challenge the modeller's assumptions if it was flagged as failed during simulation and is maybe an indication that your ego's writing cheques your model can't cash. I suppose a better suggestion would be to write myStack.post_Push() is true (if you're allowed to do this). At least then a failure might indicate a concurrency problem.

nickbattle commented 9 years ago

OK, to explore that a bit further, what should the pre_op and post_op function signatures look like? At the moment, they're like this in VDMJ:

class A
...
op: nat ==> char
op(a) == ...
pre a < 10
post RESULT in set elems "abcde";

-- Implicitly creates these definitions ...
pre_op: nat * A +> bool
pre_op(a, self) == ...

post_op: nat * char * (map seq of char to ?) * A +> bool
post_op(a, RESULT, oldself, self) == ...

Operations with no result type just have the parameter types in post_op. Static operations are missing the "A" parameter in both. Both functions are total. Note that VDMJ only uses the map for "old" values currently - it can access the current values easily from the "self" objref passed as the final parameter. But we may want to make both old and new parameters be the same map type. Note that the map type is not "legal" VDM as it uses a non-standard "?" type.

The "seq of char" has to be able to identify any state expression that the body can use, so we have "name", "{field.}name" and "C`name" forms, I think? (At the moment, it only allows "name"). The type checker would presumably still check these variables as though they were going to be accessed as a normal operation would, but without access constraints. The "old" names, like var~, would be type checked without the "~" (the type of the state hasn't changed). The grammar would have to change to allow "old" access to indirect instance variables - would you write "obj~.iv" or "obj.iv~" or "obj~.iv~"? Note also that we're using "seq of char" so that you can call the function yourself and identify (say) the variable "counter" without the runtime turning that into the value of counter - ie. there is no way to create a variable reference in VDM as it stands, so I'm using "text" names.

Incidentally, the grammar of ext rd and ext wr clauses would also ought to change to specify fields of objects that are read/written by the operation.

At runtime, the various pre/post Expressions accessing state have to be intercepted and directed to a lookup in the map or maps. This already happens, since these functions do not have a "self" and so access to state variables has to access the value from the arguments passed in. This magic is done by the PreOpExpression and PostOpExpression classes (in VDMJ). When these are executed, they create a runtime context containing the state values passed. The "old" values are also added to the runtime context as though they were local definitions.

You can call the pre_op and post_op functions directly, just like you can in VDM-SL, though currently they are declared as non-static even though they probably ought to be static like all functions (so you have to call obj.pre_op(...) not A`pre_op(...)). The functions have the same public/private access as the main operation.

So this very nearly works, but we would have to agree the signatures, and agree to allow "private field access" from pre/post bodies, and be comfortable with the encapsulation situation.

[Edited op definition]

kgpierce commented 9 years ago

Thanks, Nick! A good effort in turning my vague thoughts into something concrete to discuss :-) So this looks good to me ("very nearly works" as you put it)!

nickbattle commented 9 years ago

Yes, {field}+.name if you prefer - I was thinking in EBNF where the curlies imply repetition :-)

Agree it's clearest to have the "~" on the end of the expression.

I think it's just a bug that the functions aren't static. At some point, we made all function static, so it's possible that this is something that wasn't fixed with that change. These functions are very rarely used in VDM++, so no one noticed. The behaviour is the same, yes. It's just the call that's different.

Nice idea to warn if a pre/post is using a private variable (or protected etc - inaccessible).

I'd have to check, but at one point the VDMJ implementation was such that if you have an ext clause, then you could only read the "ext rd" names and only write the "ext wr" names. That works nicely in SL, and frames the operation clearly. But in VDM++, the grammar (which only allows "identifier" as the name of the external) will not allow such things as reading obj.iv. So I'm not sure what the tools enforce in VDM++ (if anything) with ext clauses. I don't think POs are generated to enforce this. This is all about what the operation body does, not the pre/post, which can read what they like.

nickbattle commented 9 years ago

From a few quick experiments, it looks like an "ext" clause will limit the environment visible to the operation body to just include the state items named in the clause. It does not distinguish between "rd" and "wr" however. Unfortunately, this restriction is also passed to the bodies of the pre/post functions, which I think is probably a bug(?) - just because the body is restricted to (say) only read "iv", that doesn't mean that the pre/post expressions cannot look at other state variables.

It's essentially the same in VDM-SL, except there is makes sense to be limited to a simple name in the "ext" clauses. In VDM++, the grammar won't permit "ext rd obj.iv".

I was testing with the spec below, if you want to try it:

class A
instance variables
    iv1 : nat := 0;
    iv2 : bool := false;

operations
    public op(a : nat) r: nat ==
        if iv2 then     -- "State variable iv2 cannot be accessed from this context"
        (
            iv1 := a;
            return iv1
        )
        else
            return 0
    ext wr iv1
    pre iv2 => iv1     -- "State variable iv2 cannot be accessed from this context"
    post r = iv1;

end A
nickbattle commented 9 years ago

Off topic, but I've just checked in a fix to "ext" type checking so that pre/post bodies are not constrained to use just the ext state in the same way that the body is, and if you attempt to make an assignment to an "ext rd" state variable, you get a type checking error.

The changes are pushed to ncb/development.

nickbattle commented 9 years ago

I've just noticed that VDMTools puts pre/post expressions under the same "ext" constraints as the body - ie. the way VDMJ was before I changed it today, though in the specification above, it complains about the use of iv2 in the "pre" expression, but not about its use in the body. So now I'm wondering whether I did the right thing! What do people think?

C:\Cygwin\home\OvertureWorkspace2\TestSL\TestSL.vdmsl, l. 19, c. 6:
  Error[37] : The state component "iv2" must not be used here
pglvdm commented 9 years ago

Well as far as I recall the ISO standard for VDM-SL this error should be reported if this had been an implicit operation definition (the extended form was not in the standard). Thus, if you wish to have read access to instance variables I would still think that it should be required to have a "rd iv2" clause in teh "ext" part.

nickbattle commented 9 years ago

OK, I've updated the fix to extend the restriction to the pre/post expressions as it was originally. It still complains about assignment to "ext rd" names, and there was a bug that allowed old~ ext names to be referred to in the precondition, which is now corrected.

This isn't really to do with the RM, so sorry for the noise...

kgpierce commented 9 years ago

Hehe, it's been a while since I did EBNF, sorry for the unnecessary correction.

So the main issue left still seems to be externals clause? It's slightly odd because a pre-condition might involve a state variable that isn't used in an operation body and wouldn't be read in any final implementation, yet be required to be marked as rd. Anyway I suppose we should follow the ISO standard's example.

nickbattle commented 9 years ago

Well, I think the externals business is a bit of a distraction. I'd rather clear up the signature of the functions and decide how to define the state map - since it can't be done in "regular" VDM, as far as I can see. Yes, the externals ought to match the ability to refer to state, but that might be little more than changing the grammar.