Request for Modification: allow sequences in bindings

paulch42 commented 8 years ago

Identification of the Originator

Paul Chisholm

Target of the request, defining the affected components of the language definition

Affects type 'seq’ and its primitive functions.

Motivation for the request

Comprehensions allow bindings over sets or types. It is a common situation to want to create a comprehension that ranges over a sequence. While one can define functions in VDM-SL to achieve the same purpose, being able to specify comprehensions that directly range over sequences would result in specifications that are more immediate, clearer and concise.

Description of the request

Add primitive sequence functions and allow their use in bindings such that comprehensions can range over sequences directly.

description of the modification

Add the primitive functions ‘in seq’ and ’not in seq’ to type ‘seq’. Extend the definition of binding to include ‘in seq’.

benefits from the modification

Allows comprehensions to range over sequences directly which allows more immediate specification of functions over sequences. ‘in seq’ and ‘not in seq’ are available as primitive functions; no need for individual specifications to define them. At present one can use a combination of sequence indices and set binding. For example, the following example for the Fitzgerald/Larsen “Modelling Systems” book (section 7.4)

[ ms(i) | i in set inds ms & Classify(ms(i),cat) = <HI> ]

becomes the clearer and more immediate

[ e | e in seq ms & Classify(e,cat) = <HI> ]

possible side effects

None identified.

A test suite for validation of the modification in the executable models (if appropriate).

None

peterwvj commented 8 years ago

Some initial thoughts:

I think you can achieve what you want using the elems operator, which converts a sequence into a set. Did you consider that? For example:

[x | x in set elems [1,2,3]] = [1,2,3]

Also, what would the iteration order be for an expression such as e in seq ms?

tomooda commented 8 years ago

Interesting idea. I agree that such map and filter to a sequence is a little tedious in VDM. Paul, have you considered introducing "map" and "filter" operators?

peterwvj commented 8 years ago

The LB has decided to advance the RM. Luis Diogo Couto has been appointed the owner.

nickbattle commented 8 years ago

A couple of initial thoughts.

As PJ says, we do have to think about the definition of the ordering if "in seq" is used in a list comprehension. [x | x in set S] gives a sorted sequence on the elements of S, so does [x | x in seq S] do the same, or does it preserve the order in S, or something else?

Presumably "in seq S" is loose, in the sense that "let e in seq S be st p(e)" does not define the element chosen? That implies that we should also have an "in seq" traces expression - most loose expressions have traces equivalents that expand the looseness to all the possibilities.

paulch42 commented 8 years ago

The list already appears in a specific order (possibly with duplicates). I think comprehension should respect the order and multiplicity. That is, I believe the semantics of [ f(x) | x in seq s & P(x) ] should be the same as [ f(s(i)) | i in set inds s & P(s(i)) ] @peterwvj @nickbattle Preserving the order of the source sequence allows the feature to be used more widely. If we are constrained to sorting the sequence then it has to be a numeric sequence (as is the case when sequence comprehension is used with a set as the source). Consider the example from the Modelling Systems book referenced in the proposal. The 'in seq' version of that is not valid unless we allow comprehension over sequences of any type. @tomooda I did consider 'map' and 'filter'. I think your point is that sequence comprehension with 'in seq' as a source is a combination of 'map' and 'filter'. I find the comprehension version is clearer and more immediate than the functional version, so is preferable from the perspective of communicating a specification.

tomooda commented 8 years ago

Paul, OK, I got it. I have another couple of questions.

Do you think seq-bind would be useful in places other than seq comprehension where set-bind can be used, such as set/map comprehension, quantified expressions and let-be expressions/statements? I think this is related to Nick's question.
The sequence for loop statement takes the form of sequence for loop = ‘for’, pattern bind, ‘in’, expression, ‘do’, statement ; I just wonder which looks better, pattern 'in seq' expression or pattern bind 'in' expression. The former takes the form in your proposal, and the later would look like [ f(x) | x : t in xs & p(x) ] or simply [ f(x) | x in xs & p(x) ]. Any thoughts?

paulch42 commented 8 years ago

Tomohiro,

yes, I think it would be useful in other cases where a binding occurs. The proposal was motivated by examples I looked at using list comprehension, but I can envisage many instances where we would want to universally and existentially quantify over sequences. No doubt that already occurs with patterns such as "forall x in set elems s & P(x)" which becomes the simpler "forall x in seq s & P(x)" (though with multiplicity of sequences, using the 'exists1' quantifier the 'in seq' version is not semantically equivalent to the 'in set elems' version, but to me that is as it should be to respect the fact that multiplicity is significant in sequences). The proposal is small in scale, and for an individual expression is not that different, but I think globally such small improvements that are employed extensively provide a significant benefit.
I have only used the functional subset of VDM-SL to date so this had not occurred to me. I will need to look at statements before I comment on this.

nickbattle commented 8 years ago

Using the ordering of the "in seq" sequence does make sense, and frees us from the need to bind to a single numeric value, though I think we are still limited to binding a single value (or a pattern that matches a single sequence value). For example, we cannot have "a,b,c in seq S" because if all the variables bound to all possible sequence value combinations (as they would with sets) then we lose the definition of the ordering. But we could have "mk_(a,b) in seq S", assuming the elements were pairs. That just affects the syntax/parser (pattern bind rather than multi pattern bind, or something like that).

I assume we would also say, as with sets, that bindings which don't match sequence values (eg. wrong type) are ignored (though currently, if you don't match a numeric value it is an error, so that can be relaxed as we no longer need numerics).

ldcouto commented 8 years ago

I agree that non-matching values should be "skipped".

In general, I think sequence comprehensions would be a nice addition to the language. I have struggled with precisely these issues so often that I wrote generic map and filter functions. But they look really ugly...

I'm less certain on the need of in seq and not in seq operators. You can get the same behaviour with in set elems.

Regarding the for loop statement, I don't think it should be changed as part of this RM. It's a completely separate thing.

nickbattle commented 8 years ago

Thinking about it, the restriction on a single pattern bind with "in seq" would only apply if it was used in a sequence comprehension (because of ordering). In a set comprehension, it would make sense to allow multiple binds with "in seq" just like "in set". Perhaps that's obvious, but when I made that earlier comment I was thinking it was because of the way "in seq" works, but it's really about the sequence comprehension's need for a well defined order.

Perhaps someone could sketch the revised grammar for Chapter 8 of the LRM (Bindings)?

tomooda commented 8 years ago

bind = seq set bind | type bind ;
seq set bind = seq bind | set bind ;
seq bind = pattern 'in seq' expression ;
multiple bind = multiple seq bind | multiple seq bind | multiple type bind ;
multiple seq bind = pattern list 'in seq' expression ;
sequence comprehension = ‘[’, expression, ‘|’, seq set bind, [ ‘&’, expression ], ‘]’

This will affect the following constructs.

trace let best binding
let be expression
def expression
forall expression
exists expression
exists1 expression
iota expression
set comprehension
sequence comprehension
map comprehension
let be statement
def statement
sequence for loop
trap statement
recursive trap statement

nickbattle commented 8 years ago

Luis, as owner of this, would you like to summarise what has been discussed so that we can quickly decide what to do at the meeting on Sunday?

ldcouto commented 8 years ago

Summarisation!

Alternatives:

Use elems and inds operators (this is the do-nothing approach)
Introduce map and filter operators instead, possibly as higher-order functions

Open questions:

Are in seq and not in seq really needed as operators?
How should for loop statements look with the new binding?
Any relevant POs or TCs?

Settled questions:

bind ordering follows the ordering of the original sequence
bind is restricted to a single binding (no multi-binds) but not necessarily of a numeric value (as it is now)

Proposed grammar revision and affected constructs(h/t @tomooda ):

bind = seq set bind | type bind ;
seq set bind = seq bind | set bind ;
seq bind = pattern 'in seq' expression ;
multiple bind = multiple seq bind | multiple seq bind | multiple type bind ;
multiple seq bind = pattern list 'in seq' expression ;
sequence comprehension = ‘[’, expression, ‘|’, seq set bind, [ ‘&’, expression ], ‘]’

trace
let be st binding
let be expression
def expression
forall expression
exists expression
exists1 expression
iota expression
set comprehension
sequence comprehension
map comprehension
let be statement
def statement
sequence for loop
trap statement
recursive trap statement

nickbattle commented 8 years ago

Thanks Luis. And I think there is also the point about the ordering, when used in a sequence comprehension, following the order of the original sequence, and that being restricted to a single binding (not a multi-bind) but not necessarily of a single numeric value as it is currently.

ldcouto commented 8 years ago

I thought that had been decided already. But I will add it.

peterwvj commented 8 years ago

The LB has decided to proceed with the RM, which enters the discussion phase.

peterwvj commented 8 years ago

Based on today's meeting people generally seem to be supportive of the in seq in sequence comprehension but less in favor of the in seq operator.

pglvdm commented 8 years ago

I agree that using "in seq" inside a sequence comprehension would make it clearer so I am in favour of that proposal. However, I am against using this generally and I certainly don't see a pressing need for having it (and its negation) as binary operators.

peterwvj commented 8 years ago

The LB has decided to move this RM to 'Execution'.

ldcouto commented 8 years ago

In preparation for execution, the status of open questions is:

Are in seq and not in seq really needed as operators?

No. These will not be implemented.

How should for loop statements look with the new binding?

Loop statements will not be affected by this RM. They remain as before.

Any relevant POs or TCs?

Nick will investigate this through the VDMJ implementation and consult with the LB as things come up

nickbattle commented 8 years ago

There is a very early draft of this available for VDMJ. The syntax and type checking is reasonable. Simple sequence comprehensions look OK, but I haven't tried anything further as yet.

https://github.com/nickbattle/vdmj/releases/download/3.1.1-1/vdmj-3.1.1-RM36.jar

nickbattle commented 8 years ago

I think the semantics of "in seq" bindings are clear in all cases except the "awkward squad" - the last four cases listed above in Luis' summary:

First def statements. These can include set binds, but you end up with "def e in set S = n in...". This is hard enough to parse, let alone understand. It may just be that the grammar uses a patternBind because it makes sense for patterns and or type binds. But set binds (and now, sequence binds) don't seem to make sense. Not sure what to do here.

You get similar issues with the sequence for loop, and trap and tixe statements. They are defined with patternBinds and make sense with patterns and type binds, but set binds (and now seq binds) make less sense.

Does anyone have a view? We can perhaps discuss further on Sunday.

ldcouto commented 8 years ago

As far as I understood, sequence for loops are to remain unchanged.

As for the others, I'm not sure. If we're pragmatic, seq binds really make a difference for enumerations. But I would like for the language to be consistent. Is there anywhere that set binds are not allowed but other binds are?

tomooda commented 8 years ago

value definitions optionally allow patterns to be typed, but their syntax is defined as pattern, [ ':', type ] instead of pattern | type bind. I rather wonder why it's not pattern bind.

To me, it looks natural to allow set bind in def statements and any other definition-related constructs because sets are quite close concept with types.

nickbattle commented 8 years ago

Def statements with set binds always look very strange to me, like "def e in set {1,2,3} = 2 in ...". Internally, VDMJ uses this to decide that e is of type nat1, and when the definition is executed, it is bound to the value 2, after a check that 2 is a member of the set (which can be an expression, so it may change at runtime). The presence of the set bind dynamically limits the assignable value, so the definition "e in set S = v" is saying that v is guaranteed to be in S, as well as binding it to e.

Perhaps that has some specification value, but "def e:T = v in..." makes more sense to me, as does "def e = v in..." (type binds and patterns, respectively). I'm not sure whether the choice of "pattern bind" in the grammar is deliberate or not.

So I could extend this, and allow "def e in seq S = v in...", which would require v to be in S. Just checking back, because this is a awkward case internally as well :)

Sequence for loops were mentioned in your list, Luis. You can write "for e in set S in sequence do...". As above, what this does is limit the values from the sequence to be from the set - with a runtime error if they are not there.

Same story with trap statements: "trap s in set S with stmt in body" will cause a runtime error if an exception matching s does not exist in S. It is similar for tixe statements, except if the match to the set fails, other tixe alternatives are attempted (so no runtime error).

So should all of these get "sequence binding" extensions for RM36, or are they set-binds only?

nickbattle commented 8 years ago

I've drafted changes for these four last cases (defs, loops, traps and tixes). It wasn't too awkward, and the change has roughly the same semantics as "in set" - ie. if the value being matched to the pattern is not also in the sequence, we throw an error. I'll commit it later, if there are no objections.

I noticed one curious thing with tixes and "in set" binds, and what I said before isn't quite right. The tixe cases are tried in order as long as the exit code does not match the type of the pattern or match the pattern itself. But if there is a set binding, and if the value matches the pattern but is not in the set, this is a runtime error. I assumed it would try the other tixes (so you could divide up your processing by using the same pattern with different sets). But both VDMJ and VDMTools fail with a runtime error in this case.

nickbattle commented 8 years ago

We should presumably add POs for the case of failure to find the value in a set or sequence (in the cases that use pattern-bind, like "def s in S = v in..."). I think it's just "v in set S" or "v in set elems S", with the usual context around it.

nickbattle commented 8 years ago

I've created POs for the cases where there is a clear value in scope that must match the set/sequence in a bind. In the case of traps and tixes that's harder (the value is an exit code), but this isn't the only problem with statement POs! So the spec below now produces the POs shown:

functions
    f: nat -> nat
    f(n) ==
        def s in seq [1,2,3] = n in
            def x in set {4,5,6} = n in s+x
    pre n > 0 and n < 4;

operations
    op: nat ==> ()
    op(n) ==
        for s in set {4,5,6} in [7,8,9] do
            for s in seq [10,11,12] in [13,14,15] do skip;

Proof Obligation 1: (Unproved)
f, s: sequence membership obligation in 'DEFAULT' (test.vdm) at line 4:32
(forall n:nat & ((n > 0) and (n < 4)) =>
  n in set elems [1, 2, 3])

Proof Obligation 2: (Unproved)
f, x: set membership obligation in 'DEFAULT' (test.vdm) at line 5:36
(forall n:nat & ((n > 0) and (n < 4)) =>
  (def s in seq [1, 2, 3] = n in
    n in set {4, 5, 6}))

Proof Obligation 3: (Unproved)
op: set membership obligation in 'DEFAULT' (test.vdm) at line 11:13
(forall s in set elems [7, 8, 9] & 
  s in set {4, 5, 6})

Proof Obligation 4: (Unproved)
op: sequence membership obligation in 'DEFAULT' (test.vdm) at line 12:17
(forall s in set elems [13, 14, 15] & 
  s in set elems [10, 11, 12])

nickbattle commented 8 years ago

This has now largely been ported over to Overture 2.3.9-SNAPSHOT. I've been testing some Jenkins builds, with the help of PJ, Paul, Luis and Tomo, but feel free to try yourselves.

See http://overture.au.dk/overture/test/Build-62_2016-07-20_14-45/

nickbattle commented 8 years ago

Message changes for the LRM are now pushed in the documentation 'editing' branch.

peterwvj commented 7 years ago

The feature proposed in this RM is available in Overture version 2.4.0 onwards, and the LRM has been updated accordingly.

overturetool / language