Equality and Order Clauses for Type Definitions

paulch42 commented 8 years ago

Identification of the Originator

Paul Chisholm

Target of the request, defining the affected components of the language definition

VDM-SL order relation

Motivation for the request

All types in VDM-SL admit equality via a primitive (i.e. built-in) equality relation. Numeric types in VDM-SL are the only types that admit an order relation. A general purpose order relation can be defined for most types in VDM-SL, in a similar manner to equality. We need only define the < relation; equality plus < allows the derivation of <=, > and >=.

Order is ubiquitous. After equality, order is the most commonly used relation in computing. When specifying a data type, we very often require an order relation on the type. In VDM-SL we must define a truth valued function that implements the order relation. If we want the familiar complement of operators (<, <=, >, >=) then we must define four functions. Comparisons in a specification then employ the functions, but using function application notation rather than the familiar symbolic infix notation.

The Overture VDM-SL example ISO8601 specifies types for date, time, date/time and duration. The effective use of these types in specifications requires an order relation on each of the four types. Sixteen functions are specified: <, <=, >, >= on each of the four types. In every case the order relation is the ‘obvious’ or ‘natural’ ordering one would expect.

Description of the request

The attached VDM-SL module ‘Order.vdmsl’ contains a model of VDM-SL values and defines an order relation over those values. A notion of ‘comparable’ values is defined, with the order relation being total with respect to comparability. The primitive VDM-SL order relation on numeric values could be extended to the order relation defined in ‘Order.vdmsl’.

Notes:

As for equality, the ‘:-’ separator in a composite type indicates this field plays no role in the order relation. Since <= is defined in terms of < and = (and >= in terms of > and =), allowing a different set of fields than for equality would lead to inconsistency. That is, it would be possible for both a=b and a<b to be true.
Order on functions is excluded.
Order on tokens is excluded. The VDM-10 Language Manual states “...they cannot be used for anything other than equality and inequality.”

description of the modification

Syntax unchanged.
Type checker generalised to allow the comparison operators to be applied to values of comparable types.
Interpreter extended to implement order relation.

benefits from the modification

There is no need to define (up to four) functions associated with a data type.
Specifications that use the order relation are clearer and easier to interpret since we immediately recognise the familiar symbols and infix notation.
Proofs can employ the common properties of order relations, rather than having to re-establish them for individual types.
Sequence comprehension with a set binding requires the set to contain numeric values. With a generic order relation, the sets could contain values of any type that admits order, increasing the utility of sequence comprehension.
Opens up the possibility of specifying general purpose functions without having to define and pass around explicit order relations.

possible side effects

None - existing specifications are semantically unchanged

A test suite for validation of the modification in the executable models (if appropriate).

None

The following is the specification of the order relation as a VDM-SL model.

Order.vdmsl.txt

ldcouto commented 8 years ago

Hi Paul,

Apologies for not engaging with this sooner. I will be honest and say that the compose keyword always frightens me off at first.

Would it be fair to summarise your proposal as follows?

introduce ordering functions for the VDM primitive and structured types
provide infix notation for those functions via >=, < etc.

This seems worthwhile, although we would need to examine the proposed ordering more carefully. It looks good on first pass, though.

I quite like that these functions are implemented in VDM-SL. It seems very much in the bootstrapping spirit of VDM. I would like for it to stay like this in any final implementation. But, to the best of my knowledge, none of the VDM tools have features for defining infix notations for functions. So I am not sure if this would be possible.

tomooda commented 8 years ago

Paul,

I agree that universal (with a few exceptions) ordering is beneficial, but I need to make one confirmation. In the attached spec, CHARACTERS are latin, but we will apparently need to extend it to cover all unicode characters. Will the comparison between two characters be made simply by their code points?

paulch42 commented 8 years ago

@tomooda I wasn’t quite sure about the character set supported by VDM and if it differs across implementations, but the intent is that ordering would be as per the numeric ordering of the underlying character encoding.

paulch42 commented 8 years ago

@ldcouto Yes, your summary captures the intent of the RM.

Aside from the infix operation notation, I'm not sure how this could be implemented in VDM-SL while ensuring that it remains usable in practice, at least with the approach I have taken. My goal is that the feature should assist in creating and understanding specifications, so it is important that the solution leads to concise and immediate specifications.

While I don't fully understand it, the 'Meta-VDM' approach proposed in RM #37 might help.

paulch42 commented 8 years ago

Prompted by Luis comment, I realised there was no good reason to use the compose construct in the specification. I have simplified it to use :: instead. See attached.

Order.vdmsl.txt

Note the original RM has been updated to refer to this revised specification.

peterwvj commented 8 years ago

Good. To avoid any confusion, could you (Paul) edit the original post to refer to the revised spec (assuming that is what you want) ? Github should allow you to do that.

annehax commented 8 years ago

Today, the proposal was discussed by some members (Nick, Peter, Tomo and Anne) of the language board.

Here is my personal conclusion from that discussion: While the idea of having a default ordering is nice in principle, it may be questionable to which extent the suggested orderings are standard.

Since users may like to use other orderings than the suggested ones, an alternative (or added feature) could be to allow the user to overload the < operator with the desired meaning. Below two way of doing that will be sketched, one with some inspiration from RSL.

For inspiration, here is how we treated the issue in RSL: In RSL we only have a pre-defined meaning of <, <=, > and >= for Nat, Int and Real. The user is allowed to define the meaning on the symbol for any type. Example:

< : Bool >< Bool -> Bool b1 < b2 is (not b1 and b2)

RSL does not pose any restrictions on the definition, e.g. whether it is a partial or total order. Actually it does not at all need to be an order.

One possibility is that you allow the user to define symbols like < in such a way. However, I would suggest that you impose POs expressing the following requirements: transitivity, anti-symmetry, and anti-reflexivity.

As an alternative to define the relational operators in RSL way, a possibility, suggested by Nick, could be: "When you define a type, you could (in addition to an "inv") have an "order" clause." Again POs expressing the following requirements: transitivity, anti-symmetry, anti-reflexivity should be generated.

nickbattle commented 8 years ago

Something like:

T = nat * nat
inv t == t.#1 <> t.#2
order a, b == (a.#1 + a.#2) < (b.#2 + b.#2);

The order clause would produce a callable function, like T_order: T * T +> bool.

paulch42 commented 8 years ago

@nickbattle A couple of questions.

Does this mean I couldn't compare an arbitrary product for order. I would have to define a type capturing the product and the order relation for that type. If for example I wanted nat*bool, nat*nat, bool*nat and bool*bool, do I need four types and do I need to define an order relation on each of those types, all of which repeat the standard lexicographic ordering on pairs?
Would I have to use the callable function T_order in my specifications or does the order annotation implicitly allow me to use the infix operator < (and <=, >, >=)?

paulch42 commented 8 years ago

@annehax I would be interested to know which types the LB consider to have a generally agreed order and which do not. Here is the reasoning behind my choices.

The primitive types all seem to be clear (with the exception of token which is excluded explicitly).
Sequence: lexicographic order on sequence elements is almost universal.
Product: lexicographic order on tuple elements is almost universal.
Set: debatable to use sequence order on the ascending sequences derived from the sets (though since by inductive argument the set elements are comparable, this is feasible.) It is interesting that Overture employs this technique for sequence comprehension with a set binding by creating an ordered set (hence VDM-SL presently only supports sequence comprehension with set binding when the set elements are numeric).
Map: this was an arbitrary choice, there doesn't seem any obvious relation on maps. There is a strong argument to exclude maps from the order relation. (Perhaps a less arbitrary choice would have been order over the sets of key/value pairs.)
Composite: requires that the labels of the fields of the two composite values be the same to admit order. While it is not made clear in the Language Manual, Overture raises a type error if two composite values compared for equality do not have the same labels so any order relation should satisfy the same constraint. The constructor of a composite value requires the arguments appear in the same order as the fields in the type definition, so the order of fields is explicit. Consequently, lexicographic ordering on the sequence of field values seems a reasonable default order.
Option: order is based on two principles a) undefined always precedes defined, and b) [A] and B are comparable if A and B are comparable.
Union: a value of a union type is a value of one of the type components in the definition of the union. Consequently, provided the values satisfy the comparable constraint, then compare the wrapped values for order.
Function: excluded.

Overloading of < (and the other three order relations) would be nice. What would be really nice is something like the class concept in Haskell, but I think that is a very big language change.

nickbattle commented 8 years ago

In response to your first questions Paul, I was thinking of making the ordering clause part of the definition of a type, so yes you would need to define four types and specify their orderings (as lexicographical, if you wish). Of course these could use a common (polymorphic) function.

And yes, I imagine the "<" and ">" operators as being overloaded in that case implicitly for types that have an order clause. It would be a typecheck error to try to compare non-order-specified types using "<" and ">". The order would be applied implicitly in some cases, which we would have to carefully define, such as sequence comprehensions: the rule would be that you must bind a single "ordered type" rather than binding a numeric type, as it is currently.

The attraction of this for me is that we are explicit about ordering. I can see the argument that default orderings reduce the burden on the specifier, but they also introduce more implicit semantics, which (as a rule) I think is a bad thing in a specification.

nickbattle commented 8 years ago

One point that came up regards the use of character encodings for char orderings. If we do accept default orderings, I think we have to be careful with this one. In different locales, scripts that include modified letters (a-umlaut etc) have a different view on their collation sequence.

We cannot have a specification that accidentally means something different to the Spanish and French etc! We could order by a universally agreed "encoding" - for example the letters' Unicode codepoint - but although that would solve the problem, it still seems arbitrary and may make life difficult in locales who specifically want a particular collation sequence.

If we allow a specification to define its own String type with an order clause, the chosen collation solution is made explicit (albeit with some extra effort from the specifier).

nlmave commented 8 years ago

I'm not at all convinced by this RM to be honest; unlike RSL, operators cannot be overloaded in VDM-SL. As with any language change proposed, we need to be careful not to deviate too much from the ISO standard definitions - this one is borderline I think. From my (user) point of view, implicit lexicographical ordering is not very intuitive for anything else other than seq of char and numericals. The issue of unicode is obviously simple to solve, but it would be laborous task to implement, as there are many many Unicode tables: see http://unicode.org/charts/. And can we compare strings from different Unicode tables? The lexicographical ordering of set of nat currently is IMHO not a semantic feature of the language at all, but an implementation artifact to ensure that specifications always yield a deterministic result when set bindings are evaluated. Looking at the size of the specification written by Paul, is it really so cumbersome to specify the ordering by hand which would require a change to the language semantics? Of course the short-hand infix notation is appealing, but does it really improve our expressive power? Personally, I rather have an explicit ordering specification of my data type ordering than relying on some implicit notion that is built in the language / tool. Anyway, my two cents worth!

peterwvj commented 8 years ago

I'm also not convinced that defining a default order relation for all types is that helpful. It's easy to imagine type definitions where lexicographical ordering (or some other default order relation) does not suffice. The languages I know of require order to be explicitly defined - for example through operator overloading or by implementing special interfaces. I always found that to be sufficient for my needs. Although the RM (as formulated originally) is asking for a default order relation, I think what we should really provide is a way to define order.

tomooda commented 8 years ago

One complication of order clause is that it inherently brings polymorphism. Because one value can be of multiple types, we need to specify which type's ordering should be applied with the current VDM's type system. For example,

types
    T1 = nat * nat order (mk_(a1, a2), mk_(b1, b2)) == a1 < b1;
    T2 = nat * nat order (mk_(a1, a2), mk_(b1, b2)) == a2 < b2;
    T = T1 | T2;
functions
    positive : T -> bool
    positive(x) == mk_(0, 0) < x;

With the above spec, the interpreter/compiler can't decide which ordering should be applied. We'd need something like

    positive(x) == mk_(0, 0) <[T1] x;

Paul's original RM does not have this issue because the language would have the only one ordering function.

nickbattle commented 8 years ago

Interesting point Tomo. But perhaps the problem is that T itself has no ordering defined, and so the "<" expression in the positive function would be a TC error. If you define an order clause for T, it could use is_T1 and is_T2 combined with order_T1 and order_T2 to define it, and then the positive function will know how to compare elements of T.

paulch42 commented 8 years ago

The consensus seems to be that a default order relation is not the right direction, and that an order clause is a more fruitful avenue for exploration. It is interesting to observe the order clause effectively incorporates the Haskell 'Ord' class in VDM-SL (though not the ability to create user-defined classes - for those not familiar with Haskell classes, it is different from an OO class).

I thought I would do a little investigation of this approach. The ISO8601 example on the VDM-SL examples web page defines dates, times and durations. In it four types are defined, each with its complement of order functions (a total of 16 functions). I took this module and modified it to use the order clause as described by Nick (removing much of the extraneous material in the source module). The result can be found in the attached.

ISO8601x.vdmsl.txt

I think the outcome looks good. It seems quite natural to associate the order relation with the type directly. Note that since the type DTG is defined in terms of Date and Time, it can use the infix < already defined for Date and Time in its order definition.

The example does raise another interesting point. The primitive equality relation on values is insufficient for type Time. Time includes a timezone offset, so two values of type Time may differ structurally, but be equal semantically. The order relation for time values has to normalise before testing for structural equality. This suggests the possibility of having a way to override the default equality relation on a type (in this case structural equality over the normalised time values). The runtime would use the overridden function when testing for equality. This is effectively the Haskell 'Eq' class in VDM-SL. Although this may be a separate issue, it would allow one to always use == for equality (and <> for inequality); at present when structural equality over a type is inadequate, an explicitly named function must be defined and used.

paulch42 commented 7 years ago

The more I look at it the better the order clause seems as compared to the original proposal. The inclusion of an order clause strongly argues for an equality clause as well. A major benefit of the order clause is that it allows us to define an order relation on a types, and from that definition we are able to use the operators <, <=, > and >= (employing the order relation in combination with the primitive equality). However, it all comes undone if the primitive equality relation is not that which is desired. This is demonstrated by the VDM-SL example ISO8601 and the type 'Time'. The type 'Time' allows a timezone offset so the primitive structural equality is not what we desire: the time 18:00+2:00 is the same as time 14:00-2:00 (i.e. 16:00 UTC). Consequently, an order relation on this type that takes account of the timezone offset is not consistent with structural equality (order considers timezone, equality does not). We would end up in the situation we have today: define a collection of functions for equality and ordering. Therefore, it seems to me that for an order clause to be effective generally requires an equality clause as well. We can see this in the Haskell type class hierarchy: type class Ord depends on type class Eq.

The discussion is now somewhat different to the original proposal. Does this RM need to be closed and a new RM raised describing the revised proposal?

peterwvj commented 7 years ago

This obviously changes the RM quite significantly. Based on the outcome of today's meeting we recommend that you write a revised version of the proposal and post it as a new comment containing a link to the original proposal, https://github.com/overturetool/language/issues/39#issue-186064776 (rather than submitting a new issue). We prefer to use a single Github issue to track all information related to one RM to avoid scattering information across several Github issues.

paulch42 commented 7 years ago

This is a revision to the initial proposal for RM #39. The title has been modified to better reflect the new content.

Identification of the Originator

Paul Chisholm

Target of the request, defining the affected components of the language definition

Enhancement to type definitions that allow explicit equality and order relations to be defined.

Motivation for the request

All types in VDM-SL admit equality via a primitive (i.e. built-in) equality relation. While very convenient, the primitive equality mechanism does not always provide the fine grained control for the required equality. In such situations separate truth valued functions must be defined and employed rather than the infix '=' operator.

Numeric types in VDM-SL are the only types that admit an order relation. Order is ubiquitous in mathematics and computing. After equality, order is the most commonly used relation. When specifying a data type, we very often require an order relation on the type. In VDM-SL we must define a truth valued function that implements the order relation. If we want the familiar complement of operators (<, <=, >, >=) then we must define four functions. Comparisons in a specification then employ the functions, but using function application notation rather than the familiar symbolic infix notation.

The Overture VDM-SL example ISO8601 specifies types for date, time, date/time and duration. The effective use of these types in specifications requires an order relation on each of the four types. Sixteen functions are specified: <, <=, >, >= on each of the four types. Furthermore, since the specification of times allows a timezone offset, the equality relation is not structural. It is necessary to define equality functions on two of the types (times and date/times).

Description of the request

Allow an equality clause to be attached to a type definition in which the equality relation for a type is defined.

Allow and order clause to be attached to a type in which the strict order (less than) relation for a type is defined.

When such an order relation is defined, the environment generates automatically the less than or equal, greater than and greater than or equal relations. If an equality relation is defined for the type, then it is used, otherwise the primitive (structural) equality for the type is employed.

Whenever an order relation for a type is defined, the infix operators <, <=, > and >= can be used in specifications.

description of the modification

The equality relation would appear before the invariant and be of the form

eq x,y == ....

The order relation would appear after the equality clause and before the invariant, and be of the form

ord x,y == ...

In both cases the 'x' and 'y' are patterns so we could have

type natpair = nat*nat ord x,y == x.#1 < y.#1 or x.#1 = y.#1 and x.#2 < y.#2

or

type natpair = nat*nat ord mk(x1,x2),mk(y1,y2) == x1 < y1 or x1 = y1 and x2 < y2

If syntactically viable it would be nice to use the infix operators, as in

x = y == ....

x < y == ...

Not sure if this would be difficult to implement from a syntactic point of view, but using the infix operators would benefit clarity and avoid the need to introduce new keywords.

benefits from the modification

A specification has more fine grained control over the primitive equality relation for a type, as opposed to structural equality over the type (possibly excluding some fields).
A specification can attach an order relation to the type directly (rather than implement via a set of functions at a separate point in the specification).
These relations are associated with the type directly, not defined in functions elsewhere in the specification.
Specifications that use the order relation are clearer and easier to interpret since we immediately recognise the familiar symbols and infix notation.
Proof obligations for order relations can be generated automatically.
Sequence comprehension with a set binding requires the set to contain numeric values. With an order clause, the sets could contain values of any type for which an order relation is defined, increasing the utility of sequence comprehension.
Opens up the possibility of specifying general purpose functions without having to define and pass around explicit order relations.

The attached provides a modified version of the types section of the VDM-SL example ISO8601.vdmsl in a manner similar to the proposal in this RM. It avoids the need for 19 of the function definitions in ISO8601.vdmsl. Additionally, the infix symbolic equality/order relations would be used in numerous pre- and post-conditions in the other functions defined in the module.

ISO8601x.vdmsl.txt

possible side effects

Syntactically some specifications may be rendered incorrect if new keywords were introduced (such as 'eq' and 'ord').

Semantically there should be no change to existing specifications since this is purely an enhancement. Where a type does not have an 'eq' clause the primitive structural equality is adopted. Ordering is only available for numerics. No change to existing semantics.

A test suite for validation of the modification in the executable models (if appropriate).

Will be provided if the RM is approved for implementation.

paulch42 commented 7 years ago

A couple of interesting facts about the Haskell Ord class that are relevant to the discussion.

A type can be classified as ordered by defining a function compare: @a -> @a -> Ordering that returns one of LT, EQ or GT depending on the supplied values, and from which the <, <=, > and >= relations are derived automatically. An interesting alternative approach to that in the RM which proposes the < relation be defined explicitly.
In addition to the order relations noted in the RM, Haskell also generates automatically a function min: @a -> @a -> @a and a function max: @a -> @a -> @a for the type (with the obvious semantics). If it is decided to implement the RM it is advantageous to make min and max available for any type that has an order relation without requiring explicit definition.

paulch42 commented 7 years ago

The RM states arbitrarily the equality and order clauses would appear before the invariant but does not provide any justification. On further consideration it is preferable to present the invariant, then the equality clause, then the order clause. A type definition describes objects structurally. The invariant effectively acts as a filter on the set of structural objects identifying those that belong to the type and those that do not. The equality and order relations need only be defined for those objects that satisfy the invariant; the invariant logically precedes the equality and order clauses.

peterwvj commented 7 years ago

At the LB NM on April 23, all LB members present at the meeting agreed to move this RM to 'Execution'. The execution phase usually involves the development of a draft implementation of the tool feature for testing.

nickbattle commented 7 years ago

Now that the implementation is about to define this for real, we have a couple of choices:

Do we want:
eq x, y == ...
ord x, y == ...
or
eq x = y == ...
ord x < y == ...

On the one hand, the comma version makes the patterns look more like function parameters (which they are, if we provide eq_T: T * T +> bool). On the other hand, using "=" and "<" makes it clear that the clauses are defining the cases for equality and less-than.

And what implicit functions are we going to produce?

eq_T: T * T +> bool
ord_T: T * T +> bool
min_T: T * T +> T
max_T: T * T +> T
?

tomooda commented 7 years ago

I vote for eq x = y == ... and ord x < y == ... because ord x, y == does not display the direction of the order.

For implicit functions, I agree with eq_T and ord_T, but I suppose we can easily define a generic min and max along with argmax, percentile, sort and so on.

It's also possible to define a safe set2seq function which used to suffer undetermism.

paulch42 commented 7 years ago

I prefer the second approach, using = and < explicitly, as it is a more direct statement.

A point for clarification. You mention the implicit functions eq_T and ord_T. Confirm it will also be possible to use the operators <, <=, >, >= with any type that has an ord clause. That is, if I have

Pair = nat * nat
ord p1 < p2 == p1.#1 < p2.#1 or p1.#1 = p2.#1 and p1.#2 < p2.#2;

then later on in a context where I have p:Pair and q:Pair the following expressions are valid

p < q
p <= q
p > q
p >= q

When there is no eq clause the equality relation defaults to the current structural equality. In that context will there still be an eq_T function? To answer that we must answer the question, is eq_T the equality function for the type or is it the function that corresponds to the eq clause?

tomooda commented 7 years ago

I think eq_T function being defined iff an eq clause appears in the type definition would be consistent with inv_T function which is automatically defined iff an inv clause is given.

peterwvj commented 7 years ago

I prefer the second approach as well. I think Tomo makes a good point.

I agree it makes sense to make the equality relation default to structural equality for value types and reference equality for objects. What value would it bring to define eq_T if the corresponding clause does not exist (I'm just trying to understand).

nickbattle commented 7 years ago

OK, so we seem to be settling on the "=" and "<" grammar rather than commas. I prefer this too.

A type with an "ord" clause would permit <, <=, >, and >= comparisons, all of which would use ord_T.

I can see little value in defining eq_T when there is no corresponding "eq" clause. It would just represent the default equality test, for which "=" is available. And ord_T is not defined if there is no "ord" clause.

paulch42 commented 7 years ago

Based on the discussion I have drafted an update to the VDM-10 Language Manual, both for review and as a way to ensure a common understanding.

I don't have TeX expertise so it has been produced as follows:

converted the PDF manual to Word
deleted sections from the Word document that are not affected by the update
edited document with track changes on
PDF'd the edited Word document

Formatting wise it is rather a mess; PDF to Word conversion is poor. However, at this stage we are just interested in content. The changes are highlighted in red.

VDM10_lang_man.pdf

tomooda commented 7 years ago

Thank you, Paul. It's a good start point. I think it would be informative if we explicitly state that (1) the actual body of a comparison operator appeared in an expression is selected based on the static types of the two arguments, and thus (2) given T = nat * nat ord mk_(x1, y1) = mk_(x2, y2) == x1 = x2, a boolean expression mk_(1, 2) = mk_(1, 3) evaluates false because the two values are NOT statically typed as T and the default equality is applied, and let e1:T = mk_(1, 2), e2:T = mk_(1, 3) in e1 = e2 evaluates true because the both e1 and e2 are statically typed as T and eq_T is applied.

nickbattle commented 7 years ago

One subtlety came up regarding the signature of the eq_T and ord_T functions. If you declare something like this:

    T = nat
    inv t == t < 10
    eq t1 = t2 == t1 = t2
    ord t1 < t2 == t1 < t2;

Then you can see that if the parameters to eq_T and ord_T are of type T, then the body will simply recurse into the same function. There is a similar issue with inv_T functions, where their parameter cannot be of type T since to pass one, you would have to have successfully created a value of type T, which could fail the invariant check.

The solution with inv_T is to make the parameter the RHS of the type being defined - ie. "nat" here. Similarly, the signaturs for eq_T and ord_T have to involve the RHS type rather than the real type. This doesn't mean you can't call eq_T and ord_T with T arguments, but they will be "converted down" to nats before the body of the function is executed. That will then behave as expected.

The min_T and max_T signatures are fine with T parameters, because the body of these just use ord_T and eq_T.

So this RM is almost implemented in VDMJ. I have a draft jar out for testing with Paul. If anyone else would like to try it, let me know. When it looks OK, I'll port it over to Overture.

nickbattle commented 7 years ago

Here's the latest test jar. I've given it some basic testing with SL and PP using the tests attached.

vdmj-4.0.0-170426.zip test.vdm test.vpp

peterwvj commented 7 years ago

Sounds great, Nick. I agree with your solution. Naturally, the function definitions in the LRM changes must be updated accordingly.

peterwvj commented 7 years ago

When the draft has been reviewed I'll port the changes over to the LRM.

tomooda commented 7 years ago

Nick, so, assuming we have

T :: nat nat
eq mk_T(x1, y1) = mk_T(x2, y2) == x1 = y1
ord mk_T(x1, y1) < mk_T(x2, y2) == x1 < x2

we will have two type Ts, one is the real T and another is a T without eq and ord? Maybe we can simply say the bodies of eq and ord clauses are out of scope of the newly defined < and = ?

nickbattle commented 7 years ago

The problem is only really apparent with simple named types, like T = nat, rather than record types. With a record type, you are unlikely to say "ord r1 < r1 == r1 < r2", because that looks like a clear tautology. But with simple named types, you can reasonably say that (as my example), and intend that the RHS types should be compared (because the other interpretation is also a tautology).

I see your point about scoping. But that would mean that "a < b" in the body was reasoning about the same types as "a < b" outside that scope, but with different semantics. That feels awkward to me. We would also need to make that effect propagate into any further functions called from that scope (else we would recurse again). That means that functions would give different results depending on where they were called from, and we can't do that.

We (the LB) do still have some work to do to sort out the signature of inv_R functions for records - they are currently defined as "inv_R: R +> bool", which isn't useful - but we have changed the invariant for simple types to be "inv_T: nat +> bool" (for T=nat), and this is for similar reasons to the problem we're discussing. Without this, you cannot call the invariant function without implicitly having to call it to create its argument. In the current problem, you can't (in effect) call ord_T without recursing into ord_T. So I've initially solved the problem in the same way.

What it's saying, in effect, is that inside the ord clause you are reasoning about the raw value that the type is made from, rather than the type itself. The achieves the "limited scope" I suppose, but it's more explicit. That feels right to me.

peterwvj commented 7 years ago

Thanks for doing all of this. I've been testing the draft implementation and so far it's looking good. I have one concern about the error message that is produced if you omit the eq clause. For example,

types

  T = nat
  -- eq t1 = t2 == t1 = t2   -- Note, the 'eq' clause is commented out
  ord t1 < t2 == t1 > t2;    -- Note descending order!

λ java -jar vdmj-4.0.0-170426.jar -vdmsl -i test.vdmsl 
Parsed 1 module in 0.121 secs. No syntax errors
Error 3182: Name 'eq_T' is not in scope in 'DEFAULT' (console) at line 1:19
Type checked 1 module in 0.252 secs. Found 1 type error
Bye

Can this be improved somehow? I mean, it might not be obvious to the user what eq_T is, and I'm not sure why it is reporting 1:19 as the location.

peterwvj commented 7 years ago

Could omission of the eq clause be reported as a parse error instead?

nickbattle commented 7 years ago

Ah, good spot PJ :) What's happening here is that the min/max functions are being defined, and their bodies are expressions like "if ord_T(a, b) or eq_T(a, b) then a else b". That is being type-checked as a sort of "sub-spec", so it is not located in the real source. Hence the 1:19 location, since it's all on line 1.

So two problems: how to report type checking errors in these implicit bodies (here, eq_T is used but not defined); and secondly, what to do if eq is not defined?

We could implicitly define eq_T, with default semantics, if it's not declared? Or we could insist that if you define ord, then you define eq (though eq by itself would be OK)?

If we solve that, then the min/max bodies will always be type-clean, so the reporting issue goes away :)

nickbattle commented 7 years ago

Or I suppose we could redefine the min/max expressions to use "<" and "=" instead. That way, they would use the equality definition if provided, else use default equality semantics. That sounds better to me - I'll see whether it works...

peterwvj commented 7 years ago

I would be OK with implicitly defining eq_T if it does not exists since default behavior is likely to be what you want. It seems rather silly to have to define what is already the default behavior. I agree that eq_T by itself should be OK.

nickbattle commented 7 years ago

Yes, that seems to work. I tried to attach an updated jar ZIP, but GitHub won't let me. I'll email you one PJ, and try to attach it later...

vdmj-4.0.0-170427.zip

peterwvj commented 7 years ago

I missed your second message. Yeah, that sounds reasonable, Nick.

peterwvj commented 7 years ago

Don't worry about emailing me one, the jar ZIP is correctly attached.

peterwvj commented 7 years ago

Food for thought :)

What happens if you define a type based on a type that already defines equality and order? To demonstrate, consider the following spec:

types

  T1 = T2;

  T2 = nat
  eq t1 = t2 == t1 = t2
  ord t1 < t2 == t1 > t2;

functions

  useT1: () -> bool
  useT1() ==
  let x : T1 = 1, -- Note use of T1
      y : T1 = 2  -- Note use of T1
  in
    x > y;

  useT2: () -> bool
  useT2() ==
  let x : T2 = 1, -- Note use of T2
      y : T2 = 2  -- Note use of T2
  in
    x > y

By intuition, I would expect the two functions to return the same value, however, currently they don't. What do you think?

λ java -jar vdmj-4.0.0-170427.jar -vdmsl -i test.vdmsl 
Parsed 1 module in 0.11 secs. No syntax errors
Type checked 1 module in 0.245 secs. No type errors
Initialized 1 module in 0.142 secs. 
Interpreter started
> p useT1()
= false
Executed in 0.017 secs. 
> p useT2()
= true
Executed in 0.004 secs.

nickbattle commented 7 years ago

Hmmm.... good question. And what about a union of types, some of which include ordering clauses? Do we have some sort of (multiple) inheritance hierarchy here?

ldcouto commented 7 years ago

If T1 = T2, then every property of T2 should hold in T1 (unless it's overridden).

I am a little worried about mixing explicit ordering and implicit equality. This could easily lead to things like x<y and x=y. Which seems really broken?

For union types, treat it as you would any other union. Example Int | Char.

If the two values are of the same sub-type we should allow the comparison. Otherwise, it's a runtime error. However, this means we need some subtype Proof Obligations to be generated for these expressions.

peterwvj commented 7 years ago

If T1 = T2, then every property of T2 should hold in T1 (unless it's overridden).

Yes, I think you are right. That's also consistent with the way invariants work. In the following spec both functions fail due to an invariant violation.

types

  T1 = T2;

  T2 = nat
  inv x == x > 0;

functions

  useT1: () -> nat
  useT1() ==
  let x : T1 = 0
  in
    x;

  useT2: () -> nat
  useT2() ==
  let x : T2 = 0
  in
    x

nickbattle commented 7 years ago

OK, I'll look into how that magic occurs for invariants, and see whether we can do the same for equality and ordering - and we can worry about union types afterwards.

Meanwhile, I've just tweaked it to use ordering properly with sequence comprehensions - ie. the binding no longer has to be "numeric", but just "ordered". Jar attached.

vdmj-4.0.0-170427.zip

overturetool / language