Subtyping between int and integer; mandatory coercions

fpottier commented 2 years ago

Hello! This is an excerpt from my first Gospel specification...

(**An infinite array. *)
type 'a t
(*@ mutable model view: integer -> 'a *)

(**[get a i] reads the value stored at index [i] in the infinite array [a]. **)
val get: 'a t -> int -> 'a
(*@ x = get a i
    requires 0 <= i
    ensures  x = a.view(integer_of_int i) *)

(** [set a i x] stores the value [x] at index [i] in the infinite array [a]. **)
val set: 'a t -> int -> 'a -> unit
(*@ set a i x
    modifies a
    requires 0 <= i
    ensures  a.view = (fun j -> if j = integer_of_int i then x else (old a.view)(j)) *)

Although the Gospel stdlib promises that "the Gospel typechecker will automatically apply integer_of_int whenever necessary", I find that this is not true. If I remove integer_of_int in the postcondition of get, a type error appears. The same is true in the postcondition of set. In the latter case, an alternative is to write fun j : integer -> ... and to omit integer_of_int, but this is not very nice (the documentation does not indicate that type annotations are sometimes mandatory).

So, my questions are:

Is is true that integer_of_int is always inserted where necessary? My guess is that this is not true, and that to make it true, one would have to do type inference with subtyping, which is somewhat complex and costly.
Do we really want to distinguish the types int and integer and to have a subtyping relationship between them? I believe that this could be very confusing; when writing forall i or fun i without a type annotation, it seems difficult to know for sure whether the type-checker infers int or integer. The fact that the functions +, -, etc. seem to be overloaded is also a source of confusion. (I mean, + seems to be both a pure OCaml function of type int -> int -> int and a logical function of type integer -> integer -> integer?) Finally, type inference in the presence of subtyping is tricky, and OCaml programmers are generally not used to subtyping. I believe that this design choice could be a topic of discussion.

An alternative design would be to use just one logical type, integer, together with a predicate isInt(i) which asserts that i inhabits the interval of machine-representable integers. The OCaml type int would then be viewed as an abstract type, equipped with a model field view of type integer, and with the invariant isInt(view i). The functions +, -, etc. would not be overloaded; the mathematical operations would have type integer -> integer -> integer, while the OCaml operations by the same name would have type int -> int -> int. (If it is permitted to refer to an OCaml function inside a Gospel formula, then some mechanism would be needed to disambiguate which function is meant when we write just + in a formula. But, almost always, we want mathematical addition in formulae, not machine addition.) The OCaml functions would either rule out overflows by imposing a precondition, or use modulo arithmetic in the postcondition. Also, it would be desirable to be allowed to write just i instead of view i in logical formulae, but this is not specific of integers: there is a general desire in Gospel to be allowed to write just x instead of view x when x is a persistent abstract data structure.

(Maybe we should invite more people into this discussion...)

fpottier commented 2 years ago

I have noticed that I can simplify the postcondition of set as follows:

(** [set a i x] stores the value [x] at index [i] in the infinite array [a]. **)
val set: 'a t -> int -> 'a -> unit
(*@ set a i x
    modifies a
    requires 0 <= i
    ensures  a.view = (old a.view)[i -> x] *)

This is very nice, but does not really answer my questions.

fpottier commented 2 years ago

In general, in the universe of logical types, are we prepared to have subset types such as { x : t | P x } and to view { x : t | P x } as a subtype of t (à la PVS, I suppose)? At the moment, my answer would be that 1- we can and should probably live without this feature; hence 2- we should remove the logical type int and its subtyping relationship with the logical type integer, because it is just a special case of item 1.

mariojppereira commented 2 years ago

Although the Gospel stdlib promises that "the Gospel typechecker will automatically apply integer_of_int whenever necessary", I find that this is not true. If I remove integer_of_int in the postcondition of get, a type error appears.

This one is a bit odd. If I am not mistaken, Gospel's coercion mechanism is strongly inspiration by that of Why3. I have just written a similar spec to yours in WhyML and everything works fine.

The same is true in the postcondition of set. In the latter case, an alternative is to write fun j : integer -> ... and to omit integer_of_int, but this is not very nice (the documentation does not indicate that type annotations are sometimes mandatory).

This one is a bit trickier. From my understanding, when you try to type-check the term fun j -> j = i ... the use of the (=) operator (which is of type 'a -> 'a -> bool) actually sets j to be of type int. In this case, I would be tempted to say that type annotation is the preferred way to make the specification type-check.

On a side note: we have had our share of difficult times with coercions, machine integers vs mathematical integers, and lambda terms. Take a look, for instance, at the Gospel specification of the Vector module inside VOCaL: https://github.com/ocaml-gospel/gospel/blob/bd54a5199a7f04dab149036eb9073f7b53b979fd/test/vocal/Vector.mli#L59-L68 Here, we had to explicitly put a type annotation in the quantification (postcondition in line 68), otherwise we would not be able to apply function f.

So, my questions are:

Is is true that integer_of_int is always inserted where necessary? My guess is that this is not true, and that to make it true, one would have to do type inference with subtyping, which is somewhat complex and costly.

In function get the coercion should be automatically applied (this is probably a bug); in function set, I am not sure it should be the case.

Do we really want to distinguish the types int and integer and to have a subtyping relationship between them? I believe that this could be very confusing; when writing forall i or fun i without a type annotation, it seems difficult to know for sure whether the type-checker infers int or integer. The fact that the functions +, -, etc. seem to be overloaded is also a source of confusion. (I mean, + seems to be both a pure OCaml function of type int -> int -> int and a logical function of type integer -> integer -> integer?) Finally, type inference in the presence of subtyping is tricky, and OCaml programmers are generally not used to subtyping. I believe that this design choice could be a topic of discussion.

Note that the Gospel standard library purposely introduces arithmetic functions over integer arguments: https://github.com/ocaml-gospel/gospel/blob/bd54a5199a7f04dab149036eb9073f7b53b979fd/src/stdlib/gospelstdlib.mli#L58-L69 OCaml functions +, -, etc., are not used inside Gospel specification. This is actually something that we should make very clear (maybe in the documentation): OCaml functions should not be used for specification purposes. If we really want to have an OCaml operation and a Gospel one sharing the same name, then we shall explicitly introduce both. Example:

  (*@ function succ (x: integer) : integer *)

  val succ (x: int) : int
  (*@ r = succ x
        requires x + 1 <= max_int
        ensures r = succ x *)

An alternative design would be to use just one logical type, integer, together with a predicate isInt(i) which asserts that i inhabits the interval of machine-representable integers. The OCaml type int would then be viewed as an abstract type, equipped with a model field view of type integer, and with the invariant isInt(view i).

This is similar to what we propose in Why3 with range types to model machine arithmetic (cf, https://gitlab.inria.fr/why3/why3/-/blob/master/stdlib/mach/int.mlw).

I would not be against adding this into the Gospel Stdlib. @backtracking @pascutto @n-osborne

The functions +, -, etc. would not be overloaded; the mathematical operations would have type integer -> integer -> integer, while the OCaml operations by the same name would have type int -> int -> int. (If it is permitted to refer to an OCaml function inside a Gospel formula, then some mechanism would be needed to disambiguate which function is meant when we write just + in a formula. But, almost always, we want mathematical addition in formulae, not machine addition.) The OCaml functions would either rule out overflows by imposing a precondition, or use modulo arithmetic in the postcondition.

I have recently started to attach Gospel specification to the OCaml standard library, where I follow exactly this approach (cf, https://github.com/mariojppereira/cameleer/blob/mario_new_stdlib/src/ocamlstdlib.mli). Inside specification, I am only resorting to symbols from the Gospel standard library. This file is to be consumed by the Why3gospel plugin, which I then can use inside Cameleer.

Also, it would be desirable to be allowed to write just i instead of view i in logical formulae, but this is not specific of integers: there is a general desire in Gospel to be allowed to write just x instead of view x when x is a persistent abstract data structure.

In principle, this should be possible since one can declare a field as a coercion. @pascutto and @n-osborne , please correct me if I am wrong (I remember the status of model fields changed a while ago, in particular to distinguish them from logical functions, but I am not sure how this affects coercions).

backtracking commented 2 years ago

This one is a bit odd. If I am not mistaken, Gospel's coercion mechanism is strongly inspiration by that of Why3.

And Why3's coercions are readily taken from those of Coq.

In a few words:

When a type unification fails, we look for a coercion between the head symbols of the two types.
The coercion table is indexed by head type symbols; type parameters are not taken into account.
Coercions are user-declared.
Coercions are automatically composed.
At most one path from a type symbol to another; and thus no cycle in particular.

-- Jean-Christophe

fpottier commented 2 years ago

Should the Gospel coercion mechanism be a subject of debate?

The combination of coercions and OCaml-style type inference does not work, as evidenced by the incorrect claim in the Gospel documentation, which claims that integer_of_int is automatically inserted wherever necessary.
I would suggest removing all coercions in the beginning, and seeing whether life without them is tolerable.

fpottier commented 2 years ago

Regarding Mario's claim, OCaml functions should not be used for specification purposes, I tend to agree with this principle (which should be made more precise and documented).

In fact, I think the difficulty noted by Mario in the specification of Vector.init stems precisely from a failure to follow this principle: the OCaml function f and the mathematical function implemented by f are not properly distinguished, even though they do not have the same type (f has type int -> 'a, but implements a mathematical function of type integer -> 'a).

edwintorok commented 2 years ago

I see that both the https://github.com/mariojppereira/cameleer/blob/mario_new_stdlib/src/ocamlstdlib.mli and the Why3 stdlib treat overflows as errors and as something to avoid. I agree that in general that is what I'd want as well: overflows can be the source of security bugs and in a well-written functional program allowing overflows is more likely to be a bug than desired behaviour.

However https://ocaml.org/api/Int.html says that integers always implement 2's complement arithmetic: "Integers are Sys.int_size bits wide and use two's complement representation. All operations are taken modulo 2Sys.int_size. They do not fail on overflow.". In some scenarios I can see programs relying on this (e.g. implementing low-level crypto or network protocols), although in those cases they might use int32 or int64 for compatibility with whatever protocol they implement, is it very unlikely that someone would want to rely on 2's complement arithmetic on an int31 or int63 type (how would you even know which one you get?), but that is the documented behaviour of int* types in OCaml, and an OCaml program relying on them isn't necessarily buggy.

It might be useful to give the user a choice: do they want the no-overflow semantic for int, int32 and int64, or do they want the 2's complement semantics? I'd argue that by default it'd be better to implement the no-overflow semantic for int (and prove that it doesn't overflow!), and the 2's complement semantic for int32/int64. I'm not sure how to best offer that choice, but perhaps it can be done through (*@ open Int32.No_overflow *) vs (*@ open Int32.Complement *)? (and similarly for Int, etc.)

Coercions might be tricky in that situation: which coercion do you infer by default? The one to no overflow arithmetic, or to 2's complement arithmetic? (although I assume the 2 modules above could add their own constraints/implementations on integer_of_int depending on the desired semantics, and it'll just depend on whichever is in-scope )

fpottier commented 2 years ago

Indeed, this is a case where we want a choice, and this choice should not be global. I agree that offering a single module (such as Int32) under different names (such as Int32.No_overflow and Int32.Complement) seems to be a reasonable way of achieving this.

mariojppereira commented 2 years ago

In fact, I think the difficulty noted by Mario in the specification of Vector.init stems precisely from a failure to follow this principle: the OCaml function f and the mathematical function implemented by f are not properly distinguished, even though they do not have the same type (f has type int -> 'a, but implements a mathematical function of type integer -> 'a).

This is one of those cases where I am not sure what the best approach would be... How would one refer to the mathematical function implemented by f? One possibility would be to give Vector.init an extra ghost argument f_logic, but one would still need to introduce the precondition

  requires forall x: int. f x = f_logic x

hence keeping the universal quantification over machine integers.

fpottier commented 2 years ago

At least in principle I like the idea of distinguishing f and f_logic and relating them via an explicit assumption. (I would state this assumption in terms of a nested Hoare triple, not an equality f x = f_logic x, because I want to insist that OCaml functions are not logical functions.)

On top of this basic approach, maybe we need syntactic sugar and/or special support for pure functions.

ocaml-gospel / gospel-rfc

Subtyping between int and integer; mandatory coercions #1