tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.44k stars 93 forks source link

Base types for contracts (or contracts seen as refinement types) #420

Open yannham opened 3 years ago

yannham commented 3 years ago

Is your feature request related to a problem? Please describe. Very often, user-defined contracts start along these lines:

let Port = fun label value =>
  if builtins.isNum value then
    ...
  else
    builtins.blame (builtins.tag "not a number" label) in

let Url = fun label value =>
  if builtins.isStr value then
    ...
  else
    builtins.blame (builtins.tag "not a string" label) in

...

That is, most contracts accept values that are in a subset of an existing builtin type (they are (dynamic) refinement types). This boilerplate feels like it could be added automatically by declaring explicitly a base type for a contract. Moreover, this information can then be leveraged by the type system:

  1. Conservatively in one direction, uniquely allowing safe implicit upcast from, say, a #Port to a Num, but not the other way around (this sets #Port as a subtype of Num).
  2. Conservatively in both directions, allowing implicit casts from #Port to Num as in 1., as well as the reverse Num to #Port but protecting the unsafe conversion by inserting automatically a contract check. This corresponds more to the philosophy of traditional gradual typing which is to statically allow casts and insert checks automatically between "compatible types" (casts that have a chance of success), but would depart from Nickel's current design to require any statically unsafe cast to be written explicitly.
  3. Forgetting the refinement represented by the contract altogether and just considering all #Ports as Nums statically

Describe the solution you'd like

A syntax to declare that a contract is a refinement of an existing builtin static type, as it is the case for example in the system of Well-typed program can't be blamed (although they don't do it exactly for the same reasons). Syntax suggestion (<: is not taken and fit well with the subtyping interpretation):

let Port <: Num = fun label value => ...

Dynamically, this would wrap the implementation of Port with a code that checks if value is a string in the first place. Something like:

let refine = fun typeName hasType contract label value =>
 if hasType value then
    contract label value
  else
    builtins.blame (builtins.tag "not a #{typeName}" label) in

let Port <: Num = ...
// rewrites to
let Port_ = ...
let Port = refine "number" builtins.isNum Port_

Statically, it can be interpreted as any of the alternatives mentioned above, but this can be decided later. We could probably actually support having any contract (including all static types that are convertible to contracts) as a base by defining refine as:

let refine = fun baseName baseContract contract label value =>
  let tagged = builtins.tag "not a #{baseName}" label in
  contract label (baseContract tagged value)

However, this can quickly leads to cyclic contracts, that would both loop dynamically as well as introduce cycles in the subtyping relation which may even make typechecking loop:

{
  One <: #Two = ....,
  Two <: #One = ....,
}

In practice, restricting to 0-ary types (that is Num, Str and Bool) should cover most use cases. Extending to any "static" type (which doesn't contain any user-defined contract as a subtype) is probably fine.

Describe alternatives you've considered

thufschmitt commented 3 years ago

I like this a lot (esp the first proposal for the static typing). It has a smart constructor feeling, except built into the language (and with an explicit upcast), which is pretty awesome. I’d love being able to say for example

let InternalizedStr <: Str = fun label value =>
    if isNormalized value then value else builtins.blame (builtins.tag "invalid input" label)
in …

and be able to treat InternalizedStrs transparently (and safely) as Str internally

However, this can quickly leads to cyclic contracts, that would both loop dynamically as well as introduce cycles in the subtyping relation which may even make typechecking loop:

Maybe we could prevent this by (assuming that Bar is defined as Bar <: TBar = cBar) define Foo <: #Bar = cFoo as Foo <: TBar = refine cBar cFoo − meaning that it’s essentially syntactic sugar over something that could be manually written anyways. It can obviously lead to runtime contract loops, but not that much more than without it, and I don’t think it could lead to any cyclic subtyping relationship since it trivially rewrites to something that only creates relations of the form contract <: staticType. But maybe it’s also confusing to have Foo <: Int transparently upcast a Foo to an Int but not Foo <: #Bar.

In practice, restricting to 0-ary types (that is Num, Str and Bool) should cover most use cases. Extending to any "static" type (which doesn't contain any user-defined contract as a subtype) is probably fine.

That sounds a bit limiting a priori, but maybe it isn’t too much of an issue in practice (in the worst case this can be worked around with a bit of boilerplate when needed). And because it’s conservative it leaves room for extending it later if needs be. So that’s probably OK.

yannham commented 3 years ago

Maybe we could prevent this by (assuming that Bar is defined as Bar <: TBar = cBar) define Foo <: #Bar = cFoo as Foo <: TBar = refine cBar cFoo − meaning that it’s essentially syntactic sugar over something that could be manually written anyways. It can obviously lead to runtime contract loops, but not that much more than without it, and I don’t think it could lead to any cyclic subtyping relationship since it trivially rewrites to something that only creates relations of the form contract <: staticType. But maybe it’s also confusing to have Foo <: Int transparently upcast a Foo to an Int but not Foo <: #Bar.

This makes sense. As you say this means that #Foo is not a subtype of #Bar statically, but this does sound like a viable tradeoff. Another approach way would be to detect cycles in the induced subtyping relation, which shouldn't be very hard. In any case, I think it's safe to start with refining over "static" types, and then to go for general refinement later.

litchipi commented 3 years ago

Maybe we could prevent this by (assuming that Bar is defined as Bar <: TBar = cBar) define Foo <: #Bar = cFoo as Foo <: TBar = refine cBar cFoo − meaning that it’s essentially syntactic sugar over something that could be manually written anyways.

I do not understand really what refine does, but in the essence I kinda see what you mean and how we could avoid loops in typechecking.

That sounds a bit limiting a priori, but maybe it isn’t too much of an issue in practice (in the worst case this can be worked around with a bit of boilerplate when needed). And because it’s conservative it leaves room for extending it later if needs be. So that’s probably OK.

I don't really agree on this, for me it would be very limiting to restrict this feature to builtin types or static types. Not restricting it would be dangerous if someone doesn't observe any good practice with type definition, but would allow to create wrappers and gradually create big abstractions, which why we need a configuration language originally imo. However, to implement this with static and builtin types first and then expand it, that would surely be easier.

I do not really like the <: syntax, but that's personal taste. I'd rather let Port is Num = fun label value => ... or a rusty let Port : Num = fun label value => ...

yannham commented 3 years ago

I don't really agree on this, for me it would be very limiting to restrict this feature to builtin types or static types. Not restricting it would be dangerous if someone doesn't observe any good practice with type definition, but would allow to create wrappers and gradually create big abstractions, which why we need a configuration language originally imo.

I don't know. I just recently started dogfooding, but I currently haven't had the use for a contract based over another custom contract in practice: it's always mostly refined over base types like Str and Num. Additionally, one possible disadvantage I can see is this can lead to a very complex contracts hierarchy, like object inheritance (or Haskell typeclasses). In a fictional example, say you find a contract Foo and want to know what it does, but if it is a chain of 5 refined contracts, Foo <: Bar <: Baz <: Doe <: Num. You gonna have to make a lot of jumps to understand the logic. Of course you can already write it as a composed contract by applying other contracts by hand currently but I would rather not have a language idiom that encourage this.

Bottom line is, refinement over general custom contracts is not out of the question, but we need good evidence that this actually useful in practice, which is not obvious to me at the time being.

I do not really like the <: syntax, but that's personal taste. I'd rather let Port is Num = fun label value => ... or a rusty let Port : Num = fun label value => ...

Note that the last suggestion is not possible, because it would mean that Port is a number (you can already write this in Nickel), but it is not. Port is morally a function Label -> Dyn -> #Port. I personally tend to prefer well established operators over idiosyncratic keywords, because operators like subtyping <: have usually a better defined meaning than a word like is (which could mean a lot of things), and that also lets you use is as e.g. an identifier. But that's also personal taste :slightly_smiling_face:

aspiwack commented 3 years ago

I'm curious (maybe a little confused even) about what is precisely in this issue.

I assume that we can already use types as values (their value is the underlying contract). If this is the case, there is no need for a special syntax. In fact, refine is “just” an intersection of sort. I think #461 wants to have such a conjunction, so would solving #461 also solve this issue?

I've opined similarly in a different PR (I don't remember what the actual subject was there): there is a bunch of PRs which seem to call for being special cases of a more general solution. I'd rather try to keep Nickel fairly minimal and avoid having too many special cases.

yannham commented 3 years ago

I assume that we can already use types as values (their value is the underlying contract)

There isn't, actually. There is currently no way of getting the contract corresponding to an arbitrary type expression.

If this is the case, there is no need for a special syntax. In fact, refine is “just” an intersection of sort. I think #461 wants to have such a conjunction, so would solving #461 also solve this issue?

For #461, imagining we have this operator to derive contracts from an arbitrary type expression §(type) and that merging on contract is the "naive" intersection, we could indeed write something like that:

let Port = §(Num) & (fun label value =>
  ... ) in

Which would have the same dynamic semantics as the proposed

let Port <: Num = ... 

However, it wouldn't give anything on the static type side, as everything happens on the side of values. So the typechecker wouldn't be able to leverage the information that Port is a refinement of Num. To me, the original motivation is that contracts are almost always refined over a base type more precise that Dyn, and that is a bit of a shame not to leverage this information on the statically typed side in one way or another. In Wadler et al's paper, this base type is actually part of the definition of custom contracts. Also, this makes users write some repetitive code, as mentioned above. This proposal aims to solve both at once, by making contracts acting like lightweight smart constructors over a type, as @regnat says, which IMHO is fairly natural (they already are in some sense, with respect to the type they generate themselves, as the only way to get a #Foo is to apply the contract Foo to something).

It's true that for the boilerplate part, a "naive intersection" operator at the value levels plus an "extract contract" operator could be combined to achieve the same. Although, to be honest, the "naive intersection" operator looks even more like a convenience for a specific case, as we can very well already write as a function allOf : List Contract -> Contract and/or of compose : Contract -> Contract -> Contract (where let type Contract = Lbl -> Dyn -> Dyn). Although it would fit quite well with the overall semantics of merging. On the other hand, we can't have anything like <: or § currently.

aspiwack commented 3 years ago

There isn't, actually. There is currently no way of getting the contract corresponding to an arbitrary type expression.

Maybe this is something that should be fixed, then. Since it seems to be a core component of a number of issues.

So the typechecker wouldn't be able to leverage the information that Port is a refinement of Num. To me, the original motivation is that contracts are almost always refined over a base type more precise that Dyn, and that is a bit of a shame not to leverage this information on the statically typed side in one way or another.

Fair enough, but I don't think that your proposal gives a straightforward way to a solution for static typing. Would it be worth introducing a syntax for subtypes such as { x : Num | Port x } (or such a thing) and use type aliases in the style of #464 ? Maybe you could use static typing in the contract definition even.

I don't imagine that we could use arbitrary types as the base type. Subtypes of Num -> Num make sense, but the paper indicates that the corresponding contract would not work so well. Maybe the static-typing-the-contract idea would help there, but I'm not super hopeful (well, to be precise: I'm sure it would work; I'm not sure that the semantics would be one that we desire).