tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.24k stars 84 forks source link

User-defined types #422

Open thufschmitt opened 2 years ago

thufschmitt commented 2 years ago

Is your feature request related to a problem? Please describe.

Maybe I just missed it, but it seems that there’s no way to define (name) custom static types.

Describe the solution you'd like

A syntax allowing me to (for example):

let type t = { foo: Num } in
[ { foo = 1 }, { foo = 2 }] : t

Describe alternatives you've considered

Use contracts everywhere instead, something like:

let t = { foo | Num} in
[ { foo: 1 }, { foo: 2 }] | List #t

But that’s a bit of a shame

thufschmitt commented 2 years ago

I’d really like to see this solved, as it’s making it utterly hard to use typed code in real-life.

I suppose #420 could get us a long way if we allow e : contract as a syntax for e : {The static type that contracts refines}, as we could have something like:

let t <: { foo : Num } = #True in // Yes, the `True` contract doesn’t exist, but you see what I mean
[ { foo = 1 }, { foo = 2 }] : List #t
yannham commented 2 years ago

Sorry, I missed this. But I think @litchipi is now on it, right?

thufschmitt commented 2 years ago

Sorry, I missed this. But I think @litchipi is now on it, right?

Yes, he is :) We’re supposed to discuss this offline next week

thufschmitt commented 2 years ago

We’ve had a bit of chat with @litchipi about that.

This is actually a bit more involved than what I initially though, in particular because the built-in types are currently hardcoded whithin the interpreter − the definition of types explicitly lists the existing builtin types − meaning that the issue is not just a syntactical one (define new type aliases), but also a typing one (extend the typechecking to also handle type aliases).

As a rough implementation plan, this would mean something like:

  1. Extend the definition of types to also include a reference to an identifier (similar to the Var(Indent) case, but for concrete types rather than type variables).
  2. Extend the typechecking algorithm to handle these types properly (including being careful with recursive types). Probably means passing a new “type aliases” environment to be able to resolve type aliases.
  3. Add a syntax to define these aliases. I suspect we might also want this syntax to define a corresponding contact at the same time, although the exact semantics are a bit fuzzy.

Another option would be to inline all the type aliases before starting the actual type-checking, but that might not make things simpler, and that means forbidding recursive types altogether (because they would expand to an infinite type), which is probably too restrictive in practice (well, I guess it is, maybe I’m wrong here).

(@litchipi don’t hesitate to correct me if there’s anything wrong or missing)

litchipi commented 2 years ago

One idea would be to have a unified type definition, and create pre-defined types type Num = %num%, type Str = %str%, etc ... to handle builtins types, while having a unified type definition system when handling variables or fields. I'll be able to provide more details, plans and ideas after understanding everything related to the grammar / lexer / lalrpop / typechecking

litchipi commented 2 years ago

After some experimentations, I arrive to a form like this which I find satisfactory:

let SharedObjectFile = fun label value =>
  if builtins.isStr value then
    if strings.isMatch value m#"\.so$"#m then
      value
    else
      contracts.blame (contracts.tag "not an .so file" label)
  else
    contracts.blame (contracts.tag "not a string" label) in

let PathContract =
  let pattern = m#"^(.+)/([^/]+)$"#m in
  fun label value =>
    if builtins.isStr value then
      if strings.isMatch value pattern then
        value
      else
        contracts.blame (contracts.tag "invalid path" label)
    else
      contracts.blame (contracts.tag "not a string" label) in

let type SharedObjectPath = Str | doc "A path to a shared object .so"
                                | #PathContract
                                | #SharedObjectFile
in

let Contract = {
  target : SharedObjectFile | doc "the target"
                            | default = "/lib/x86_64-linux-gnu/libc.so",
  test : Bool
} in

{
  test = false
} | #Contract

This way, a user-defined type would be just an alias for a static type applied to some contracts (or not), no need to reinvent the wheel, just taking profit from the "power" of contracts. Does that makes sense to you or were you having a different vision of this feature ?

Note: It doesn't work yet, I just propose a syntax / grammar here

yannham commented 2 years ago
let type SharedObjectPath = Str | doc "A path to a shared object .so"
                                | #PathContract
                                | #SharedObjectFile

I'm not sure to understand the semantics of this. What does it means to apply a contract to type? I don't think let type should have any meta annotations (beside doc, maybe). It should just be a let-binding defining a type alias, as in the original motivating example given by @regnat. There is no contract involved, and for that matter, IIUC, the requested feature is about static typing, and isn't really related to contract at all. That's not entirely true: because type annotations give rise to contract at runtime in Nickel, and because you also may use a type alias in a contract annotation, it does impact the implementation, but what I meant is that this feature makes sense even in your regular statically typed language without contracts nor gradual typing.

litchipi commented 2 years ago

It's not about applying a contract to a type, it's about telling that any field of type ShareObjectPath would have to be a Str satisfying the contracts PathContact and SharedObjectFile.

litchipi commented 2 years ago

Let me explain a bit why contracts over types, the original example still stands, but is a bit less elegant:

let type t = { foo: Num } in
[ { foo = 1 }, { foo = 2 }] : t

would become

let TestContract = { foo: Num }
let type t = Record #TestContract
[ { foo = 1}, { foot=2 }] : t

However that allows using contracts to craft user aliases and abstractions, to allow the user to have a type gathering all the checks he wants on it, and all the data he wants in it. I feel that also increase the overall readability.

I thought that would fit into the original intent of the feature request, allowing data checking using simple type annotations with user-defined types

yannham commented 2 years ago

It's not about applying a contract to a type, it's about telling that any field of type ShareObjectPath would have to be a Str satisfying the contracts PathContact and SharedObjectFile.

Yes, but the syntax let x = value | Foo means "apply the contract Foo to value". Thus it is strange to use the same syntax to means something quite different here, which is basically an intersection type (SharedObjectPath is both Str and #PathContract and #SharedObjectFile).

Intersections are completely out of the scope of this issue, because the original problem faced is not to have an alias for several contract at once. For those issues, relevant discussions are #420 and more recently #461. If we just want to give an alias to "several contracts", we could allow users to just define value with metadata but without content, like:

let SharedObjectPath = | Str
                       | #PathContract                
                       | #SharedObjectFile   

But it's a different matter. Once again, here, we just want to give a name to a type expression. Not meta-data, no contracts involved, just an alias to reduce verbosity.

Second, how do you typecheck this?

let Contract = {
  target : SharedObjectFile | doc "the target"
                            | default = "/lib/x86_64-linux-gnu/libc.so",
  test : Bool
}

This example doesn't really makes sense, because we don't know how to typecheck this currently. Maybe you meant to use the | instead of :? But then we are in contract land again, not types.

What we want is rather:

// Original case: verbose and annoying (beware, I haven't checked types are correct, it's not really important)
let myLib = {
  pack : forall a. a -> (forall b. (forall c. c -> b) -> b) = fun x f => f x,
  unpack : forall a. (forall b. (forall c. c -> b) -> b) -> (forall c. c -> a) -> a = fun exist f => exist f,
  // and so on...
} in ...

// With type aliases
let type existential = forall a. (forall b. b -> a) -> a in
let myLib = {
  let pack : forall a. a -> existential = ...,
  unpack : forall a. existential -> (forall b. b -> a) -> a = ...,
  // and so on
}
litchipi commented 2 years ago

Okay so in other words, what I was talking about is already possible and implemented. What you need is a syntactic sugar for re-used type annotations, and handling these type-annotations reuse in the type-checker ?

yannham commented 2 years ago

Okay so in other words, what I was talking about is already possible and implemented.

My example above is not supported today, so not really. It could make sense to have in some form, someday. But the point is that it is a different matter than this issue, which is about type aliases.

What you need is a syntactic sugar for re-used type annotations, and handling these type-annotations reuse in the type-checker ?

This, plus also handling type aliases for contract generation, which means you need to keep those type aliases around at runtime. An alternative would be to substitute type aliases and erase the mat program transformation time. But we already handle a bunch of different delayed substitutions as environments already, so I think it would make sense to do the same for type aliases, and keep them around at runtime.

litchipi commented 2 years ago

I have a problem dealing with a line in the lalrpop grammar:

CheckUnbound<Rule>: Types = <l: @L> <t: Rule> <r: @R> =>? check_unbound(&t, mk_span(src_id, l, r)).map_err(|e| lalrpop_util::ParseError::User{error: e}).and(Ok(t));

It checks for unbound type variables, which is what user-defined types are. However I cannot pass the information "This type is bound elsewhere" to the function check_unbound as there's no environment variable. Because of this, I cannot define Types that are not BaseTypes in the grammar.

Is type-bonding something really meant to be ran at a grammar level ? Why not moving it from the grammar to the type checking section (somewhere in the type_check_ function maybe ?) Another fix would be to add a grammar rule to handle user-defined alias that would get around the CheckUnbound rule.

yannham commented 2 years ago

It's done at parsing time for the following reasons:

  1. It needs to be done for both typechecking and contract generation (transformations). That is, it must be done in any case, and BEFORE program transformations (because typechecking happens before).
  2. Because we can't do it at program transformation time, remaining possibilities are during typechecking, parsing, or a new custom pass. The problem of typechecking is that we will eventually have a flag to disable typechecking for performance reasons, so we can't rely on typechecking being always performed. A custom pass would need to traverse the whole AST just for this, which is highly overkill. Which lets us parsing time, where we can check this alongside constructing the AST, with limited additional cost.

However, it is indeed more involved when some type variable may be bound by let type. We could bubble up free type variables until the top level, and only then raise an error, but that will clutter the parser code a lot.

Otherwise, we can do as we do for standard variables: only check at usage points (in typechecking and in contracts elaboration) that all type variables are indeed bound. For typechecked terms the checks will be duplicated, but that is not a terribly costly check to do. This is already what we do for normal variables.

yannham commented 1 year ago

A little update on this old issue: we're one step away from implementing this in a straightforward way, given the infrastructure put in place for checking for contract equality (#834 and follow-ups) and the soon to be merged #1442. We don't event need new syntax, the idea being that in let x = .. in foo : x, we can affect a static type interpretation to x under some conditions. Typically, if x is directly a type definition like in the example (a record type, an arrow type, etc.), this would work. I think this fit well in the approach of Nickel around merging the syntax for types and terms.

On the other hand, something that you might want to do is to export types as a part of a library. I believe this will prove harder, because what I propose would be slightly fragile if you add records, record accesses and indirection: at some point, x might not be interpreted as a type anymore because some heuristic decided it had enough and stops there. So, user-defined type might need a dedicated construct in the future.

Still, in the meantime, I don't think trying to use existing information to extract a static type from a value would hurt. Worst case, we later get a more principled construct such as type Foo =, but the two can coexist peacefully.

thufschmitt commented 2 months ago

@yannham: coming back to this after quite some time, I'm a bit confused by your last comment. As I understand it, something like let t = Number in 1 : t is expected to typecheck, but it doesn't:

nickel> let t = Number in 1 : t

error: incompatible types
  ┌─ <repl-input-6>:1:19
  │
1 │ let t = Number in 1 : t
  │                   ^ this expression
  │
  = Expected an expression of type `t` (a contract)
  = Found an expression of type `Number`
  = Static types and contracts are not compatible

What did I get wrond there?

yannham commented 2 months ago

Sorry if that wasn't clear, but my last comment implied that we're one step away from having this to work, because everything is in place. However, I think further discussions unveiled that this might not be a great idea to do that silently. There is always this tension between having something magic and automatic, trying to be smart, but which then breaks unexpectedly when the typechecker has reached the limit of its smartness. That is, this is ergonomic, but no very reliable (I'm talking about keeping the exact same syntax but just having the typechecker trying to "unfold" variables as static types)

Although I don't like to introduce a myriad of new keywords, today I tend to prefer the idea of making it explicit that something is to be considered as a type at typechecking time with this type qualifier. So let type foo = ... would have exactly the same runtime semantics as let foo = ..., but the difference would be to say to the typechecker: hey, this is supposed to be a static type, so do whatever substitution is needed. With clear rules for what qualifies as a type, and that would be checked at parse time (or typechecking time, but my point is, those should be mostly simple syntactic rules). Those simple rules should guarantee that the definition can be considered as a static type in meaningful way.

I believe this is straightforward to implement and backward compatible (this doesn't require type to be an actual reserved keyword). The only thing preventing this to happen is that I feel we don't have a good story for exporting those types. I guess if you expose a function using such a type alias, you would like to make that alias available as well (say, in the standard library).

Thanks for putting this back on our radar. It's been a long standing issue, and would surely be a very useful addition. I'll try to move it forward.