Merging, contracts and "lazyness" : clarify the semantics

yannham commented 2 years ago

Over the course of experimenting with Nickel, we faced surprising behaviors involving merging and contracts. The goal of this issue is to gather such examples, and to try to reflect on how to mitigate those issues, or at least make them predictable as a simple consequence of a clear semantics for merging and contracts.

Contract propagation

The idea of merging is mainly inspired from two models, the NixOS module system and the CUE programming language. Both are fundamentally based on the idea of composing small logical data blocks into a final result. Those blocks are not necessarily independent: a value defined in a block may depend on other values that will only be available later in the big merged blob (for NixOS modules, those are typically options). In Nickel terms, this is often referred to as recursive overriding.

This capability is what makes merging powerful, but comes with its lot of questions. What should be the result of the following program?

{
  foo | Num
      | default = 5,
  bar = foo,
} & { foo = "a" }

If anything, this expressions should evaluate to {foo = "a", bar = "a"}. However, the fact that foo has been annotated with a Num contract but may end up with a string in the final result without an error is disturbing. This example is artificial, but in real life the two merged blocks could be in different locations, and the reasons as to why why an "a" ends up here where a number was expected could be harder to investigate.

In CUE, the underlying model guides the answer to this question. Types and values are not distinct entities (roughly, values can be seen as types with only one inhabitant, i.e. singletons), but live all in the same space. Merging intuitively correspond to intersection. In that interpretation, trying to merge "a" with something of type Num will throw an error. Similarly, in NixOS modules, a configuration option of type string (Nix doesn't have types per se, but the NixOS module system library has a notion of type, similar in spirit to Nickel's contracts) won't be merged with a number without an error.

Thus, it seems reasonable to do the same in Nickel. Let us write a first specification that we would like our system to enjoy:

Lazy propagation $(LP)$ : let e be a merge expression x_1 & ... & x_n, where x_is evaluate to records. Then, if there exists j such that x_j has a field f with a contract C attached, then e.f | C (that is, e.f respects the contract C)

Laziness

Nickel does respect $(LP)$. Actually, it does something stricter in the sense that it blames more expression than what would be strictly required by $(LP)$. The current implementation immediatly "cross-applies" contracts of both sides:

Cross application $(CA)$: let e be a merge expression x & y, where x and y evaluate to records. Let f be a field, A1, ..., An be the contracts attached to f in x, and B1, ..., Bm the ones attached to f in y, then

e.f = (x.f | B1 | ... | Bm) & (y.f | A1 | ... | An)

(with the obvious extensions to edge cases where f is not defined in x or y, or where either n or m is zero)

Note that in the current implementation, A1, ..., An have already been aplied to the value of x.f. So, if we repeat all the contract applications that ends up being evaluated, we get:

(x.f' | A1 | ... | An | B1 | ... | Bm) & (y.f' | A1 | ... | An | B1 | ... | Bm)

Where x.f' and y.f' are the original definitions for f, before the contract application transformation phase.

Cross application edge cases

However, this choice has a number of unintuitive consequences. The difference between $(CA)$ and $(LP)$ is that $(CA)$ may require intermediate values (that are built up during merging) to respect all the contracts attached to a field, while $(LP)$ only require that the final value respect all the contracts. Let us give examples of how this manifest in practice.

Record contracts

Due to how record contracts work, they don't suffer this over-eagerness of $(CA)$. When applied, a record contract checks immediately that there is no additional field. But missing fields are only checked once they are accessed. Take the following example:

{
  foo | {
    bar | Num,
    baz | Str
  }
}
& {foo = {}}
& {foo.bar = 1}
& {foo.baz = "a"}

Although the intermediate values {}, {foo.bar = 1}, and {foo.baz = "a"} are missing some fields of the initial contract, this example works fine, because as these intermediate values are not observed, no blame arises. If there is an extra field in one intermediate value, it will also be present in the final value, so it's fine to check and fail eagerly on extra fields.

But this is somehow a special treatment made specifically to make record contracts to interact well with merging. Forgetting merging, it would have been also sensible that record contracts check for both missing fields and extra fields eagerly, which would break this example. And in fact, contracts derived from record types are, on the other hand, eager! So, the companion program with : instead of | makes the program fail:

nickel>{
  foo | {
    bar : Num,
    baz : Str
  }
}
& {foo = {}}
& {foo.bar = 1}
& {foo.baz = "a"}

error: contract broken by a value: missing field `bar`
  ┌─ :1:1
  │
1 │ {bar: Num, baz: Str}
  │ -------------------- expected type
  │
  ┌─ repl-input-8:7:10
  │
7 │ & {foo = {}}
  │          -- evaluated to this expression
  │
  ┌─ <unknown> (generated by evaluation):1:1
  │
1 │ { ... }
  │ ------- evaluated to this value

note: 
  ┌─ repl-input-8:2:9
  │  
2 │     foo | {
  │ ╭─────────^
3 │ │     bar : Num,
4 │ │     baz : Str
5 │ │   }
  │ ╰───^ bound here

One could argue that the problem is that record contracts and contracts derived from record types should behave the same (and indeed, this difference is a kinda after-the-fact accident). But this shows again that checking for missing fields eagerly is a reasonable choice (it was chosen at the time contracts for record type were implemented, probably before merging landed), and that this choice doesn't play well with merging and $(CP)$. Which does add to the argument that the current design is fragile.

Dictionary

Another eager contract is the dictionary contract {_ : T}. The following example (simplified) has been naturally written while brainstorming on how a Nix package would look like in Nickel:

nickel>let Drv = { out_path | Str, ..} in
let Package = { name | Str, drv | Drv, .. } in
{
  build_inputs | {_: Package} = {
    foo,
    bar,
  },
  build = m%"
    %{build_inputs.foo.drv.out_path}/bin/foo $out
  "%m,
} & {
  build_inputs = {
    foo = { name = "foo", drv.out_path = "/fake/path" },
    bar = { name = "bar", drv.out_path = "/fake/path" },
  }
}

error: missing definition for `bar`

Surely, bar is not missing a definition in the final combined record. The issue is that the implementation of the {_: T} contract maps T onto the fields of its argument. This operation is eager, in that if one field doesn't have a definition right away, this operation fails. Because we do $(CA)$, the {_ : Package} contract is applied independently to both {foo, bar} and {foo = ..., bar = ...}. The former, which serves as an interface, is missing definitions and fails the contract no matter what it will be merged with.

For this particular case, we could make the {_: T} contract map lazily, but that would require a new dedicated operator. And this sounds like yet another special casing: fact is, similar variant of record contracts could very well be user-defined, and we can't special case them all.

Implementing (LP) in a more lax way

Instead of doing our eager $(CP)$, we could instead put the cross application of contracts in outer position, so to speak. That is, (v1 | A) & (v2 | B) would become (v1 & v2) | A | B, instead of the current (v1 | A | B) & (v2 | B | A) (which also saves us contract applications en passant). Making contracts more lazy accepts more programs, some of them looking questionable at first:

{
  foo | Num | default = "a" ++ "b",
} & { foo = 5 }

This programs is currently blamed, because "a" ++ "b" doesn't respect the contract Num. But if we implement precisely $(LP)$, the program runs fine, because "ab" is not observable, and thus not checked. This may be a bit surprising. That being said, there is already a lot of laziness in Nickel, and this "values don't blame if they are not observed/used" is actually already the dominating spirit in the contract system. This consequence doesn't strike as being shocking to me.

Outer cross-applications

Another source of strange behavior is when merging records with contracts attached (the contracts aren't attached to a field, but to the whole record). For example, what is the result of the following program?

({foo = 5} | {foo | Num})
& {bar = "bar"}

We merge two records, and one is additionally checked by a contract. Everything looks fine, and we expect this program to evaluate to {foo = 5, bar = "bar"}:

nickel>({foo = 5} | {foo | Num})
& {bar = "bar"}

error: contract broken by a value: extra field `bar`
  ┌─ :1:1
  │
1 │ { ... }
  │ ------- expected type
  │
  ┌─ repl-input-1:2:3
  │
2 │ & {bar = "bar"}
  │   ^^^^^^^^^^^^^ applied to this expression
  │
  ┌─ <unknown> (generated by evaluation):1:1
  │
1 │ { ... }
  │ ------- evaluated to this value

note: 
  ┌─ repl-input-1:1:14
  │
1 │ ({foo = 5} | {foo | Num})
  │              ^^^^^^^^^^^ bound here

Ouch. What happens is that our $(CA)$ approach (actually any $(LP)$-compliant implementation could exhibit this behavior) applies to top-level merge expressions, not only to record fields. In consequence, the {foo | Num} contract is cross-applied to {bar = "bar"} value.

Indeed, we only have one merge operator, and we defined quite straightforwardly the merging of records to be recursive: {foo = exp1, ...} & {foo = exp2, ...} evaluates to {foo = exp1 & exp2, ...}. Thus, we currently can't really make a difference between "I am merging fields of record as a result of a prior merging of those records" from "I am merging two top-level values", and $(CA)$ is applied in both cases.

Note that this is not a direct consequence of $(LP)$ as defined in this issue, which explicitly only applies to sub-fields of the original value. The issues appears because we have a simple recursive definition of merging for record fields. We may need to differentiate between a top-level merging and an "inherited" merging &sub, such that {foo = exp1} & {foo = exp2} is {foo = exp1 &sub exp2}, and where &sub may have a different behavior with respect to contract applications.

A bird's-eye view

Taking a step back, I think that there are two different - and somehow competing - interpretations for the merging operator.

The first is as a binary operator with a simple recursive definition. This one entails a local definition: the contribution of a & b in a & b & c doesn't depend on c.
The second is rather in line with the NixOS module view, and the one that guided the RFC002 on overriding, which is that merging values builds up a symbolic tree, and when a final value needs to be extracted, this tree is evaluated depending on the various annotations (metadata) found inside. This one is by nature not local, with respect to the a & b & c example above.

Maybe we should have two flavor of merging in consequence, the one written by the user being the local version, and the one generated for merging subfields (&sub) being the tree interpretation, as mentioned in Outer cross-applications. Or we may want only one version, but in any case, we need a clear and well-rounded semantics for those edge cases.

aspiwack commented 2 years ago

I haven't thought about this much. Kind of obviously I suppose: I've only just read the issue. But there are other natural ways to address these problems that came up while I was reading (which actually may be wrong: I really haven't had time to give it much thought at all).

Solution 1

Maybe the closed-record contracts are not as useful as we thought they were. And their interaction with merging i kind of icky, so we could just remove them. I think that I am at least correct in asserting that all of these issues emerge as the interaction of merging and closed-record contracts. Am I?

Solution 2

Another option that I've thought of is to make { foo = 1 } | { foo | Num, bar | String } equal { foo | Num = 1, bar | String }.

That is, missing fields do not cause an immediate error, simply adds the missing fields as required (you may end up not erroring out on the missing field if you never end up calling to it, but it's pretty par for the course in contract-land). This makes the semantics of record contract closer to merging.

bew commented 2 years ago

@aspiwack From an untrained eye (but being used to the lazyness of Nix), your solution 2 feels the most intuitive

yannham commented 2 years ago

Maybe the closed-record contracts are not as useful as we thought they were. And their interaction with merging i kind of icky, so we could just remove them. I think that I am at least correct in asserting that all of these issues emerge as the interaction of merging and closed-record contracts. Am I?

I would argue that closed record contracts are essential: schema are often closed by default (they represent all the possible fields that you can use), and being unable to detect an extra field in a config sound very limiting. Say, there is an optional field is_foo_bar, but you wrote is_foo_bra: because it is optional, it won't raise a missing definition, and without closed contracts, we couldn't detect this unexpected field.

Rereading the issue, there's a lot to digest, and the two cases I expose deserve a summary. I will edit the text. But for now, I would not really say this is related to closed record contracts specifically:

The first issue is that propagation of contracts when merging looks too eager/too strict, requiring intermediate values to satisfy the final contract. The given examples don't involve closed contracts (one is a record type, but the error is missing field definition which would happen with open contracts too, and the other example is a dictionary contract).
The second issue is that this cross-application semantics/contract propagation sounds legit for fields of records being merged, but not for top-level values being merged. The example given involves a closed contract, and it's true that an open record contract wouldn't be able to trigger this, because merging cannot "unset" fields, so if a field is defined by the LHS, it will be in the final result. However, it's not specific to closed record contract, as we can give an example with dictionary types instead:
```
nickel>
({foo = 5} | {_ : Num})
& {bar = "bar"}
```

error: contract broken by a value ┌─ :1:1 │ 1 │ {_: Num} │ -------- expected type │ ┌─ repl-input-0:2:10 │ 2 │ & {bar = "bar"} │ ----- evaluated to this expression │ ┌─ (generated by evaluation):1:1 │ 1 │ "bar" │ ----- evaluated to this value

note: ┌─ repl-input-0:1:14 │ 1 │ ({foo = 5} | {_ : Num}) │ ^^^^^^^^^ bound here

yannham commented 1 year ago

Superseded/closed by RFC005 #819 (yet to be implemented).

tweag / nickel