Accessing late-bind field fails

Describe the bug I apologize for the over-generic & cryptic issue title, as it is really hard to summarize the issue in words, so here's the MRE.

To Reproduce

The program looks like this:

# main.ncl
{
  FeatureFile = {
    inputs = {
      a | Number,
      b = a + 2,
    },
    bar,
    baz,
    zab | optional,
    cux = inputs.b * 10
  },
  Feature = {
    inputs = {
      a | Number,
    },
    inputs' = let a = inputs.a in {inputs.a = a},
    files | Array (FeatureFile & inputs')
  },
  feats = {
    x | Feature = 
    {
      files = [
        { 
          bar = 2, 
          baz = 3, 
          zab = x.inputs.a + 10,
        }
      ]
    }
  },
  prof1 = {
    inputs = {
      a,
    },
    features | Array Feature 
    = let a = inputs.a in [
      feats.x & {inputs.a = a}
    ]
  },
  prof1_ = prof1 & {inputs.a = 5},
}

Evaluating the program, providing inputs to the prof1 field, we get the following error:

> nickel eval main.ncl --field=prof1_

error: missing definition for `a`
   ┌─ /tmp/main.ncl:26:26
   │  
13 │       inputs = {
   │ ╭──────────────'
14 │ │       a | Number,
15 │ │     },
   │ ╰─────' in this record
   · │
26 │             zab = x.inputs.a + 10,
   │                   ---------^
   │                   │        │
   │                   │        required here
   │                   accessed here

note: 
   ┌─ /tmp/main.ncl:14:11
   │
14 │       a | Number,
   │           ^^^^^^ bound here

Commenting out line 26, prof1_ evaluates as intended:

{
  features
    | Array Feature
    = [
        {
          files
            | Array (FeatureFile & inputs')
            = [
                {
                  bar = 2,
                  baz = 3,
                  cux = 70,
                  inputs = { a | Number = 5, b = 7, },
                }
              ],
          inputs = { a | Number = 5, },
          "inputs'" = { inputs = { a = 5, }, },
        }
      ],
  inputs = { a = 5, },
}

Expected behavior

prof1_ evaluates to:

{
  features
    | Array Feature
    = [
        {
          files
            | Array (FeatureFile & inputs')
            = [
                {
                  bar = 2,
                  baz = 3,
                  cux = 70,
                  zab = 15,                                # <- `zab` gets the input value correctly
                  inputs = { a | Number = 5, b = 7, },
                }
              ],
          inputs = { a | Number = 5, },
          "inputs'" = { inputs = { a = 5, }, },
        }
      ],
  inputs = { a = 5, },
}

It feels like zab should be able to access the defined inputs.a.

Environment

OS name + version: NixOS 23.11
Version of the code: nickel-lang-cli nickel 1.6.0 (rev 3441781)

Additional context Add any other context about the problem here.

The issue is the following:

In the definition of x, you rely on the fact that x is a field of a recursive record and can thus

Refer to itself recursively
Be lazily merged later on and recomputed from there

Which is fair. However, line 37, you do:

feats.x & {inputs.a = a}

This is projecting the x field out of its original record. When a field from a recursive record is accessed, it must be computed with the values of its siblings known at the time. That is, in some sense, you "freeze" x at a point where it doesn't have any inputs field (at least regarding to its siblings - of course, if x is itself a recursive record with intra-dependencies, those can still be merged and updated without issues).

Merging it with subsequent data won't cause the field to recompute, because Nickel considers that x has a dependency over itself, and this information is lost when accessing the field - as there isn't even a notion of an x anymore outside of the original record - and so subsequent merge won't cause the value of files to be recomputed.

Maybe it looks like it could work here because x is self-referential, but for example, imagine that instead of depending on itself, x would depend on a sibling field y: once you access the body of x, there is no notion of y anymore, and thus its value can never be updated by a merge (think of {x = y + 1, y}.x). Either y is known when accessing the field and its value is used, or you'll have a missing field error.

Now, I can see two ways of fixing your issue and get the result you want: the first one is simply to perform the merge before extracting the value, so that x is still a field in a recursive record and the recomputation can trigger. That is, change line 37 by:

(feats & {x.inputs.a = a}).x

Here, we are overriding x on the right (or at least merging it with something new), and x depends on itself, so merge will recompute its value with the new data available.

Another solution is to avoid relying on x recursively, but make the actual dependency more fine-grained: in fact, the point is that you expect inputs to be a sibling field of files, and files depends on this sibling. Going through x is just an indirect way to reference inputs without having Nickel complaining about the fact that inputs isn't defined anywhere.

What you can do is to just declare some fields without definition, which work a bit like function parameters: it's just a way to say "I expect those fields to be there at some point thanks to merging" and to satisfy Nickel's scoping rules. You can then use inputs directly without going through x. To do so, rewrite feats as:

feats = {
    x | Feature =
    {
      inputs,

      files = [
        { 
          bar = 2, 
          baz = 3, 
          zab = inputs.a + 10,
        }
      ]
    }
  },

(you could in fact even declare an undefined field inputs.a instead of inputs, but it's not very useful here, as the Feature contract already ensures that an a is present). If you do that, you don't have to fix the merge site, and can keep the rest of the code identical.

Thank you @yannham . The clarity of your answer is enjoyable as always.

What you can do is to just declare some fields without definition, which work a bit like function parameters: it's just a way to say "I expect those fields to be there at some point thanks to merging" and to satisfy Nickel's scoping rules. You can then use inputs directly without going through x.

Somehow I have the wrong mental model that the contract Feature could work like a (OOP) class constructor, that attaching a contract to a record (acting as arguments to the constructor function) resembles constructing a class instance, so that declaring inputs in the contract (and applying the contract) is enough for Nickel to know that "the field will be there at some point thanks to merging".

A follow up question: Is there some improvements that I could make to simplify the wiring of passing inputs down the record hierarchy? Especially on Line 16 where the inputs is merged into a contract doesn't feel very natural.

{
  FeatureFile = {
    inputs = {                 <- declaration
      a | Number,
      b = a + 2,
    },
    bar,
    baz,
    zab | optional,
    cux = inputs.b * 10
  },
  Feature = {
    inputs = {                     <- declaration
      a | Number,
    },
    inputs' = let a = inputs.a in {inputs.a = a},
    files | Array (FeatureFile & inputs')          <- definition by merging
  },
  feats = {
    x | Feature = 
    {
      inputs,      <- declaration
      files = [
        { 
          bar = 2, 
          baz = 3, 
          zab = inputs.a + 10,
        }
      ]
    }
  },
  prof1 = {
    inputs = {          <- declaration
      a,
    },
    features | Array Feature 
    = let a = inputs.a in [
      feats.x & {inputs.a = a}    <- definition by merging
    ]
  },
  prof1_ = prof1 & {inputs.a = 5},          <- definition by merging
}

The current setup reminds me of writing React web app in the early days where passing props down the component tree layer by layer is the only way for a child node to use a value higher up in the tree hierarchy.

Somehow I have the wrong mental model that the contract Feature could work like a (OOP) class constructor, that attaching a contract to a record (acting as arguments to the constructor function) resembles constructing a class instance, so that declaring inputs in the contract (and applying the contract) is enough for Nickel to know that "the field will be there at some point thanks to merging".

It's true to some extent that we could add some rules that says that contract application might bring new variable into scope. Though OOP and contracts are still quite different.

On issue in Nickel is that we don't have different kinds of "record". If you think about it, in general purpose languages, there are different instances of "a bunch of things bundled together with name": classes definitions, interfaces, records, record types, modules, etc. They have many variations, but usually what's important in e.g. a module interface is that you can statically (and obviously) deduce which symbols are present.

Thus, you can have things like import (or open in OCaml) which brings symbols into scope. It stretches a bit the notion of lexical scoping, but it's still ok, as you can easily derive which symbols are brought into scope statically, both for the interpreter and for the humans reading the code.

In Nickel records are potentially dynamic, so if I write let MyContract = if some_complex_condition then {foo | Number} else {bar | String}, which symbols do you bring in scope? This notion of "trying to extract static information from a potentially dynamic Nickel data structure" has in fact come up again and again in various design questions (we discussed that recently again for the let type issue #422 ).

We can come up with reasonable rules, which are informally extracting obvious static information and would stop as soon as something can't be obviously determined statically. So the previous MyContract wouldn't bring anything into scope, for example. However, a simple contract like MyContract = {foo | String, bar | Number} would bring foo and bar into scope.

The issue with those smart guesses is that it can be very surprising for users once it suddenly stops working. Everything seems to work as it should, magically, and then one change to a record contract hits the limits of the rule, and you get unbound identifiers all around. I think that's why we have been reluctant to apply those kind of rules until now. Another reason is that, if we apply them, they should be properly formalized - this might actually come to fruition for the let type idea.

Another solution is to do like other languages and have restricted form of records that can't be too dynamic: modules, contracts, etc. However, the cost is complexity - especially for the occasional editor of a Nickel config who then have to understand the differences between all of those.

A follow up question: Is there some improvements that I could make to simplify the wiring of passing inputs down the record hierarchy? Especially on Line 16 where the inputs is merged into a contract doesn't feel very natural.

I understand it's a MRE, but it's a bit hard to know what you're trying to achieve on this specific example. I think one of our assumption with those lazy inputs is that you wouldn't need each and every layer to have its own separate inputs, and they could all share the one coming from the top-level record. So same question here: do you really need each subrecord to have its own separate inputs? Could you hoist them at the top-level and let each contract and value share those inputs?

Totally agree that any half-baked, informal, "smartish" rules makes for worse UX, because when they fail, we as human have a tendency to think of it as "trying to be smart" but really "is just dumb", while not recognizing it being smart when it does the "smart" things correctly.

Though I'm not following on the relation between the topic of this issue i.e. "a contract bringing variable into scope" and the let type idea. I thought that was about naming /reusing a static type definition, which is very well supported for contracts.

I understand it's a MRE, but it's a bit hard to know what you're trying to achieve on this specific example. I think one of our assumption with those lazy inputs is that you wouldn't need each and every layer to have its own separate inputs, and they could all share the one coming from the top-level record. So same question here: do you really need each subrecord to have its own separate inputs? Could you hoist them at the top-level and let each contract and value share those inputs?

I'm experimenting with modularization. By making each level a "module" with its own inputs (as the new "Modular Configuration" doc section suggests), they become "self-contained" thus composable with other modules. My experiment defines contracts for each module, then wires the inputs together when composing smaller modules into larger ones. When that is done, the final configuration can be splitted (by domain logic) into smaller files (also modules) applied with their respective contracts.

After I corrected my codebase based on your suggestions, I realized the problem with my previous MRE: the prof1 field was not attached by a Prof contract where the inputs wiring should be done. As a matter of fact, when things are done consistently, the modularization turned out to work better than I expected: wiring is only needed on the contracts side, and everything just works at call site (I mean where the module configuration is actually defined).

Though I'm not following on the relation between the topic of this issue i.e. "a contract bringing variable into scope" and the let type idea. I thought that was about naming /reusing a static type definition, which is very well supported for contracts.

Yes, the connection isn't all too obvious. The conceptual issue we encounter with let type is when you want to put a type definition in a record, for example in a module of the stdlib, which is surely very useful (so indeed let type as a let-binding alone doesn't have this problem, but it's very limited if you can only alias a type locally - we really want to export types as well). You could write {type Foo = [| 'Bar, 'Baz |], <...rest of module...>}. Then you could use this type, say let my_value : std.foo.Foo = ....

But the typechecker now needs to statically resolve a record access before evaluation. So you end up with the same dilemma as for deciding what bindings are brought into scope by a contract: either we need a restricted notion of a record (say, a module), which guarantees that you can easily resolve such accesses statically, or "you try to be smart" when all of std and foo are more or less record literals - maybe modulo some let-bindings. So although it's not exactly the same problem on the surface, I think it's in fact closely related - you need to decide statically the field definitions of a record in both cases (although you only need the names for the scoping problem at hand here, while you also need to find the definition for the let type resolution).

Good to know that your experiment turns out well!

Ah I see, so the similarity is the dilemma of being formal but restrictive or "being smart".

Thanks again @yannham for all the guidance and insights.

tweag / nickel

Accessing late-bind field fails #1915