tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.24k stars 84 forks source link

Proposal: freeze recursive records upon insert/remove/update/map #1877

Open yannham opened 3 months ago

yannham commented 3 months ago

Motivation

As stated in #1866, the current semantics of record operations (insertion, removal, update and map) with respect to merging is confusing, and in fact accidental.

The initial design guideline was that record operations shouldn't interact with merging, and shouldn't trigger recursive recomputations (as overriding through merging does): you either use the merging system to get those, but when you use dict-like operations, then the record is considered as a normal dictionary.

However, this guideline is a bit vague, as noticed in #1866. As a concrete example, when removing a field which has reverse dependencies, should we freeze the reverse dependencies only or the whole record? That is, what should be the result of (std.record.remove "to_remove" {foo = bar + 1, bar = 0, to_remove = false, rev_dep = to_remove}) & {bar | force = 1, to_remove = true}?

Freezing the whole record would give {foo = 1, bar = 1, to_remove = true, rev_dep = false}. Freezing only reverse dependencies gives {foo = 2, bar = 1, to_remove = true, rev_dep = false}. If record.remove would preserve recursivity entirely, we would get {foo = 2, bar = 1, to_remove = true, rev_dep = true}.

Proposals comparison

Only freeze reverse deps

The argument for freezing only the reverse dependencies is the following: try to preserve the recursive structure of the record as much as possible (that is, as far as it's reasonable). When removing a field, we need to at least freeze the reverse dependencies, but the ones that don't interact with that field could very well stay recursive and overridable. This is more expressive, as even after a dictionary operation, one can still apply merge operations.

Freeze the whole record

The argument for freezing the whole record is simplicity and consistency. It seems that in Nickel, a single object - records - is used to represent several things: (record) contracts, partial configurations (modules) and classical key-value dictionaries. Merging makes sense for contracts and partial configurations, but not much for dictionaries (one can just remove, insert and update there). There is thus the idea that those two different "modes" shouldn't be mixed: one can use the merging system to build up a record value, but once you start to use dictionary functions, it should be considered as a static dictionary and not as a recursive structure anymore.

The motivation isn't only philosophical: while merging can already makes provenance tracking a bit more challenging, interleaving merging and dictionaries operations, where some dictionary operations might freeze some fields but not others can make it very hard to predict what happens when we override a value on the final result and how overrides propagates through a structure with potential "broken" links.

Proposal

We propose to freeze the whole record. The rationale is that this seems like the more restrictive but more principled approach to keep the complexity of provenance tracking under control. This is also the simpler mental model for users and thus the more predictable one.

That is, prior to any record operations among insert, remove, map and update:

  1. All the lazy contracts are applied to the values, and cleaned from the lazy contract fields
  2. The value of recursive fields is fixed at this point, that is, future merges won't trigger the recomputation of any field (beside the one being added or overridden)

This semantics will be independently implemented as a record.freeze (the name is subject to bikeshedding) stdlib function as well.

Challenges

It can happen that merging configurations is followed by a post-processing step which for example replaces enum variants with a proper serializable representation. Freezing the whole record could be a problem here, because it makes it impossible to override after-the-fact, for example from the CLI (customize mode).

However, we think that the right solution is to make it possible to perform such tasks (post-processing or custom serializing) within the merging system itself, rather than tweaking dict operations to do so. Typically, a proposal like #1170 would make post-processing part of the metadata system.

One might also want to override a value without having to know about its current priority, while still keeping all the other neat features of merging (keep contracts around and applying them lazily, recursive overriding, etc.). Same conclusion: this use-case mandates something like an asymmetric variant of the merge operator, whose only difference would be that one side would be selected independently from the priorities (and the priority of the other side is preserved), rather than trying to make this fit in a bloated record.update that would preserve recursive records.

In the end, note that if recursivity-preserving variants are really needed, we can always provide those under a different namespace in the stdlib - think record.fix.{insert,remove,update,map}. The question is really what should be the default behavior of those functions.