toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

Allow keys in key-value pairs to be paths #499

Closed pradyunsg closed 5 years ago

pradyunsg commented 6 years ago

The only remaining idea from #292 that has not been decided upon and does not have a dedicated issue.

I mean, I don't know how much I like it myself but, hey, this needs discussion so, here's a dedicated issue for it.

[document]
title = "Hello!"
meta.charset = "utf-8"

lmna commented 6 years ago

Compare (this is a slightly modified example from the spec):

[[catalogue."Cash & Carry".fruit]]
  name = "apple"

  [catalogue."Cash & Carry".fruit.physical]
    color = "red"
    shape = "round"

  [[catalogue."Cash & Carry".fruit.variety]]
    name = "red delicious"

  [[catalogue."Cash & Carry".fruit.variety]]
    name = "granny smith"

[[catalogue."Cash & Carry".fruit]]
  name = "banana"

  [[catalogue."Cash & Carry".fruit.variety]]
    name = "plantain"

versus

[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical.color = "red"
physical.shape = "round"
variety = [
    { name = "red delicious" },
    { name = "granny smith" },
]

[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
    { name = "plantain" },
]

First version is harder to read because it is cluttered with repeating (and absolutely meaningless) catalogue."Cash & Carry".fruit prefix.

I believe that proposed feature gives a huge boost in readability for complex, deeply-nested configurations.

lmna commented 6 years ago

Proposed feature enables intuitive syntax for some simple cases of array-of-tables issue #309

pradyunsg commented 6 years ago

Thanks for a nice example @lmna. Also for @dstufft's example from #413:

[a]
value = 1

[a.b]
value = 2

[a.c]
value = 3

[a.c.d]
value = 4

[a.e]
value = 5

It becomes:

[a]
value = 1
b.value = 2
c.value = 3
c.d.value = 4
e.value = 5

Much nicer! ^>^

mojombo commented 6 years ago

This could be a very nice and powerful addition to TOML. Let's go through a few ramifications to see if there are any traps.

This would allow any TOML document to be expressed without any bracket-style tables at all. The last example above could also be expressed as:

a.value = 1
a.b.value = 2
a.c.value = 3
a.c.d.value = 4
a.e.value = 5
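To make the equivalence concrete, here is a minimal Python sketch (not taken from any TOML library) of how dotted keys could expand into nested tables. The helper name `expand_dotted` is hypothetical, and the sketch assumes bare keys only: it splits on every ".", so quoted keys containing dots are not handled.

```python
def expand_dotted(pairs):
    """Build nested dicts from (dotted_key, value) pairs.

    Illustrative only: keys are split on '.', so quoted keys that
    contain dots are not supported.
    """
    root = {}
    for dotted_key, value in pairs:
        *parents, leaf = dotted_key.split(".")
        table = root
        for name in parents:
            # Create intermediate tables on demand.
            table = table.setdefault(name, {})
            if not isinstance(table, dict):
                raise ValueError(f"{name!r} in {dotted_key!r} is not a table")
        if leaf in table:
            raise ValueError(f"{dotted_key!r} is already defined")
        table[leaf] = value
    return root


doc = expand_dotted([
    ("a.value", 1),
    ("a.b.value", 2),
    ("a.c.value", 3),
    ("a.c.d.value", 4),
    ("a.e.value", 5),
])
print(doc["a"]["c"]["d"]["value"])  # prints 4
```

The resulting structure is the same one the bracket-style `[a]`, `[a.b]`, `[a.c]`, `[a.c.d]` and `[a.e]` tables describe.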

More realistically, you'd be repeating longer key names. Perhaps something like this is better to see what that would feel like in reality:

3dprinter.extruder1.material = "PLA"
3dprinter.extruder1.temp.max = 242
3dprinter.extruder1.temp.min = 238
3dprinter.extruder1.temp.unit = "F"
3dprinter.extruder1.color = "red"
3dprinter.extruder1.feed_rate = "23"

The repetition becomes annoying in this case and it would be natural to switch to bracket tables to reduce that repetition, so I don't think that's a hit against the proposal.

To remain consistent with tables, we would need tables expressed this way to adhere to the same non-re-opening restriction. Thus, the following would be invalid:

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3 # INVALID - reopens table [a.b]

That's easy enough to say and enforce, no different than tables already behave.

@lmna is absolutely right in that this proposal could be used to work around the confusing quirks of array table syntax and clean those up, which would be very nice because that is indeed TOML's least elegant bit. I'm guessing most situations could be represented cleanly with thoughtful use of "path keys" and inline tables. A big win for TOML.

I can't think of any big downsides. TOML remains unambiguous, as this is simply an alternate table syntax along with regular tables and inline tables. It's quite obvious what's going on and since "." is already forbidden in keys, would be backwards compatible with 0.4.0.

Perhaps one could argue that this addition would make TOML less minimal (OMG 3 ways to define tables!!!!), but it would help clean up some TOML docs that would otherwise be more verbose and less obvious, a tradeoff worth serious consideration.

Let me draw up a PR to see what this might look like in the spec/ABNF.

pradyunsg commented 6 years ago

I can't think of any big downsides.

+1

Let me draw up a PR to see what this might look like in the spec/ABNF.

Maybe #446 would come into play here?

a.key = 1
unrelated-table.key = 1
a.b.key = 1

If the above is invalid, which it is IMO, so should its table equivalent be.

pradyunsg commented 6 years ago

Aside, https://github.com/pradyunsg/toml/tree/dotted-keys. :)

mojombo commented 6 years ago

@pradyunsg Ah, excellent, please submit as a PR, I didn't start on one yet.

alexcrichton commented 6 years ago

This is a pretty neat idea! It may be helpful to take a look at where existing projects may use this to see what the impact could be perhaps? I'm personally most familiar with Cargo, so I'll stick with that :)

The first thing that comes to mind for Cargo is the [dependencies] section:

[dependencies]
libc = "0.2"
serde = { git = "https://github.com/serde-rs/serde" }
my-crate = { path = "path/to/my-crate", version = "0.2" }

Today I (and I think a number of others) like how dependencies tend to be easily scannable top to bottom, one line each. With this extension I could imagine some people may switch idioms to maybe do something (pessimistically) like:

[dependencies]
libc = "0.2"
serde.git = "https://github.com/serde-rs/serde"
my-crate.path = "path/to/my-crate" 
my-crate.version = "0.2"

Readability-wise I think that unfortunately a conversion like this is a net-loss (subjectively at least). Scanning the dependency list it's not clear if "serde.git" is the name of a dependency or not, you'd have to have prior knowledge to mentally strip away after the . to know that the dependency name is "serde". Similarly for "my-crate" I think (personally) it looks a little worse as it's now spread over two lines.

Now that of course doesn't mean we shouldn't accept a change like this! This sounds very similar to the old inline tables discussion where some things can definitely get worse, yet many patterns get much better. I remember that way-back-when we basically designed the features of Cargo.toml around the syntax and features of TOML itself, and I'd suspect that most consumers of TOML would do similarly. I think that means for Cargo we wouldn't show examples and otherwise wouldn't recommend syntax like this in the [dependencies] section, and that would probably do us fine!

Now one place where I think Cargo could benefit greatly is the [profile] section:

[profile.dev]
opt-level = 1

[profile.release]
debug = true
lto = true

That I think actually looks better as:

[profile]
dev.opt-level = 1
release.debug = true
release.lto = true

So I do think there's possible areas for us to use this in Cargo!

Overall I'm 👍 on this feature, it seems like a natural extension of the [a.b.c] syntax in table headers and then, like before, the onus is on authors to leverage and recommend TOML patterns for "looking nice", which doesn't mean aggressively using or not using this, just where appropriate!

ahmedcharles commented 6 years ago

I think the biggest downside here is specifying when the table closes for modification. This proposal doesn't seem to make that clear.

For example, if we assume this is valid:

[profile]
dev.opt-level = 1
release.debug = true
release.lto = true

Is this also valid:

[profile]
release.debug = true
dev.opt-level = 1
release.lto = true

If it's valid, then why have tables close at all and if it's invalid, then how do you explain that to users effectively? The 2 current ways of specifying tables force locality when defining tables and do so in an obvious way. Exchanging key/value pairs within a table section never changes the validity of a file. In order to keep that invariant and add this functionality, you have to give up the locality of table definitions.

Note, the 'pro' side examples above could be written as:

[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
    { name = "red delicious" },
    { name = "granny smith" },
]

[[catalogue."Cash & Carry".fruit]]
name = "banana"
variety = [
    { name = "plantain" },
]

[profile]
dev = { opt-level = 1 }
release = { debug = true, lto = true }

The current specification seems to allow for reasonable readability while avoiding both confusion and the risk of adding a feature without implementation experience.

StefanKarpinski commented 6 years ago

There are two ways I can see addressing your concerns, @ahmedcharles:

  1. The profile.release table is closed when profile is closed. The general rule would be that tables written with the dotted key syntax are closed when their enclosing table that is not written with dotted key syntax closes.

  2. Require that all dotted key entries with the same prefix appear together, so the second example where dev.opt-level appears between release.debug and release.lto would be illegal. Then the profile.release table would be closed after seeing the last release. entry in the profile section.

The latter approach doesn't violate the principle that sorting a table should not affect its meaning or validity since sorting would keep dotted keys with the same prefix together. It would, however, mean that randomizing the order of key-value pairs could cause it to become illegal if it separates dotted keys with the same prefix. I'm not sure that's a problem though – I can see why being allowed to sort the keys is useful, I have a hard time seeing why randomizing the keys would be useful.
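A rough Python sketch of the second option, under the assumption that "same prefix" means every table prefix of a dotted key and that keys are bare (split on "."). The function names are hypothetical; this is not how any particular parser does it.

```python
def prefixes(dotted_key):
    """Yield every table prefix of a dotted key: 'a.b.c' -> 'a', 'a.b'."""
    parts = dotted_key.split(".")
    for i in range(1, len(parts)):
        yield ".".join(parts[:i])


def first_violation(keys):
    """Return the first key that re-enters a closed prefix, else None.

    A prefix is closed as soon as a key without that prefix follows a
    key that has it.
    """
    closed = set()
    open_now = set()
    for key in keys:
        current = set(prefixes(key))
        if current & closed:
            return key
        closed |= open_now - current
        open_now = current
    return None


ordered = ["dev.opt-level", "release.debug", "release.lto"]
interleaved = ["release.debug", "dev.opt-level", "release.lto"]

print(first_violation(ordered))              # None
print(first_violation(interleaved))          # release.lto
print(first_violation(sorted(interleaved)))  # None
```

Sorting always passes this check because strings sharing a prefix are adjacent in lexicographic order; only an arbitrary reordering can separate them.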

StefanKarpinski commented 6 years ago

Note, the 'pro' side examples above could be written as:

[[catalogue."Cash & Carry".fruit]]
name = "apple"
physical = { color = "red", shape = "round" }
variety = [
    { name = "red delicious" },
    { name = "granny smith" },
]

I think it's key to note that this only looks reasonable because the keys and values in the physical table are quite short. If it was this instead, the inline table is less acceptable:

physical = { color = "redredredredredredredredredredredredredredredredredredredredredred", shape = "roundroundroundroundroundroundroundroundroundroundroundroundround" }

Of course, another solution would be to allow multiline inline tables, e.g.:

physical = {
    color = "redredredredredredredredredredredredredredredredredredredredredred",
    shape = "roundroundroundroundroundroundroundroundroundroundroundroundround"
}

I'm not sure if that's preferable to what's being proposed here, however. For example, it means that you can't scan through a section looking for ^\s*\w+\s*= and be sure that you're finding a key in that table since the shape = line for example looks like that but is actually an entry in a subtable. The physical.shape = syntax doesn't have that problem.
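The scanning argument can be checked with a short Python sketch using the regular expression quoted above. The multiline inline table is the hypothetical syntax from this comment, not something the spec allows, and `matching_lines` is just an illustrative helper.

```python
import re

# The pattern from the comment: optional indent, a bare word, then "=".
KEY_LINE = re.compile(r"^\s*\w+\s*=")

# Hypothetical multiline inline table.
multiline_inline = '''\
name = "apple"
physical = {
    color = "red",
    shape = "round"
}
'''

# The same data written with dotted keys.
dotted = '''\
name = "apple"
physical.color = "red"
physical.shape = "round"
'''


def matching_lines(text):
    """Return the stripped lines that look like direct keys of the table."""
    return [line.strip() for line in text.splitlines() if KEY_LINE.match(line)]


print(matching_lines(multiline_inline))
print(matching_lines(dotted))  # ['name = "apple"']
```

In the multiline form the `color` and `shape` lines match even though they belong to the `physical` subtable; in the dotted form only `name` matches, because `\w+` cannot cross the "." in `physical.shape`.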

ahmedcharles commented 6 years ago

'Sorting' was the wrong word, I meant 'exchanging'. I think the property that keys can be shuffled within a section while retaining meaning is important, not because one wants to do that but because explaining the errors caused by not doing that no longer fits the definition of being simple. Saying that you can't duplicate section headers or key names is really simple by comparison.

Additionally, the motivation for restricting inline tables to a single line is explicitly because their intended use is for small, simple tables. Larger tables benefit less from inline syntax just as they would from the proposed path syntax. I.e. you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity.

The current spec has two properties:

  1. Table keys/values (which aren't tables themselves) have good locality.
  2. The table reopening restriction is easy to explain, because it simply disallows duplicated sections and keys.

This proposal forces a choice between those two properties, because you can't keep both.

pradyunsg commented 6 years ago

@mojombo this should be reopened then. =)

mojombo commented 6 years ago

Dotted keys have been merged, but we should still clarify when tables close.

lmna commented 6 years ago

you don't want related values being dispersed throughout a file, instead, they should exist in relative proximity

Yep. Related values are to be put in the same table. And any forms of "table reopening" should be forbidden.

explaining the errors caused by not doing that no longer fits the definition of being simple

Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y."

ahmedcharles commented 6 years ago

Is this far from simple? - "Error. Attempt to reopen table [Foo.Bar] at line X. Table [Foo.Bar] was closed at line Y."

I suppose it depends on your definition of simple. Given what TOML strives to be, yes, this is far from simple, in my opinion.

eksortso commented 6 years ago

The notion of "closing" a table applies to non-table assignments. Assigning sub- or super-tables is offered more latitude when standard table definitions are used. After all, it was in this context that this rule applies: "As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it."

But what if the tables are defined with key-path notation? Or with inline notation, which raises similar questions? In other words, are these valid?

Key-path assignments and subtables

[profile]
dev.opt-level = 1
release.debug = true
release.lto = true

[profile.release.misc]  # Is this section valid?
alpha = "A"
beta = "B"

Inline tables and subtables

[profile]
dev.opt-level = 1
release = {debug = true, lto = true}

[profile.release.misc]  # Valid? Even though `profile.release` was defined inline?
alpha = "A"
beta = "B"

Inline tables and key-path assignments

[profile]
dev.opt-level = 1
release = {debug = true, lto = true}
release.misc.alpha = "A"  # Can we define `profile.release.misc` this way?
release.misc.beta = "B"   # Is this valid?

I think all three examples ought to be considered invalid. The first one visually breaks up the set of profile.release assignments. The others gunk up one-liner definitions, which should be kept short and succinct if used at all.

In order to keep things obvious and minimal, we may insist that the definitions of subtables be restricted on these two types of table definitions. Mainly:

These two proposed rules, along with the non-reopening restriction, ought to settle the issue of when tables are "closed," and can be extended to address table arrays.

falcon71 commented 6 years ago

I find the concept of "closing" a table quite difficult to grasp. With the dotted key syntax, there are now so many different ways to navigate through tables, it makes it difficult to figure out when you are allowed to append to a table and when not.

If you want a concept of "closing", then why is this allowed?:

[a.b]
c = "a.b.c"
[a]
d = "a.d"

I feel that the concept of "a value can only be assigned once" is much easier to understand and should be sufficient. For primitives it's simple and arrays can be appended to anytime. You should be able to add new keys to a table anytime as well, as long as a key has not been defined before. The [a.b] and the [a] in the previous example can be interpreted as merely specifying a path creating referenced tables implicitly if needed. Once the key a is a table, it can't be assigned another value. However, it can be referenced and expanded again.

Another point that is not clear to me is how arrays are currently supposed to be handled. The first part of the following example appears to be currently valid. When thinking in terms of paths, any key, including a dotted key, should reference the last element of an array. All versions below would be equivalent:

[[a.b]]
[a]
x0 = "a.x0"
[a.b.c]
d = "a.b[0].c.d"

[[a.b]]
[a]
x1 = "a.x1"
b.c.d = "a.b[1].c.d"

[[a.b]]
[a]
x2 = "a.x2"
[a.b]
c.d = "a.b[2].c.d"

[[a.b]]
[a]
x3 = "a.x3"
b = { c.d = "a.b[3].c.d" }

The only surprise is that the [[]] syntax always creates a new element in an array and does not merely specify a path. The conclusion of thinking in paths is that the following should be valid as well:

[a]
b = "a.b"
[a.c] 
c = "a.c.c"
[a] #currently not possible
c.d = "a.c.d"
[] #currently definitely not possible
a.d = "a.d"

The "assign a value only once" rule is easy to understand, the paths work consistently in all cases and should be equally simple to implement in parsers.

eksortso commented 6 years ago

@falcon71 Let me address questions that you had in your examples. A second comment post will follow.

You asked why this was allowed.

[a.b]
c = "a.b.c"
[a]
d = "a.d"

The rules for opening and closing tables are more flexible for table and table-array values. The spec says "As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it." That's why you can write a and a.b in either order. This is valid TOML because nothing has been assigned to a yet, except for the table value a.b.

It is ugly. It needs to be sorted for legibility's sake. But it's legal.

And I ought to put in a PR to re-write the rule in the spec, because "specific" isn't specific enough.

Table arrays are confusing enough as they are. Let me comment the code in your example, because something doesn't seem right about it. Not sure if you realize that each instance of [[a.b]] defines the next element of the table array.

[[a.b]]  # Defines table array `a.b`, opens its FIRST element,...
         # ...and leaves it empty?
[a]      # Opens the table `a`, which already holds the array `a.b`
x0 = "a.x0"    # (that's right)
[a.b.c]  # Opens a new table `c` in the first element of `a.b`.
d = "a.b[0].c.d"    # (that's right)

[[a.b]]  # Opens SECOND element of table array `a.b`,...
         # ...and leaves it empty?
[a]      # INVALID AT THIS POINT. `a` was already defined above.
         # Like I said, I'll address your central point in another post.
#...

Does this example clear up how the table array a.b works?

[a]      # There's only one table `a`.
x0 = "a.x0"
x1 = "a.x1"
x2 = "a.x2"
x3 = "a.x3"

[[a.b]]  # FIRST element of table array `a.b` (index 0, from your POV)
y0 = "a.b[0].y0"
[a.b.c]  # This is `c` in FIRST element. `a.b.c` is implicitly `a.b[0].c`.
d = "a.b[0].c.d"

[[a.b]]  # SECOND element (index 1)
y1 = "a.b[1].y1"
c.d = "a.b[1].c.d"    # We're already in `a.b[1]`.

[[a.b]]  # THIRD element (index 2)
y2 = "a.b[2].y2"
c.d = "a.b[2].c.d"

[[a.b]]  # FOURTH element (index 3)
y3 = "a.b[3].y3"
c.d = "a.b[3].c.d"
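The way `[[a.b]]` and `[a.b.c]` interact in this example can be modeled with a small Python sketch. It assumes only what is described above: a `[[...]]` header appends a new element, and any path that passes through a table array refers to its last element. The function names are hypothetical, and no duplicate-definition checks are attempted.

```python
def descend(table, name):
    """Step into `name`; for an array of tables, use its last element."""
    child = table.setdefault(name, {})
    return child[-1] if isinstance(child, list) else child


def open_table(root, path):
    """Model a `[a.b.c]` header: walk the path and return the table."""
    table = root
    for name in path:
        table = descend(table, name)
    return table


def open_array_element(root, path):
    """Model a `[[a.b]]` header: append a new element and return it."""
    *parents, last = path
    parent = open_table(root, parents)
    element = {}
    parent.setdefault(last, []).append(element)
    return element


root = {}
open_table(root, ["a"])["x0"] = "a.x0"

first = open_array_element(root, ["a", "b"])        # [[a.b]]
first["y0"] = "a.b[0].y0"
open_table(root, ["a", "b", "c"])["d"] = "a.b[0].c.d"  # [a.b.c]

second = open_array_element(root, ["a", "b"])       # [[a.b]] again
second["y1"] = "a.b[1].y1"
second.setdefault("c", {})["d"] = "a.b[1].c.d"      # c.d = ... inside it

print(root["a"]["b"][0]["c"]["d"])  # a.b[0].c.d
print(root["a"]["b"][1]["c"]["d"])  # a.b[1].c.d
print(len(root["a"]["b"]))          # 2
```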

eksortso commented 6 years ago

@falcon71 As much as I can appreciate a general "assign a value only once" rule, I think that it would not work in TOML.

A human-readable configuration format does require some restrictions on how flexible it can be, in order to preserve readability. Key paths were introduced for that purpose. Using them improperly could lead to unreadable files, though.

I would prefer that all non-table basic-type assignments in a table be kept in the same place. Note that we have precedent for this. Say we configure a nested table x.a like this:

[x.a]
b = 1

[x.a]  # INVALID: The table `x.a` was already defined.
c = 2

We didn't re-assign anything to x.a, but that doesn't matter. The second [x.a] is considered a re-definition of x.a. This has the nice effect of keeping all non-table values in x.a defined in one place, the standard section [x.a]. And it places no limitations on any later-defined subtables, or on the supertable x.

I previously recommended that all inline table assignments be closed to both new basic values and subtables, to keep inline tables entirely self-contained. I stand by that recommendation. Key paths and standard subtable definitions should not touch inline tables.

@mojombo's past statement implies that a table whose basic values are assigned using key-path notation must necessarily have all such assignments grouped together, even if subtables and supertables are defined elsewhere.

But I also recommended that standard table notation should not be used to add subtables to tables defined by key-path assignments. The existing rules close off new basic value additions to key-path-defined tables once they are no longer being referenced, and my recommendation closes off new subtables in the same context.

For the sake of error reporting, all of this put together implies that each table in the configuration is defined in one continuous set of lines. An error message can thus state that "Line N invalid; table x.y.z was defined in lines A-Z." The user can take this hint and transfer line N's contents in between lines A and Z inclusive. For subtable restrictions, a similar message can be provided. Parsers would need to keep track of which lines defined which tables, but each table would always be a continuous range.
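As a sketch of the bookkeeping this implies, the following Python tracks, for a single section, which lines defined each dotted-key table and produces the error message quoted above. It assumes at most one dotted-key table is open at a time and that a plain key closes it. The function name and input shape are hypothetical.

```python
def check_section(assignments):
    """Check (line_number, dotted_key) pairs from a single section.

    Returns (ranges, errors). `ranges` maps each dotted-key table to the
    [first, last] lines that defined it.
    """
    ranges = {}
    errors = []
    open_table = None  # the one dotted-key table currently open, if any
    for line_no, key in assignments:
        table = key.rpartition(".")[0]  # '' means the section itself
        if table == "":
            open_table = None  # a plain key closes the dotted table
        elif table == open_table:
            ranges[table][1] = line_no  # extend the continuous range
        elif table in ranges:
            first, last = ranges[table]
            errors.append(
                f"Line {line_no} invalid; table {table} "
                f"was defined in lines {first}-{last}."
            )
        else:
            ranges[table] = [line_no, line_no]
            open_table = table
    return ranges, errors


ranges, errors = check_section([
    (2, "c"),
    (3, "b.d.e"),
    (4, "b.d.f"),
    (5, "d.e"),
    (6, "b.d.g"),
])
print(errors[0])  # Line 6 invalid; table b.d was defined in lines 3-4.
```

Because each table occupies one continuous range, a pair of line numbers per table is all the state the message needs.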

falcon71 commented 6 years ago

Thank you for your answers. Yes, you are right, my proposal focused on implementation simplicity without providing any value for human users apart from obfuscation. Based on my understanding of your rules, would the following be valid TOML?

[a.b] #closes empty, opens a.b
c = "a.b.c"

[a] #closes a.b, opens a
#b.d = "a.b.d" #invalid, a.b is already closed
c = "a.c"
b.d.e = "a.b.d.e" #closes a, opens a.b.d
b.d.f = "a.b.d.f"
#d = "a.d" #invalid, a already closed
d.e  "a.d.e" #closes a.b.d, opens a.d

#[[a.d]] #invalid closes a.d, opens it again
d.f = { g = "a.d.f.g"} #a.d.f never opens, a.d still open
#d.f.h = "a.d.f.h" #invalid, a.d.f was never open
d.e = "a.d.e"

[[b.a]] #closes a.d, opens b.a[0]
a = "b.a[0].a"
[b] # closes b.a[0], opens b
#a.c = "b.a[0].c" #invalid, b.a[0] is closed
a.c.d = "b.a[0].c.d" #closes b, opens b.a[0].c

[[b.a]] #closes b.a[0].c, opens b.a[1]

[a.x] #closes b.a[1], opens a.x

eksortso commented 6 years ago

Let me start by noting that you could have more than one table open at a time. Two tables can be open at one time if you are using dotted keys. With inline tables, you may have several tables open, if only briefly.

What I have in mind is a hierarchy of the definition styles. Sections contain bare keys, quoted keys, and groups of dotted-key-defined tables. They all can contain inline subtables for values, which may also contain dotted keys in inline subtables.

More explicitly:

This is getting very elaborate. But I think it's been an enlightening process so far, and I hope you think so too.

## Here's your original code.
## My comments are double-hashed and refer to prior lines.

[a.b] #closes empty, opens a.b
    ## Yes. The root table can only accept subtables and subtable arrays from
    ## this point forward. The section table `a.b` is opened.
c = "a.b.c"

[a] #closes a.b, opens a
    ## Yes, exactly. Subtables of `a.b` may later be defined.
#b.d = "a.b.d" #invalid, a.b is already closed
    ## That's right.
c = "a.c"
b.d.e = "a.b.d.e" #closes a, opens a.b.d
    ## No; section `[a]` keeps table `a` open.
    ## But Yes; the dotted keys open `a.b.d` here.
b.d.f = "a.b.d.f"
#d = "a.d" #invalid, a already closed
    ## No; section `[a]` keeps the table `a` open.
    ## The missing key path would have closed `a.b.d`.
    ## But since this is commented out, let's move on.
d.e  "a.d.e" #closes a.b.d, opens a.d
    ## INVALID, because you forgot the "=" sign!

#[[a.d]] #invalid closes a.d, opens it again
d.f = { g = "a.d.f.g"} #a.d.f never opens, a.d still open
    ## The dotted keys close the table `a.b.d` and open `a.d` here.
    ## The inline table value opens and closes `a.d.f` on a single line.
    ## `a` is still open for basic assignments.
#d.f.h = "a.d.f.h" #invalid, a.d.f was never open
    ## Not exactly; the table `a.d.f` is already closed.
d.e = "a.d.e"

## We're at a new section header.
## Open dotted-key tables (`a.d`) are closed.
## The old section table (`a`) is closed. `a` may have subtables defined later.
[[b.a]] #closes a.d, opens b.a[0]
    ## The section does open `b.a[0]`. But `a.d` was already closed.
    ## (TOML doesn't guarantee 0-indexing, but I get what you mean.)
a = "b.a[0].a"
[b] # closes b.a[0], opens b
    ## It is very strange to open a table after opening the first element of an
    ## array of tables within it. But it's valid.
#a.c = "b.a[0].c" #invalid, b.a[0] is closed
    ## Yes.
a.c.d = "b.a[0].c.d" #closes b, opens b.a[0].c
    ## The table `b` isn't closed until the next section header.
    ## But the key-path table `b.a[0].c` is opened

[[b.a]] #closes b.a[0].c, opens b.a[1]
    ## The table `b.a[0].c` is closed first, then `b` is closed.
    ## But Yes, the table `b.a[1]` is opened.

[a.x] #closes b.a[1], opens a.x
    ## Yes, that's right.

## At EOF, `a.x` is closed, and the root table is closed.

falcon71 commented 6 years ago

Thank you for taking your time to annotate the example. I indeed find this very enlightening.

The root table can only be accessed between BOF and the first table or arraytable declaration, so I think it can be treated like a normal table declaration (think []).

You would allow this:

a = "a"
b.c = "b.c"
d = "d" #valid, root is still open
        #my interpretation of only allowing a single open table would have forbidden this

If I understand you correctly, you would keep track of three open tables:

  1. Root or table declaration
  2. Dotted keys
  3. Inline tables

This would lead to the following being invalid, which might seem confusing:

a.a = "a.a" #opens a, root still open
a.b.c = "a.b.c" #closes a, opens a.b
#a.c = "a.c" #invalid, a already closed

If this was to be allowed, then an arbitrary number of tables would need to be kept open for dotted keys and inline tables with dotted keys (I assume the rules would be exactly the same for inline tables. The order would matter as well). In any case, while these rules might work, I find them quite far from being "obvious" like the previous rules before dotted keys were introduced. They could simply be remembered as "don't assign [table] twice". Now users will be busy rearranging keys until the parser accepts the file, because sometimes keys need to be grouped together, except for when they don't.

eksortso commented 6 years ago

Thanks, I appreciate the feedback. I think you've got the concepts down pat. Both of your examples meet the table scoping standards that I have in mind. And you indeed can treat the top section like a normal table declaration, in which the only root-level basic assignments can be made. That's come up in past discussions, specifically in #456.

The open table tracking is a little more complicated. There'd only ever be one section table open (including root), and at most one dotted-keys table inside that. But inline tables can contain smaller inline tables, and now they can include key-path assignments, too. So an arbitrary number of nested tables could be opened up! Fortunately, line lengths for inline tables tend to be short, and if they're not, they can be expanded into sections or dotted-key assignments.

In this very Douglas Adams-y example, tables are nested seven layers deep. And during parsing, three tables are opened all at once on a single line.

pan.galactic.gargle.blaster = {large.gold.brick = {name="slice", type="lemon"}, quantity = 2}

At 93 characters, that is definitely abusing inline tables something fierce. But it's still legal.
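For reference, a short Python sketch that writes out by hand the structure this line would produce and counts the nesting. The dict literal is a manual transcription, not parser output, and `table_depth` is a hypothetical helper.

```python
# Hand-written equivalent of the one-line example above.
config = {
    "pan": {"galactic": {"gargle": {"blaster": {
        "large": {"gold": {"brick": {"name": "slice", "type": "lemon"}}},
        "quantity": 2,
    }}}}
}


def table_depth(value):
    """Count nested table layers; non-table values contribute 0."""
    if not isinstance(value, dict):
        return 0
    return 1 + max((table_depth(v) for v in value.values()), default=0)


line = 'pan.galactic.gargle.blaster = {large.gold.brick = {name="slice", type="lemon"}, quantity = 2}'

print(table_depth(config) - 1)  # 7, excluding the root table
print(len(line))                # 93
```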

The use of key paths in TOML is fully intended to make configs more readable. All these rules we've been discussing are intended to prevent config files from being more complicated than they need to be.

To the end user, the scoping principles are still very straightforward:

So we haven't strayed too far from the realm of the obvious, or the minimal.

I like the adage "Don't assign tables twice." That's a good way to state it!

pradyunsg commented 6 years ago

We know that we're not going to break (almost) any valid v0.4 TOML document in v1.0. So, it makes sense to just extend the current rules onto the dotted syntax.

What are the "scoping" rules for a valid TOML file in v0.4?

As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it.

[snip]

You cannot define any key or table more than once. Doing so is invalid.

How do we extend those rules so that a valid v0.4 file stays a valid v1.0 file?

Simply moving these statements to be in the "Dotted Keys" section and referencing them from table definitions should be good enough.

pradyunsg commented 6 years ago

Simply moving these statements to be in the "Dotted Keys" section and referencing them from table definitions should be good enough.

#537

pradyunsg commented 6 years ago

@eksortso @falcon71 I think you've arrived at something similar to https://github.com/toml-lang/toml/issues/446#issuecomment-388243460 here?

pradyunsg commented 6 years ago

Dotted keys have been merged, but we should still clarify when tables close.

So, personally, I think we can have relaxed rules here which we can then tighten up if they cause issues. As such, I don't think we're going to introduce any restrictions on closing of tables until after 1.0 -- the intention is to keep 1.0 backwards compatible with 0.4.

StefanKarpinski commented 6 years ago

Since 0.4 doesn't include dotted keys, any rule that doesn't affect TOML files that don't have any dotted keys would not cause 1.0 to become incompatible with 0.4. On the other hand, changing the validity rules in a 1.x version would be a violation of SemVer, would it not?

pradyunsg commented 6 years ago

any rule that doesn't affect TOML files that don't have any dotted keys would not cause 1.0 to become incompatible with 0.4.

Yeps. I did think of restricting only dotted keys but I felt it would be a little unintuitive to me - having different rules governing two equivalent ways to specify keys.

On the other hand, changing the validity rules in a 1.x version would be a violation of SemVer, would it not?

Yes. It'd need a major version bump.

pradyunsg commented 6 years ago

Closing this since I feel this has been resolved. Any additional discussion on restricting keys just becomes #446.

pradyunsg commented 6 years ago

Reopened as https://github.com/toml-lang/toml/issues/446#issuecomment-395405344 is a compelling argument to have restrictions on dotted keys.

ChristianSi commented 6 years ago

If there is consensus that it makes sense to restrict the order of dotted keys, I'd propose to add this sentence (or similar) to the spec:

All dotted keys that define a subtable must be placed together.

And to give an example such as:

mainkey1 = '...'
subtable1.key1 = '...'
subtable1.key2 = '...'
mainkey2 = '...'
#subtable1.key3 = '...'  # NOT ALLOWED, 'subtable1' keys must be kept together
subtable2.key1 = '...'
subtable2.subsub.key1 = '...'
subtable2.subsub.key2 = '...'
subtable2.key2 = '...'
#subtable2.subsub.key3 = '...'  # NOT ALLOWED, 'subtable2.subsub' keys must be
                                # kept together
mainkey3 = '...'

eksortso commented 6 years ago

@ChristianSi Your newest proposal defines a sensible restriction to dotted keys. But could you clarify the following? Sub- and super-tables may be defined in any order. So I'm assuming the following would be valid. Is that correct?

mainkey1 = '...'
subtable1.key1 = '...'
subtable1.key2 = '...'
mainkey2 = '...'

[subtable1.plainsubsubtable]  # This is legal, right?
#key1 = '...' #etc.

Also, I'd asked the same thing about subtables of an inline table defined outside the inline value expression in #446, but didn't get any response. Would this definitely be correct?

# Valid in TOML v0.4
mainkey1 = '...'
subtable1 = {key1 = '...', key2 = '...'}
mainkey2 = '...'

[subtable1.plainsubsubtable]  # This then would be legal, too?
#key1 = '...' #etc.

ChristianSi commented 6 years ago

@eksortso The proposed restriction only applies to subtables defined in the form of dotted keys, so yes to your first question.

The answer to your second question is also yes, according to my understanding of the TOML spec.

StefanKarpinski commented 6 years ago

It might help to be explicit about what the purpose of restrictions is. Are they intended to make it easier to implement a TOML parser with a fixed/predictable amount of state?

ehuss commented 6 years ago

Are there any rules about how dotted keys interact with normal tables or inline tables? Some examples:

Example 1:

profile.release.opt-level = 3
[profile.dev]  # I assume this should be invalid?
opt-level = 1

Example 2:

[profile]
release = {opt-level = 3}
release.debug = true  # Is this OK?

eksortso commented 6 years ago

Tables can be defined in any order, but any given table can only be defined once.

Example 1 actually is perfectly valid. It defines the table profile.release in full, using dotted key paths, before defining the table profile.dev.

It's equivalent to this:

# Same as Example 1
[profile.release]
opt-level = 3

[profile.dev]
opt-level = 1

Example 2, however, is not valid, because a table is defined twice. Since profile.release is first defined using an inline table, you can't use a dotted key path to go back in and define more non-table values inside of profile.release. You could use all dotted keys, or a single inline table, or even a new header, to define the contents of profile.release. But you can only choose one of these forms.

This would be valid:

#Example 2, with dotted key paths
[profile]
release.opt-level = 3
release.debug = true

And this would be valid.

#Example 2, with an inline table
[profile]
release = {opt-level = 3, debug = true}

There are a few other valid forms, and you can use whichever form works best for your configuration.

ehuss commented 6 years ago

@eksortso Why would header-tables be allowed to extend an existing table, but inline or dotted keys not be allowed? Going from an example above:

subtable1 = {key1 = '...', key2 = '...'}
[subtable1.plainsubsubtable]  # Why is this OK to modify `subtable1`?
key1 = '...' #etc.

Compared to:

subtable1 = {key1 = '...', key2 = '...'}
subtable1.plainsubsubtable.key1 = '...'  # Why is this not OK to modify `subtable1`?

I've tried a few different parsers on the first example. Some allow it, some don't, it's somewhat inconsistent.

My preference would be that inline tables should not be allowed to be extended by any means (dotted keys or headers).

eksortso commented 6 years ago

@ehuss In example 1, the first line defines subtable1, using an inline table. The second line begins the definition of subtable1.plainsubsubtable, using standard table syntax. Two different tables, defined using two different syntaxes. That is perfectly fine.

And the type of syntax doesn't matter either. Example 2 is also perfectly fine. There's something important that you've missed here. Namely, the second line begins the definition of the subtable subtable1.plainsubsubtable.

That's basically how dotted keys work. In defining subtable1.plainsubsubtable.key1, we're defining its parent table subtable1.plainsubsubtable and putting key/value pairs into it directly. Again, two different tables, two different syntaxes.

But if example 2 had instead looked like the following, the line with subtable1.key3 would violate the rule that a table may only be defined once (in this case, with an inline table).

# Notice the difference in the last line.
subtable1 = {key1 = '...', key2 = '...'}
subtable1.key3 = '...'  # INVALID, because subtable1 is already defined.

However, this is perfectly fine:

# Now assign a table instead of a scalar value.
subtable1 = {key1 = '...', key2 = '...'}
subtable1.key3 = {}  # Valid, because subtable1.key3 is a newly-defined subtable.

TOML v0.5-compliant parsers must parse both of your examples, they must fail the first example I provided above, and they must parse my second example. A few more tests may be in order.

I share your preference in limiting inline tables, at least regarding their subtables. Previously, I'd said that key paths and standard subtable definitions should not touch inline tables. I still stand by this recommendation as a sensible stylistic choice. But in the interest of limiting restrictions, I wouldn't require it in the standard.

AndrewSav commented 5 years ago

But if example 2 had instead looked like the following, the line with subtable1.key3 would violate the rule that a table may only be defined once (in this case, with an inline table).

# Notice the difference in the last line.
subtable1 = {key1 = '...', key2 = '...'}
subtable1.key3 = '...'  # INVALID, because subtable1 is already defined.

@eksortso, I do not think that it violates the rule that a table may only be defined once. You are not defining the same table twice in the second line of the example above; you are simply continuing to add keys to it. This is explicitly allowed in the spec:

As long as a key hasn't been directly defined, you may still write to it and to names within it.

subtable1.key3 has not been directly defined, so you certainly can write to it. And since you are not defining any table a second time here, this is also not a problem according to the spec.

Logically, a TOML v0.5-compliant parser must parse this example too. Otherwise it would be non-compliant.

Moreover, contrary to what was said above, according to the spec as written the following is also perfectly valid:

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

If it's not the intention of the spec, it needs to be reworded to express that, because currently it does not.

eksortso commented 5 years ago

Thanks for correcting me, @AndrewSav. I've not tried every parser, but the ones I have tried agree with your interpretation of the spec.

I guess at one point I'd had in mind the notion that tables are defined in one, and only one, distinct location within a config file. With just section headers, this was obvious. But now that dotted keys can be used to inject key/value pairs into any subtable outside of its section, that guarantee of a single, distinct location is gone.

I think that @ChristianSi was attempting to preserve it with his proposed restriction. And from a stylistic perspective, I agree that dotted keys into the same table should be kept together. I would make a config file with that principle in mind. But I'm waffling on the restriction.

So in the interest of retaining the demonstrated flexibility, I'm afraid that the proposed restriction ought to be dropped. I've made the argument elsewhere that all the latitude that the current (and future) syntax allows could be abused, but in practice, some good configuration templates would prevent abuse from spreading.

At this point, no changes to the spec (i.e. README.md, for now anyway) need to be made.

ChristianSi commented 5 years ago

@eksortso @AndrewSav I believe you are both wrong and using dotted keys to "inject" key/value pairs into tables defined elsewhere is prohibited. But I admit that the spec is not quite clear on this point.

If we have

subtable1 = {key1 = '...', key2 = '...'}

then I would interpret

subtable1.key3 = '...'  # INVALID, because subtable1 is already defined.

as an attempt to define subtable1 again which is in violation of the spec as it currently stands. So yes, it should be INVALID!

Likewise, if [outertable.subtable1] was defined as a regular table instead of an inline table, then having the dotted key subtable1.key3 in the [outertable] section would be an attempt to define outertable.subtable1 a second time, which is not allowed. Conceptually, the inline table syntax is one way of defining a table, while using one or several dotted subtable1.keyX entries within one other regular table is a second way, and writing it as a regular table with its own [outertable.subtable1] header is a third way. Each of these ways is perfectly fine on its own, but trying to combine two or more of them for the same table is not allowed because of the "you cannot define any table twice" rule.

If these restrictions did not exist, then the current restriction "you cannot define any table more than once. Doing so is invalid." would be in shambles, since nobody upon seeing either a regular table header or an inline table could have the slightest idea whether what they see in that place is a complete definition of that table or whether some of its key/value pairs are injected from anywhere else. I'm 99+ percent sure that that's against the spirit of TOML, but I admit that the current wording in the spec is not quite clear.

@mojombo @pradyunsg What's your stance?

(Note that all of this only applies to simple key/value pairs. If [outertable.subtable1] is a regular table, then defining [outertable.subtable1.subsubtable] elsewhere is completely normal, and above I argued that the same probably still applies if outertable.subtable1 = { ... } is defined using inline table syntax. Conceptually, nested tables are not really "in" the supertable in the same way as simple key/value pairs live within a specific table, they just take a parent table name and extend it.)

bitwalker commented 5 years ago

I think parsers must be allowed to extend existing tables with dotted keys. It has a beautiful symmetry which I think is important, and it helps in reasoning about TOML docs, both from a parsing point of view and when just reading one. To put into words what I mean by "symmetry": the way I think of TOML as a format is basically as a more readable way of writing a flat namespace of key/value pairs. In other words, given the following document:

[myapp]
debug = true

[myapp.logger]
level = "info"
format = "[$level] $message"

[myapp.listeners]
http = { type = "http", port = 8080, host = "localhost" }

myapp.listeners.https = { type = "https", port = 8443, host = "localhost" }

It is semantically equivalent to the following document, and vice versa:

myapp.debug = true
myapp.logger.level = "info"
myapp.logger.format = "[$level] $message"
myapp.listeners.http.type = "http"
myapp.listeners.http.port = 8080
myapp.listeners.http.host = "localhost"
myapp.listeners.https.type = "https"
myapp.listeners.https.port = 8443
myapp.listeners.https.host = "localhost"

If you disallow extending already defined tables via dotted keys, this symmetry is lost. Note that in both cases it is not allowed to redefine keys which are already defined, only to extend the global "table"/namespace (as a way of visualizing it, consider the global namespace to be _, and all keys defined in a document as being children of that key, like _.myapp.debug = true).
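The "flat namespace" view described here can be made concrete with a small sketch. The Python below is illustrative only and is not taken from any TOML parser: the function name `flatten` is hypothetical, the nested dict stands in for an already-parsed document, and keys are assumed to be bare (no quoted dots).

```python
def flatten(table, prefix=""):
    """Map a nested dict to a flat dict keyed by dotted paths.

    Empty tables are kept as leaves so that no key is lost.
    """
    flat = {}
    for key, value in table.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Non-empty table: descend and extend the dotted path.
            flat.update(flatten(value, path))
        else:
            # Scalar, array, or empty table: record it as a leaf.
            flat[path] = value
    return flat


document = {
    "myapp": {
        "debug": True,
        "logger": {"level": "info"},
        "listeners": {"http": {"type": "http", "port": 8080}},
    }
}
for path, value in flatten(document).items():
    print(path, "=", repr(value))
```

Under this view, two documents are equivalent exactly when they flatten to the same set of paths and values, regardless of which table syntax produced them.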

Doing this also provides a way for one to have "imports" which can be treated as the same logical document, allowing one to have a base config file, and extend it per-environment or whatever. Obviously we don't have imports in TOML right now, but in theory one could implement that in application code without the need for explicit TOML support, and without violating the spec, simply by restricting imports to only extending the base document with new keys.

Anyway, that's my two cents, and the parser I wrote for Elixir is designed based on that interpretation - I suspect I'm not the only one to have done so.

eksortso commented 5 years ago

@eksortso @AndrewSav I believe you are both wrong and using dotted keys to "inject" key/value pairs into tables defined elsewhere is prohibited. But I admit that the spec is not quite clear on this point.

Whether I'm wrong depends on which date my comment was written, to be fair. My current stance from 3 days ago is wrong by your measuring, @ChristianSi, but I may be persuaded to accept yours with more debate.

That said, I need to highlight a developing problem. Some parsers use the loose interpretation. They're unlikely to change now, since they claim that TOML documents already written might break on their parser if they change their behavior to the correct standard.

Since TOML v1.0 is intended to be backward compatible with the last v0.x release, laying down the law now will require another v0.x release, to which conforming parsers must adhere. Will strong table rules be defined in direct terms in v0.6? Will a run-up to v1.0 include a statement explicitly permitting looser rules? Or will strong rules be dropped into v1.0 with some advance warning to non-conforming projects?

ChristianSi commented 5 years ago

@bitwalker I don't understand your example. Since your myapp.listeners.https inline table sits under the [myapp.listeners] table header, the full paths of the keys it actually defines are myapp.listeners.myapp.listeners.https.type etc.

To actually get the flattened structure you suggest, the last part of your example would simply have to read:

[myapp.listeners]
http = { type = "http", port = 8080, host = "localhost" }
https = { type = "https", port = 8443, host = "localhost" }

Nicely symmetric, but no need to inject anything anywhere.

ChristianSi commented 5 years ago

@eksortso In my understanding, "the loose interpretation" is a misinterpretation of the spec, in other words: a bug. The spec is admittedly not completely innocent, since the current wording allows different interpretations. I would therefore suggest a bug-fix release (v0.5.1) that clarifies the meaning of the spec. No need to bump the version number to 0.6, since no new features are introduced (nor is old, non-buggy behavior disallowed).

Non-compliant parsers would then have to be updated to conform to the spec. Since dotted keys are pretty new (they did not exist before 0.5) and since I don't expect many (if any) documents to use "key injection" in the first place (what would be the point, except to confuse your readers?), I'd expect that to be a relatively painless process.

But it would certainly be a good idea to publish a spec update soon-ishly rather than to wait for a few years before the next version bump takes place. (Considering the pace of new TOML versions in the past, I am a bit worried in that regard.)

bitwalker commented 5 years ago

@ChristianSi Sorry, I wrote that example off the cuff, all I intended with the last line was to show different types of table usage in the example, but it ended up making it confusing :(

TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table.

This is the declared goal of TOML. The three key principles there, "minimal", "obvious semantics", and "maps unambiguously to a hash table" are what we should judge all questions by. In my opinion, we need the following:

I think that core principle should be (and is already to some degree): TOML is a syntax for representing a namespace of keys and values; keys are unique and there are no features which cannot be mapped to a flattened representation, where flattened representation is in dotted-key form, e.g. foo.bar = "baz". The syntax of TOML is composable, such that any combination of forms is allowed (within the syntax rules for those forms) and can always be unambiguously mapped back to the flattened representation.

Given that principle, we can define the following simple set of rules:

Furthermore, this gives us a framework from which to reason about table arrays, in other words, rather than it being an exception to the rule of redefinition, it is simpler to think of each element declaration as defining a new "hidden" key, based on the order of appearance, which a conforming parser infers from the syntax, e.g.

[[products]]
name = "Hammer"
sku = 738594937

[[products]]

[[products]]
name = "Nail"
sku = 284758393
color = "gray"

could be flattened like so:

products._0.name = "Hammer"
products._0.sku = 738594937
products._1 = {}
products._2.name = "Nail"
products._2.sku = 284758393
products._2.color = "gray"

The key guiding principle here is symmetry/composability. We have various forms of syntax for tables and their keys, so it should be possible to combine them in different ways but retain one semantic model. Prohibiting redefinition of keys is distinct from the question of whether you can "reopen" some part of the keyspace to add new keys.
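As a rough illustration of the "hidden key" idea for table arrays, the following Python sketch flattens a parsed document and names each array-of-tables element `_0`, `_1`, and so on, mirroring the example above. It is an assumption-laden sketch rather than parser behavior: the name `flatten_with_arrays` is hypothetical, and only lists made up entirely of dicts are treated as table arrays.

```python
def flatten_with_arrays(value, prefix=""):
    """Flatten nested dicts, giving table-array elements keys like `_0`."""
    if isinstance(value, dict) and value:
        items = list(value.items())
    elif (isinstance(value, list) and value
          and all(isinstance(element, dict) for element in value)):
        # Array of tables: infer a hidden key from order of appearance.
        items = [(f"_{index}", element) for index, element in enumerate(value)]
    else:
        # Scalars, plain arrays, and empty tables are leaves.
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten_with_arrays(child, path))
    return flat


products = {
    "products": [
        {"name": "Hammer", "sku": 738594937},
        {},
        {"name": "Nail", "sku": 284758393, "color": "gray"},
    ]
}
for path, value in flatten_with_arrays(products).items():
    print(path, "=", repr(value))
```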

@eksortso In my understanding, "the loose interpretation" is a misinterpretation of the spec, in other words: a bug. The spec is admittedly not completely innocent, since the current wording allows different interpretations.

What is the guiding principle behind your interpretation? I ask because adding more rules without some unifying principle is not user-friendly; it makes documents harder to write, harder to read, and parsing more complex and thus more error-prone. I agree it is good to add more rules when they provide clarity in the context of some guiding principle/intuition; but it is not necessarily desirable when the clarity comes at the expense of violating one's intuition. If readers of the spec are told to internalize a few simple principles ("TOML is a syntax for representing a namespace of keys", and "it is not allowed to redefine keys"), it gives them an easy way to test if some interpretation of a rule is correct ("does interpretation A imply I can redefine something?") and therefore easy to understand.

Conformity to a single way of doing things is desirable in many cases; it is one reason why code formatters in languages like Go are so nice. When there is a lot of complexity, knowing that some things will always look the same orients you in a new context. That said, TOML is a simple format, and there isn't enough complexity to justify forcing that kind of conformity, particularly when it has the potential to roadblock further improvements down the line (such as imports), or put you in a corner with regard to ambiguity. I think it is also a bad sign when there isn't any justification that ties a rule back to the core principles, only that some usage pattern is prohibited arbitrarily (i.e. based on preferences).

In any case, I think it is important for the maintainers of TOML to decide what mental model should be driving these design decisions (or at least make it prominent, if they have already made it known elsewhere), and then consider these questions in that framework. From what I've seen, things are too abstract (i.e. "maps unambiguously to a hash table" is not specific enough, what is the "core", or simplest possible, representation; how do features relate to that representation), and that drives both ambiguity in the spec, and different interpretations of how features should be implemented/represented, because everyone has a different mental model.

eksortso commented 5 years ago

I've come back around to the strict interpretation of table definition that @ChristianSi claims is the true one. And as @bitwalker calls for, the mental framework needs to be made clear in the documentation.

The turning point came with some example code posted before, which we must address, but for different reasons. Here is the code:

foo.bar = {}
foo.bar.baz = "true"

The first line foo.bar = {} means that foo.bar is an empty table. The second line makes foo.bar non-empty. The loose interpretation says that this is fine; it just means foo.bar = {baz = "true"}. The strict interpretation considers this code invalid, because foo.bar is defined in whole in the first line and foo.bar.baz in the second line is not a new subtable but a key/value injection.

The strict interpretation is the correct one.
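To make the difference between the two interpretations concrete, here is a deliberately simplified Python model. It is not a TOML parser and does not claim to match any implementation: `apply_assignments` and `TomlModelError` are hypothetical names, a dict value stands in for an inline table, keys are assumed to be bare, and in strict mode an inline table is treated as sealed against every later dotted key (whether subtables should still be allowed is debated elsewhere in this thread).

```python
class TomlModelError(ValueError):
    """Raised when the simplified model rejects an assignment."""


def apply_assignments(assignments, strict):
    """Apply (dotted_key, value) pairs in order and return a nested dict.

    Loose mode: later dotted keys may add new keys to any table, but may
    never overwrite an existing key.
    Strict mode: additionally, a table created from an inline value is
    sealed, so no later dotted key may write into it.
    """
    root = {}
    sealed = set()  # ids of dicts that were created from inline tables
    for dotted_key, value in assignments:
        *parents, last = dotted_key.split(".")
        table = root
        for name in parents:
            table = table.setdefault(name, {})
            if not isinstance(table, dict):
                raise TomlModelError(f"{name!r} in {dotted_key!r} is not a table")
            if strict and id(table) in sealed:
                raise TomlModelError(f"{dotted_key!r} extends an inline table")
        if last in table:
            raise TomlModelError(f"{dotted_key!r} is already defined")
        if isinstance(value, dict):
            value = dict(value)      # copy so the caller's dict is untouched
            sealed.add(id(value))    # remember it came from an inline table
        table[last] = value
    return root


pairs = [("foo.bar", {}), ("foo.bar.baz", "true")]
print(apply_assignments(pairs, strict=False))
try:
    apply_assignments(pairs, strict=True)
except TomlModelError as error:
    print("strict mode rejects it:", error)
```

Both modes reject redefining an existing key; they differ only in whether an inline table can be reopened afterwards.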

I tried, best as I could, to articulate the mental rules of this interpretation prior to October 25th in my previous comments. Following are the rules as best as I can express them, and we'll need examples to illustrate them all.

Did I miss anything? Could any of these be written better?

bitwalker commented 5 years ago

@eksortso I think you missed something:

In your reasoning, you stated foo.bar = {} cannot be extended with foo.bar.baz = true, but could be extended with foo.bar.baz = {} (rule 4 in your list, but also rule 1, and it is implied by the fact that one can define subtables with [foo.bar.baz]). This contradicts your third paragraph (i.e. the issue you had with my original example), as under the rules you provided (which are inconsistent with regard to keys, but we'll get to that), you are still allowed the following:

foo.bar = {}
foo.bar.baz = true

As stated by the combination of rules 2, 3 and 5. You also have other conflicting rules (3 with 1, 5 with 1). Most importantly though, the rules listed here do not appear to be based around any core mental model from which to reason about them: there is no symmetry or composability to the rules, and they seem primarily motivated by restricting how a table is defined. The mental framework I keep asking for isn't just about defining a list of rules; the rules have to be based on something, and that something is the framework that I care about.

I think it bears stressing: tables are not important. They are not relevant to the ultimate representation of a TOML document (a hash table) except where they indicate the presence of a nested table, which only matters because it is intended to hold more keys; after all, that is what a hash table does. TOML is a convenience format for representing a nested key/value structure, so it ultimately doesn't matter what order tables are defined in. The very presence of a key indicates that the path to that key must consist of tables and nothing else; otherwise there is redefinition. Table declarations should be considered a way to reduce the verbosity of defining nested keys, since they remove the need to prepend the full path to the key for all subsequent keys, but that's it. Anything else and we already have contradictions with the features that exist in the syntax, which is supposed to be one of the things TOML is free of.