toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

Proposal: Reference shortcut for nesting tables #744

Closed brunoborges closed 2 years ago

brunoborges commented 4 years ago

Consider the following example:

[servers]

  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

Idea: add a shortcut character to reference the outer table.

Example:

[servers]

  [#.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [#.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

The character # is illustrative and may have to be reconsidered, given that it's already used for comments.

The proposal also considers multiple levels:

[servers]

  [#.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

[#.#.firewall]
inbound = "0.0.0.0/24" # this would become `servers.alpha.firewall.inbound`
outbound = "0.0.0.0/24" # similarly, `servers.alpha.firewall.outbound`

  [#.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

The multi-level indicator, e.g. #.#.#..., must match the current structure. For example, #.#.#.#.foo right below a single-level table [bar] must be considered invalid. IDEs and text-editor plugins should be able to show a tooltip with the expanded name of the referenced table.

The benefit of this feature is that it simplifies typing while keeping the level of each table explicitly defined, and it reduces the chance of typos when nesting tables. It can also make copying and pasting snippets easier.
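A minimal Python sketch of how a tool might expand these headers, assuming the rules described above plus a few of our own: placeholders may only form a leading run, the i-th placeholder copies the i-th key of the most recently resolved header, and a placeholder deeper than that header is an error. The names expand_header and resolve_all are hypothetical, and keys are split naively on '.', so quoted keys are not handled.

```python
from typing import List

PLACEHOLDER = "#"  # illustrative only; the character is still under discussion


def expand_header(header: str, previous: List[str]) -> List[str]:
    """Expand e.g. '#.#.firewall' against the previous header's key path.

    Hypothetical rules: placeholders form a leading run, and the i-th
    placeholder copies the i-th key of the previous resolved header.
    Keys are split naively on '.', so quoted keys are not supported.
    """
    parts = header.split(".")
    lead = 0
    while lead < len(parts) and parts[lead] == PLACEHOLDER:
        lead += 1
    if PLACEHOLDER in parts[lead:]:
        raise ValueError(f"placeholder after an explicit key: [{header}]")
    if lead > len(previous):
        raise ValueError(f"[{header}] is deeper than the current structure")
    return previous[:lead] + parts[lead:]


def resolve_all(headers: List[str]) -> List[str]:
    """Resolve headers in document order; order matters for placeholders."""
    previous: List[str] = []
    resolved = []
    for header in headers:
        previous = expand_header(header, previous)
        resolved.append(".".join(previous))
    return resolved


print(resolve_all(["servers", "#.alpha", "#.#.firewall", "#.beta"]))
# ['servers', 'servers.alpha', 'servers.alpha.firewall', 'servers.beta']

try:
    resolve_all(["bar", "#.#.#.#.foo"])  # four levels below a one-level table
except ValueError as err:
    print(err)  # [#.#.#.#.foo] is deeper than the current structure
```

Because each header is expanded against the one before it, moving a header changes what it refers to; explicit headers do not have that property.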

marzer commented 4 years ago

I agree that TOML would benefit from something like this, and indeed it's not the first time this sort of thing has been proposed.

I'm not thrilled about the use of # in this particular proposal, though, since that's already used for TOML comments. My suggestion would be to use asterisks instead, e.g.:

[servers]

  [*.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [*.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

ChristianSi commented 4 years ago

Strictly speaking, there is not even a need for a special marker – just using one or more dots to the left of the first explicitly written key part would do the trick. Then your example could be written as follows:

[servers]

  [.alpha]  # servers.alpha
  ip = "10.0.0.1"
  dc = "eqdc10"

[..firewall]  # servers.alpha.firewall
inbound = "0.0.0.0/24"
outbound = "0.0.0.0/24"

  [.beta]  # servers.beta
  ip = "10.0.0.2"
  dc = "eqdc10"

Personally, I like the idea and would not mind seeing it in a future version of TOML. (It won't make it into v1.0 which is feature-complete.)

eksortso commented 4 years ago

@brunoborges To be honest, I'm slightly skeptical of proposals to reduce table headers' verbosity. But this one is more attractive than others I've seen, because there's no attempt to make the headers' reference level any less than absolute.

This syntax is relative in that it does require previously defined headers to describe what super-tables are being referenced. But as long as the table hierarchy is preserved going back to the root table, this syntax could work.

@ChristianSi @marzer Another character could be called for to express depth. I feel like single dots may be hard to read, even if they are easy to type. May I suggest the character > for this purpose? That could make headers like [>subtable2] or [[>>aot-entries]] pop out a bit more.

brunoborges commented 4 years ago

I'd like us to focus on two things here, separately:

  1. First, agree on adding this feature, with well defined rules.

  2. Second, agree on what character to use as parent identifier.

For now, if possible, it may be better to use # just for the purpose of discussing point 1. Later we can move on to point 2, which could be as simple as voting on multiple proposals.

That said, here's my first restriction for this feature:

  • Can only be used in table definition [ ].

This means the following should not be allowed:

fruit.apple="red"
#.banana="yellow"

eksortso commented 4 years ago

Agreed. Let's get a proper definition of [#.this] first.

This will require that we relinquish, partly, a rule that's been around a long time: that tables, subtables, and super-tables may be defined in any order. Order will make a difference if shortcuts are used.

We'll then need explicit rules for the following.

[alpha]
this-table = "alpha"

[#.beta]
this-table = "alpha.beta"

[#.gamma]
this-table = "alpha.gamma"

[#.#.delta]
this-table = "alpha.gamma.delta"

[[#.epsilon]]
this-table = "1st element of alpha.epsilon"

[[#.#]]
this-table = "2nd elements of alpha.epsilon"
is-this-legal = "?"
eksortso commented 4 years ago
> Can only be used in table definition [ ].

Yes, for table definitions. Are you excluding arrays of tables, then?

marzer commented 4 years ago

@eksortso If we call the existing table definitions "explicit" and these new ones "implicit", then the wording of the ordering rule need only change to reflect the difference. Something like:

begin spec snippet

In addition to explicit subtable definitions, TOML supports implicit definitions of subtables using the placeholder #:

[alpha]
this-table = "alpha"

[#.beta]
this-table = "alpha.beta"

Placeholders will match against the most-recently defined super-table with the same depth (one must exist):

[alpha]
[#.beta] # ok, starts table 'alpha.beta'
[#.#.gamma] # ok, starts table 'alpha.beta.gamma'
[#.#.#.#.yotta] # ERROR! too deep

This means that unlike explicit table definitions, which may appear in any order, implicit table definitions are order-dependent.

end spec snippet

@brunoborges Something that's not clear from your initial proposal: can placeholders appear to the right of an explicit key? e.g.:

[alpha]
[alpha.beta]
[alpha.#.gamma] # should this be legal??

brunoborges commented 4 years ago

@marzer that's a great point.

I think something like alpha.#.gamma may be worthwhile for long outer tables with several small child tables, so you know where you are even if the upper level was defined far above.

But then again, IMO the order must still be respected. Only explicit keys (i.e. without #) may appear out of order.

jesseli2002 commented 4 years ago

Would just like to comment that the verbose syntax of nested keys is one of the common complaints about TOML - see https://hitchdev.com/strictyaml/why-not/toml/ and the criticism section of Wikipedia's TOML page.

pradyunsg commented 4 years ago

Related: https://github.com/toml-lang/toml/issues/413

pradyunsg commented 4 years ago

> verbose syntax of nested keys is one of the common complaints about TOML

TOML is not for arbitrarily-nested data representation, but rather for configuration files for tooling. It serves as a well-defined replacement for INI files. In other words, arbitrarily-nested data structures are gonna "look better" in JSON/YAML since they are data representation formats and TOML is not that.


Of all the proposals I've seen till date, this is probably the most compelling one, for a shorthand in table headers. That said, I'm not super keen on deciding on this just yet, so let's revisit this after 1.0 is out. :)

ghost commented 4 years ago

Hey, isn't a 1.0 the perfect opportunity to introduce syntax change like this?

When it gets out, everyone is going to update their parsers since it's such a big release, not sure people will be so keen for point releases. Just my two cents.

eksortso commented 4 years ago

@alvincodes The short answer is No, because parsers have already updated, or they ought to have been.

Adding new features (like this one) is usually done well in advance of major revision updates, typically during alpha or beta stages. It's a little more flexible before a 1.0 drops, but we have two pre-releases out already. Feature lock is essentially assumed, and new features would require extremely persuasive arguments to push through at this stage.

When a "feature freeze" occurs in a project depends on how the project is run, but the idea is that enough time is given for users to try out the new features in alpha, beta, etc. stages. In the case of the TOML standard, the parser writers can try on the new syntaxes, shake out any issues that are found, and provide feedback. As it stands, there are a bunch of changes that went in since v0.5.0 that parsers will need to implement. Would we want to throw this onto that pile, even though we've not spoken about it for three months?

Edit: adjusted text referring to changes since v0.5.0; better late than never.

brunoborges commented 4 years ago

I agree this feature isn't compelling enough to push through for 1.0.

But I'd love to see it in the next version of the spec.

Torxed commented 4 years ago

I was just trying out TOML for the first time. And I might be alone on this, but I thought nesting tables was done by doing:

[servers]

  [alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

Where the indentation mattered. But there's no mention anywhere that this isn't the case. I found some mention that whitespace around a table definition should be ignored, but I couldn't find that under https://toml.io/en/v1.0.0-rc.3#table. So naturally I'd assume indentation has some meaning if the behavior around whitespace is undefined.

But even without reading the specs (or if I missed it somewhere), coming from a Python background I guess, I would naturally assume that the indentation mattered, and that both alpha and beta would be sub-dictionaries (or tables if you will) of servers:

{
    "servers" : {
        "alpha" : {
            "ip" : "10.0.0.1",
            "dc" : "eqdc10"
        },
        "beta" : {
            "ip" : "10.0.0.2",
            "dc" : "eqdc10"
        }
    }
}

eksortso commented 4 years ago

Well, @Torxed, you're not the first person with those same thoughts. TOML is influenced by old INI configuration formats, but I don't think anyone ever considered that there are people who have never come across an INI file before.

Nesting can be done in any of three different ways: a table header with a dotted name (illustrated below), dotted keys inside a table section, and inline tables. Table headers are always absolute table references, while inline table names and dotted keys are always relative to the current table section. Indentation of table headers and keys is not significant (it's treated as whitespace) and is only there as a visual reminder of nesting levels.

At its simplest, your example would look like this (and the parent table header [servers] can be left out):

[servers.alpha]
ip = "10.0.0.1"
dc = "eqdc10"

[servers.beta]
ip = "10.0.0.2"
dc = "eqdc10"

So now I'm considering writing a PR to either remove the indentation from the markdown document's examples, or to explain the indentations away. As it stands now, the indentation just confuses those with no experience in any dialect of INI.

ChristianSi commented 4 years ago

@eksortso Looks like you used the wrong syntax highlighter in your example. It shouldn't be all that red.

eksortso commented 4 years ago

@ChristianSi It was something that I'd copied and pasted, so I just typed in the code again, and it highlighted correctly.

pradyunsg commented 4 years ago

> So now I'm considering writing a PR to either remove the indentation from the markdown document's examples, or to explain the indentations away. As it stands now, the indentation just confuses those with no experience in any dialect of INI.

I'd say both. :)

PRs welcome and this doesn't really need another round through the Release Candidate cycle.

brunoborges commented 4 years ago

Hi all,

I'd like to ask for a quick vote on which symbol/character to use. Please use the corresponding emoji and react to this comment with your preference.

Character   Emoji to vote
#           👍
&           🎉
$           🚀
%           ❤️
.           👀
*           😄

Torxed commented 4 years ago

Forgot the . (dot) option [.subLevel]?

marzer commented 4 years ago

> I'd like to ask for a quick vote on which symbol/character to use.

None of the above? Asterisks imo.

brunoborges commented 4 years ago

@Torxed @marzer added both options to the poll.

eksortso commented 4 years ago

I'd suggested a single greater-than / right angle bracket > for each level before, but it's not on the poll.

I'd accept the two-character combo *. for each level. But I would not want just single asterisks *. The poll doesn't say which of those options you'd be backing if you picked 😆.

So I voted 👎️. I'll change it if I can get some clarity on these things. Please advise?

Torxed commented 4 years ago

I like the fact that . (dot) follows the common syntax when creating nested levels without the brackets.

something.here = true
something.else = false

And omitting the key before the dot is simply a reference to a continuation. I'm new here, but that feels logical to me without having to add extra characters.

marzer commented 4 years ago

People seem very fond of the 'just dots' syntax. I must caution against this; adding this syntax to the language would be adding something visually indistinct and error-prone. Example:

[a.b.c1]

    [...d1]

    [...d2]

  [..c2]

    [...d3]

While these are still unambiguous, absolute paths, I do not believe they're visually distinct enough to determine the depth at a glance. Even while writing this example, I added c2 last and had to go back and re-count the dots to be doubly sure! (Obviously real TOML would use more meaningful table names, but the depth/complexity presented here is representative.)

The alternative suggestion of >, and any other proposal that relies on simple repetition of one character, suffers from this same problem in addition to introducing additional syntax burden.

@pradyunsg mentioned that the proposal as written is more compelling than the many alternatives for solving this problem, and I'm inclined to agree. Thus, the vote should only be about deciding which character plays the role of the proposal's 'placeholder' character, and anything else should be relegated to separate proposals.

@eksortso so yeah, my suggestion of * was as-in *., since that's what I think the poll was intended to ask. Perhaps @brunoborges could clarify the intent?

eksortso commented 4 years ago

Thanks @marzer. I've changed my vote to support *. for the level placeholder.

Thanks, also, for stating the case against single-char placeholders so convincingly. I hope that it compels others to reconsider.

Torxed commented 4 years ago

> People seem very fond of the 'just dots' syntax. I must caution against this; adding this syntax to the language would be adding something visually indistinct and error-prone.
>
> Thus, the vote should only be about deciding which character plays the role of the proposal's 'placeholder' character, and anything else should be relegated to separate proposals.

Fair enough, I think discussing this here though makes sense seeing as it's tied to which option I should and shouldn't vote for. I was under the assumption that only one sublevel could be defined, much like domain names can have a .domain.com definition, but can't have ..domain.com. Aka, one sublevel maximum to avoid ambiguous definitions : )

marzer commented 4 years ago

> I was under the assumption that only one sublevel could be defined, much like domain names can have a .domain.com definition, but can't have ..domain.com.

Ah, well if it's just a new nested table relative to the header immediately above (which is what I think you mean?) it comes with its own issues:

[a.b]

    [.c]   # a.b.c, ok!

    [.d]   # wait, am I a.b.d, or a.b.c.d?

Whether .d becomes a.b.d or a.b.c.d are both reasonable to expect, and that ambiguity makes it problematic since the 'O' in TOML stands for 'Obvious'.

Additionally, if .d resolved to a.b.c.d, then it means the relative syntax would be good for only a single use among siblings!
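To make the ambiguity concrete, here are two hypothetical readings of a leading single dot, written as small Python functions. Neither is part of TOML or of any proposal text; they only show that both interpretations are easy to implement and that they disagree on [.d].

```python
from typing import List


def chain_relative(headers: List[str]) -> List[str]:
    """Reading 1: '.x' nests under the header immediately above it."""
    path: List[str] = []
    out = []
    for h in headers:
        path = path + h[1:].split(".") if h.startswith(".") else h.split(".")
        out.append(".".join(path))
    return out


def anchor_relative(headers: List[str]) -> List[str]:
    """Reading 2: '.x' nests under the last header written without a leading dot."""
    anchor: List[str] = []
    out = []
    for h in headers:
        if h.startswith("."):
            out.append(".".join(anchor + h[1:].split(".")))
        else:
            anchor = h.split(".")
            out.append(h)
    return out


headers = ["a.b", ".c", ".d"]
print(chain_relative(headers))   # ['a.b', 'a.b.c', 'a.b.c.d']
print(anchor_relative(headers))  # ['a.b', 'a.b.c', 'a.b.d']
```

Under reading 1, each relative header can only be used once per set of siblings, which is the limitation described above.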

Note that there is already a proposal for that syntax, with much discussion: https://github.com/toml-lang/toml/issues/593

eksortso commented 4 years ago

Thanks, @Torxed, but yeah, the original proposal allowed for multiple levels. If it hadn't, no doubt there would be demand to allow it. What's more (and this hasn't been discussed at all), it could extend from table headers to dotted keys within tables.

Not proposing this now, but imagine it being taken up in due time.

fruit.name = "apple"
*.color = "red"
*.tartness = false

Kixunil commented 4 years ago

@marzer you bring up a very good point that it's ambiguous. I was thinking maybe this is more obvious?

[language.python]
          [^.properties]
          [<.stats]
          [<.examples]
                 [^.simple]
                 [<.complex]
        [<.<.rust]
              [^.properties]
# ...

Can you figure it out without me explaining?

...

My thinking: ^.x means relative to the last table, <.x means relative to the parent of the last table.

marzer commented 4 years ago

@Kixunil Gives me serious YAML vibes; TOML keys being 'kinda-verbose-on-purpose' is one of its strengths, so adding a bunch of extra syntax to do relative-referential stuff would greatly harm its simplicity.

Having said that, I don't see any issue with it being fleshed out further as an idea, though I recommend you do so on a different issue/proposal, rather than muddy the discussion of this one.

Kixunil commented 4 years ago

@marzer Uh, I thought this was on topic as a shortcut for nested tables; what am I missing? Since the proposed shortcuts are ambiguous, I presented an idea for resolving the ambiguity.

marzer commented 4 years ago

Your suggestion goes beyond a simple alternative to the syntax in @brunoborges proposal and instead introduces nontrivial relative logic, indicating it is better-suited to being an alternate proposal in-and-of-itself. I think fleshing it out here would not be appropriate.

Note that I'm not discouraging you from pursuing the idea, I'm merely suggesting that there's enough complexity in it that I think it is off-topic in this proposal's discussion thread. Even my clarification above is borderline; I was trying to prevent further re-purposing of this proposal, but I worry I've inadvertently encouraged it.

Kixunil commented 4 years ago

I see, thanks for clarifying! I wonder, though: if the proposed syntax is ambiguous and another proposal could resolve that, wouldn't it be more valuable to think about both, at least to some degree? The intention is to avoid ending up with arbitrary characters (e.g. * for relative to the parent and $ for relative to the latest); how are people supposed to remember what they mean? That was my thinking behind ^ and <, which look like arrows and so provide visual cues.

I'm now withdrawing from this discussion unless there's something unclear about my point or if there's a consensus that my point makes sense and people want to discuss it further.

eksortso commented 4 years ago

It's important that we maintain the rule that, even with the use of shortcuts, a table header must always be an absolute reference to a table.

And, likewise, a table array header must always be an absolute reference to the most recent table array element. If the resolution of such a header turns up the same name, then this begins a new element of the array, same as always. For instance (and forgive me but I'm using asterisks):

[[language.python.examples]]  # first element
name = "complex"

[*.*.examples.prereq]  # subtable of the first element
version = ">=1.1"

[[*.*.examples]]  # second element
name = "simple"

I'd like to add another rule: the last subtable's name must always be specified explicitly. This is to prevent total loss of context. Call me crazy but I would prefer not to see [[*.*.*]] in a dozen or a hundred places. Requiring it to say [[*.*.examples]], for instance, would help to maintain context and wouldn't be an enormous burden to write. At the very least, I'd want all-shortcut table array headers to be discouraged, except when the context is crystal clear and extreme brevity matters.
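A minimal Python sketch of these rules, assuming the `*.` placeholder, bare (unquoted) keys, and that each placeholder copies the key at the same position in the previously resolved header. The function resolve and its (kind, label) output are hypothetical; they only label what each header would refer to, and attaching a subtable to the latest array element is left to TOML's existing rules.

```python
from collections import Counter
from typing import List, Tuple

PLACEHOLDER = "*"  # assumes the "*." form; the character is still being debated


def resolve(lines: List[str]) -> List[Tuple[str, str]]:
    """Label each [table] / [[array]] header after expanding '*' placeholders.

    Hypothetical rules, following the comment above:
      - each placeholder copies the key at the same position in the
        previously resolved header;
      - the last key must be written explicitly, so [[*.*.*]] is rejected;
      - every [[name]] header starts a new element of the array 'name'.
    """
    previous: List[str] = []
    element_counts: Counter = Counter()
    labels = []
    for line in lines:
        is_array = line.startswith("[[")
        parts = line.strip("[]").split(".")
        if parts[-1] == PLACEHOLDER:
            raise ValueError(f"last key must be explicit: {line}")
        lead = 0
        while parts[lead] == PLACEHOLDER:  # stops at the explicit last key
            lead += 1
        if PLACEHOLDER in parts[lead:] or lead > len(previous):
            raise ValueError(f"cannot resolve {line} after {'.'.join(previous)}")
        previous = previous[:lead] + parts[lead:]
        name = ".".join(previous)
        if is_array:
            element_counts[name] += 1
            labels.append(("array", f"{name} element #{element_counts[name]}"))
        else:
            labels.append(("table", name))
    return labels


for kind, label in resolve([
    "[[language.python.examples]]",  # first element
    "[*.*.examples.prereq]",         # subtable of the first element
    "[[*.*.examples]]",              # second element
]):
    print(kind, label)
# array language.python.examples element #1
# table language.python.examples.prereq
# array language.python.examples element #2
```

With the explicit-last-key rule, a header like [[*.*]] (eksortso's earlier is-this-legal case written with asterisks) is rejected with a ValueError instead of silently repeating the previous array.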

brunoborges commented 4 years ago

The dot proposal IMO goes against the main driver of TOML over YAML: no implicit magic. Semantic whitespace is bad because we can't see it. Dots are visible, but under this approach the placeholder itself would be implicit/hidden. Therefore, it doesn't fit the goal of this proposal. Indeed, @marzer's comment [1] is perfect.

The > < ^ characters are certainly an interesting proposal to debate, but one that is worthy of its own issue.

Following the discussion, then: I still like the # character, even though it is also used for comments. In many programming languages, the same character can appear in different places for different purposes.

For example, in Bash, # is used for comments, but can also appear as part of a string value:

# This is a comment
foo=#bar # but not in the first occurrence of # in this line.
echo $foo
# prints "#bar"

It should be easy to specify that # is used for comments, but also for placeholders when inside [ ] and [[ ]].

I personally don't like stars, as *.*.foo, because it gives me the impression that such table would be part of all parents matching *.* levels above.

$ is an interesting option IMO too, as it denotes variables in certain languages. In Ruby, $ is used to define global variables.

Most importantly, I prefer # because it is a very visible, square-ish character that any font displays nicely.

[1] https://github.com/toml-lang/toml/issues/744#issuecomment-709206078

marzer commented 4 years ago

I won't lose sleep if you choose not to use asterisks, but please don't use #, it would only add confusion. The bash example should be discouraging, not encouraging, frankly.

Also, you conducted a poll, why ignore it? There have been zero votes for #.

Kixunil commented 4 years ago

Oh, I just realized that my previous comments were due to confusing what .x means. I thought it was supposed to be relative but you meant it as absolute. Based on this experience I vote for *.x, which should be more clear.

Torxed commented 4 years ago

> Oh, I just realized that my previous comments were due to confusing what .x means. I thought it was supposed to be relative but you meant it as absolute.

I also assumed the nested levels would be relative and not absolute. And in my world you would not be able to go beyond the last table definition. Meaning:

[something]
a = true
[.."a level"]
b = true

Wouldn't make sense to me. It would look and feel messy pretty fast. So IMO that should raise an exception in the validator.

brunoborges commented 4 years ago

> Also, you conducted a poll, why ignore it?

Because the most voted options were the . and the * which we, I believe, already agreed would be less than ideal for reasons already mentioned.

If not #, and not these other two, how about we move on to discuss other options?

The requirement should be simple: a character (or a pair of characters) that is easy to spot when glancing over the document, and that feels idiomatic to TOML's principles.

marzer commented 4 years ago

> Because the most voted options were the . and the * which we, I believe, already agreed would be less than ideal for reasons already mentioned.

You're mistaken; there was opposition to . yes, but nobody raised any technical opposition to * until you just did, and nobody has agreed with you on the points you raised. It seems as though you're conflating the two.

brunoborges commented 4 years ago

Fair point, @marzer !

What's your feedback on the risk that [..bar] would confuse newcomers who are used to "." meaning "all the things", as in regex, i.e. that such a table would be added to every table above that matches that level?

marzer commented 4 years ago

@brunoborges See https://github.com/toml-lang/toml/issues/744#issuecomment-709206078. I think the "just dots" option would be visually indistinct and confusing, and shouldn't be considered on those merits. How it relates to regex doesn't really come into it.

I also think the fear of * taking on a "match all the things" vibe because it means that in other contexts is unfounded; . means "match any" in regex but TOML uses it as a hierarchy delimiter. [] are the array index operator in programming languages, but TOML uses them for table headers. I don't think these have been an issue for people.

To be clear: I think your proposal as you originally worded it is pretty much perfect: absolute paths allowing for a placeholder character in between the dots, just that using # for that purpose would be very problematic.

The only thing that trumps a comment delimiter is a string delimiter (i.e. # in a string is just a part of a string), and this is consistent with comments pretty much everywhere (even in your bash example- the only difference is that the string delimiters there are invisible for some crazy reason). Whether * were to be used in its place is secondary; my reason for preferring them is entirely aesthetic and I could live with something else.

marzer commented 4 years ago

(I also second @eksortso's suggestion of a provision that the last table name must be explicit, e.g. *.*.table)

ChristianSi commented 4 years ago

I'm one of the many who voted for *. in the poll; @marzer's arguments convinced me to move away from the just-. option I had originally voted for. Since the poll was held, it's certainly reasonable to expect that it will be honored, at least when it comes to choosing between the various two-character options.

I also agree with @marzer that #. would be a very bad choice since unquoted # otherwise introduces a comment and it would be confusing when sometimes, somehow, it doesn't.

brunoborges commented 4 years ago

My argument in favor of # is that, besides being the largest character and best option for visibility and readability when it comes to placeholders, if it appears after an open [, it should not and could not be considered a comment as that would be an invalid table key.

This is not allowed in TOML today, as the key cannot have line breaks.

[ # this is a comment
foo.bar # this is another comment
] # this is clearly not allowed today
name="value"

But the following would be understandable, easily parseable by any plugin/editor, and backward compatible:

[#.bar] # this is a comment
name="value" # comment

It is unlikely that a TOML file would have a comment on every single line, and it is even less common to see comments on the right side of a table definition: quite often, the comment describing the table comes above it.
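The parsing side of this trade-off can be sketched in Python. Both functions below are hypothetical and ignore quoted strings for brevity: the first applies today's rule (an unquoted # starts a comment), and the second adds the kind of context-dependent exception that # placeholders would need.

```python
def strip_comment(line: str) -> str:
    """Today's rule, simplified: everything from the first '#' is a comment."""
    return line.split("#", 1)[0].rstrip()


def strip_comment_with_placeholders(line: str) -> str:
    """Hypothetical rule: inside a leading [ ] or [[ ]] header, a '#' followed
    by '.' or ']' is a placeholder; any other '#' starts a comment."""
    in_header = line.lstrip().startswith("[")
    for i, ch in enumerate(line):
        if in_header and ch == "]":
            in_header = False  # header closed; any later '#' is a comment
        elif ch == "#":
            if in_header and line[i + 1 : i + 2] in (".", "]"):
                continue  # placeholder such as '#.' inside the header
            return line[:i].rstrip()
    return line.rstrip()


print(strip_comment("[#.bar] # this is a comment"))                    # [
print(strip_comment_with_placeholders("[#.bar] # this is a comment"))  # [#.bar]
print(strip_comment_with_placeholders('name="value" # comment'))       # name="value"
```

The second function works, but only by making the meaning of # depend on where it appears, which is the special-casing debated in the replies that follow.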

This is certainly my last attempt in defense of #, and as long as we progress with the overall idea, I'd still be happy about it!

I'd also be curious to read your thoughts on why the other options, $ % and & are less attractive than *.

marzer commented 4 years ago

That it is possible to engineer it such that # might mean a comment in some contexts and not in others is not an argument in favour of using it, but against using it. That need for special-casing makes # a poor choice. Nothing to do with parsing complexity, but user experience- if given the choice between adding something simple, and something slightly-less-so, all else being equal, we should choose simple by default.

I gave my reasoning behind preferring * earlier in the discussion, but to reiterate: it's entirely an aesthetic preference. That's it. It just looks cleanest to me. We could choose something else, but the results of the poll above so far do seem to indicate that I'm not alone in that preference.

brunoborges commented 4 years ago

I'm afraid the poll does not reflect a broader audience of TOML users, though.

Any suggestion on how to get more people to share their feedback?

marzer commented 4 years ago

Oh for god's sake. If you're gonna ask for a poll, accept the results.