Allow keys in key-value pairs to be paths

pradyunsg commented 6 years ago

The only remaining idea from #292 that has not been decided upon and does not have a dedicated issue.

I mean, I don't know how much I like it myself, but, hey, this needs discussion, so here's a dedicated issue for it.

[document]
title = "Hello!"
meta.charset = "utf-8"

StefanKarpinski commented 6 years ago

While I don't really care what the specific mental model is, I have to second @bitwalker's desire for some consistent mental model. Otherwise the set of rules that gets cobbled together will end up being ad hoc and arbitrary. If it's based on some mental model, whatever that is, then it will be consistent. It can be a strict mental model or a loose one, as long as there is one.

Regarding the

foo.bar = {}
foo.bar.baz = "true"

issue, it seems to me that all values given as the right hand side (RHS) of an = in TOML are generally immutable and not extensible: integers, strings, floats. If you want to build up a structure incrementally, then you generally have to use the structure of sections and key-value pairs to do so. Mixing a presumably immutable RHS value specification with subsequent incremental addition to either tables or arrays seems pretty dicey to me.
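To make the "RHS values are closed" idea concrete, here is a minimal Python sketch. It is not taken from the spec or from any parser: `assign`, `ClosedValueError` and the `frozen` set are hypothetical names, and keys are assumed to be bare (no quoted keys containing dots). Every path assigned with `=` is recorded as closed, and no later dotted key may write through it, while tables created implicitly by dotted keys stay open.

```python
class ClosedValueError(Exception):
    """Raised when a write would modify a value given on the RHS of `=`."""


def assign(doc, frozen, dotted_key, value):
    """Apply `dotted_key = value` to the nested dict `doc`.

    `frozen` collects every path that received a RHS value; such values
    (including inline tables) are treated as closed and cannot be extended.
    Assumes bare keys, so splitting on "." is safe.
    """
    parts = dotted_key.split(".")
    table = doc
    for i, part in enumerate(parts[:-1]):
        path = ".".join(parts[: i + 1])
        # Every non-table value also comes from a RHS, so this check
        # rejects writes through scalars as well as through inline tables.
        if path in frozen:
            raise ClosedValueError(f"{path} was given as a RHS value and is closed")
        # Intermediate tables created by dotted keys remain open.
        table = table.setdefault(part, {})
    if parts[-1] in table:
        raise ClosedValueError(f"{dotted_key} is already defined")
    table[parts[-1]] = value
    frozen.add(dotted_key)


doc, frozen = {}, set()
assign(doc, frozen, "a.b.c", 1)
assign(doc, frozen, "a.d", 2)       # fine: a and a.b are implicit, open tables
assign(doc, frozen, "foo.bar", {})  # foo.bar is now closed
try:
    assign(doc, frozen, "foo.bar.baz", "true")
except ClosedValueError as err:
    print("rejected:", err)
```

Under this reading, the `foo.bar = {}` / `foo.bar.baz = "true"` pair is rejected, while incremental building through dotted keys alone is unaffected.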

bitwalker commented 6 years ago

it seems to me that all values given as the right hand side (RHS) of an = in TOML are generally immutable and not extensible: integers, strings, floats. If you want to build up a structure incrementally, then you generally have to use the structure of sections and key-value pairs to do so.

I agree, with the exception of tables, as the spec already allows for the following:

a.b.c = "c"
a.c.d = "b"

[foo]

[bar]
thing = true

[bar.baz]
something_else = false

If, as the spec implies, a.b.c = "d" is equivalent to:

[a.b]
c = "d"

# or

[a]
b = { c = "d" }

And if, as the spec implies, [foo] is equivalent to foo = {}, then it follows that these properties must be true:

1.) If a dotted key's path includes components which are not yet defined, they are implicitly defined to be tables.
2.) If a dotted key's path includes components which are defined, and are tables, then that table is extended with the new key, not redefined.
3.) If a dotted key's path includes components which are defined, and are not tables, then it is obviously an attempt at redefinition, and therefore an error.
4.) Declaring a table with [a.b] defines a.b to be a table if it doesn't exist, but if it does, and the declaration has child keys (such as c = "d"), then it must be equivalent to the dotted-key representation, i.e. a.b.c = "d"; in other words, it implicitly creates the table if it doesn't exist, and otherwise extends that table with new keys.
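Rules 1 to 4 amount to treating every table as an ordinary, always-extensible hash table. A minimal Python sketch of that reading follows; `set_key`, `open_table` and `RedefinitionError` are hypothetical names, only bare keys are handled, and arrays of tables are ignored. A `[header]` followed by `key = value` is handled exactly like the dotted key `header.key = value`.

```python
class RedefinitionError(Exception):
    """Raised when a key would be redefined or a non-table would be extended."""


def open_table(doc, parts):
    """Walk `parts`, applying rules 1-3, and return the innermost table."""
    table = doc
    for i, part in enumerate(parts):
        child = table.setdefault(part, {})  # rule 1: missing tables are created
        if not isinstance(child, dict):     # rule 3: cannot extend a non-table
            raise RedefinitionError(".".join(parts[: i + 1]) + " is not a table")
        table = child                       # rule 2: existing tables are extended
    return table


def set_key(doc, header, dotted_key, value):
    """Rule 4: `[header]` plus `dotted_key = value` equals `header.dotted_key = value`."""
    parts = (header.split(".") if header else []) + dotted_key.split(".")
    parent = open_table(doc, parts[:-1])
    if parts[-1] in parent:
        raise RedefinitionError(".".join(parts) + " is already defined")
    parent[parts[-1]] = value


doc = {}
set_key(doc, "", "physical.color", "orange")
set_key(doc, "", "physical.shape", "round")
set_key(doc, "a", "b", "c")  # [a] b = "c"
set_key(doc, "a", "c", "b")  # a second [a] block with c = "b" is accepted here
print(doc)
```

In this sketch, `a.b = {}` followed by `a.b.c = {}` extends `a.b`, while the reverse order fails because `a.b` already exists when the second line runs, which matches the order-sensitive behavior argued for later in the thread.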

The central problem here is that the spec is contradictory with regard to dotted keys and their equivalence to the table syntax. On the one hand, it has examples which demonstrate 1-3 to be true. On the other, in the table section, it says that the following is invalid:

[a]
b = "c"
[a]
c = "b"

This disallows 4. However, it says the following conflicting example is valid:

physical.color = "orange"
physical.shape = "round"

This is a contradiction, and my take is that it is not possible for one of these examples to be invalid without throwing out any sensible mental model with which to reason about the syntax of tables and dotted keys.

I would agree with @ChristianSi and @eksortso if the dotted key syntax didn't exist as it is defined, but since it does, it simply doesn't make sense to have this awkward asymmetry in the syntax, where the same structure is invalid in one syntax but valid in the other. But as you said, we need a mental model to refer back to in order to decide what makes sense and what doesn't; without that, I'm not sure we can arrive at any kind of consensus.

eksortso commented 6 years ago

Let me post this now, because I couldn't finish reading your first response without choking.

@eksortso I think you missed something:

In your reasoning, you stated foo.bar = {} cannot be extended with foo.bar.baz = true, but could be extended with foo.bar.baz = {} (rule 4 in your list, but also rule 1, and implied by the fact that one can define subtables with [foo.bar.baz]).

That's precisely what I'm saying, yes.

This is a contradiction with your third paragraph (i.e. the issue you had with my original example), as under the rules you provided (which are inconsistent with regard to keys, but we'll get to that), you are still allowed the following:
foo.bar = {}
foo.bar.baz = true

I am not allowing this. I explicitly stated that this code (using "true" instead of true) is invalid. Reread what I wrote about strict interpretation. foo.bar.baz = true is attempting to assign a key/value pair in foo.bar, which is a redefinition of that table, because the first line foo.bar = {} says that foo.bar contains no key/value pairs at its level, period. There could be KVPs in foo.bar's subtables, but not in foo.bar itself.

As stated by the combination of rules 2, 3 and 5. You also have other conflicting rules (3 with 1, 5 with 1).

Provide examples of these contradictions, if they exist. We'll get to the bottom of this.

Despite my tone, I do think we share common ground. But I'm dissuaded from reading further into the conversation. I'll follow up in a few hours.

bitwalker commented 6 years ago

@eksortso Sorry if I came across as attacking you; that's not at all my intent, and I absolutely respect your opinion!

I am not allowing this. I explicitly stated that this code (using "true" instead of true) is invalid.

What I meant is that your rules, in the list, do allow it, or at least do not disallow it. You stated the following about strict interpretation:

The strict interpretation considers this code invalid, because foo.bar is defined in whole in the first line and foo.bar.baz in the second line is not a new subtable but a key/value injection.

This implies that if the second line is foo.bar.baz = {}, then it would be allowed, as it is an exception to the strict interpretation as you've stated it. That's also supported by this rule:

A subtable, or an array of tables, may be defined within the parent table's definition either explicitly (with inline tables) or implicitly (with dotted keys). This allows a subtable to be defined in the middle of its parent.

Assuming that was intentional, it is very surprising that one thing is allowed but the other is not using the same syntax.

I'm also not sure why inline tables are implied to be different from regular tables; there is nothing in the spec indicating a difference in what they represent, or even that setting keys in a table which was previously defined via dotted keys is disallowed (see the physical.* example I referenced in my last comment).

... the first line foo.bar = {} says that foo.bar contains no key/value pairs at its level, period

Is there something in the spec which supports this? I don't think it says that; but I'm not sure what it says exactly, because the spec is ambiguous about it as far as I can tell. We only have various examples of behavior around tables and dotted keys from which to infer what it may mean.

Provide examples of these contradictions, if they exist. We'll get to the bottom of this.

I'd appreciate it if you'd assume that I'm arguing in good faith, and not that I'm making stuff up; in any case, the following rules are the ones I was referring to:

1.) Tables may be defined in any order. This includes subtables and supertables. A table is defined when its key/value pairs are explicitly stated.
3.) Key/value pairs of a table are fully specified within one single continuous range of lines, though subtables may also be defined within that range.
5.) A subtable, or an array of tables, may be defined within the parent table's definition either explicitly (with inline tables) or implicitly (with dotted keys). This allows a subtable to be defined in the middle of its parent.

Rules 1 and 3 are in conflict, and so are rules 1 and 5. Rule 1 says tables, including subtables/supertables, may be defined in any order, so it follows that subtables can be defined before their parent tables. A subtable definition implies defining a new key within its parent table, so rules 1 and 3 cannot both be true: a subtable defined before its parent table means that the key/value pairs for a table are not necessarily all fully specified within one single continuous range of lines. Rules 1 and 5 are in conflict, or at least 5 is partially redundant, because 5 implies that subtables may only be defined within a parent table's definition, while 1 says they may be defined in any order.

If I misunderstood something, definitely let me know; but I think some of these rules are in conflict, or are ambiguous, or need additional rules to clarify them. In any case, my problem with the rules has more to do with the fact that I don't know how to identify what is correct behavior and what is not, because there is no model from which to reason about them. I agree we all have common ground here, but I don't think any of us knows precisely what it is; we're feeling our way in the dark, so to speak.

ChristianSi commented 6 years ago

@bitwalker

[foo.bar]
baz = { truthy = true }
baz.falsey = false

Why would you (or anyone) want to write such a thing? Why not write either

[foo.bar]
baz = { truthy = true, falsey = false }

or

[foo.bar]
baz.truthy = true
baz.falsey = false

Either of these alternatives is not only easier and less confusing to read, but also easier to write.

bitwalker commented 6 years ago

@ChristianSi Yes, of course, but that's completely beside the point of the example. The point is not that someone would want to write it that way; it is about the equivalence of forms in the grammar. My other comments go into plenty of detail about why that is important.

ChristianSi commented 6 years ago

@bitwalker Okay, so it seems nobody is arguing in favor of key injection as something that's actually useful and good to have. Good to know.

Your quest for simplicity is honorable but, as I understand you, you propose to achieve it by dropping the rule "You cannot define any table more than once" altogether. That would be simple, admittedly, but the simplicity in the spec would potentially lead to documents that are highly complex, hard to read, understand and reason about. IMHO you are aiming for simplicity in the wrong place. Let's aim for simplicity of the TOML documents; if this requires more complexity in the spec, the trade-off is worth it.

Also, let's face it: The rule "You cannot define any table more than once" will NOT be dropped. So far nobody has even seriously proposed such a thing. If you want to propose it, feel free to open a feature request, but I would be very surprised if the maintainers decided to agree to such a request.

ChristianSi commented 6 years ago

The main point of controversy about the strict interpretation seems to be that it prohibits

foo.bar = {}
foo.bar.baz = true  # ILLEGAL key injection attempt

but seems to allow

foo.bar = {}
foo.bar.baz = {}  # externally defined subtable

I've argued so myself, but, after rereading the spec, I revise my position and argue that the TOML v0.5 spec actually prohibits BOTH these cases, removing the controversy.

Justification: In the section on tables, the spec says: "As long as a key hasn't been directly defined, you may still write to it and to names within it."

And gives this example:

# THIS IS INVALID
a.b = 1
a.b.c = 2

Note that the spec here says nothing about which values the keys map to. So, we can modify the example, using inline tables as values instead of integers, and from the wording of the spec it clearly follows that the resulting structure is still invalid:

# THIS IS INVALID (TOO)
a.b = {}
a.b.c = {}

Just for the sake of completeness: Order does not matter in TOML (except where arrays are concerned), so obviously the example remains invalid if we reverse the ordering of the keys:

# THIS IS INVALID (TOO)
a.b.c = {}
a.b = {}

So it logically follows from the spec as it currently stands that inline tables must be complete. Any nested subtables must be defined within the inline table itself, defining them externally is not allowed.

# THIS IS FINE
a.b = { c = {} }

So, from the spec itself it follows that the potentially confusing disparity simply does not exist. @bitwalker's honorable quest for consistency can be achieved by the strict interpretation; there is no need to drop nearly all rules and adopt an "Anything goes" model.
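The argument rests on one observation: under this reading, a key written on the left of `=` is "directly defined", so no other assignment may write inside it, whatever the order of lines. A small Python sketch of that check follows; `find_conflicts` is a hypothetical helper that takes the full dotted paths of all assignments, assumes bare keys, and ignores duplicate keys, which the spec forbids separately.

```python
def find_conflicts(assigned_keys):
    """Return (outer, inner) pairs where `inner` writes inside the directly defined `outer`.

    Only prefixes are compared, so the result does not depend on line order.
    """
    paths = [tuple(key.split(".")) for key in assigned_keys]
    conflicts = []
    for outer in paths:
        for inner in paths:
            if len(inner) > len(outer) and inner[: len(outer)] == outer:
                conflicts.append((".".join(outer), ".".join(inner)))
    return conflicts


print(find_conflicts(["a.b", "a.b.c"]))  # a.b = {} then a.b.c = {}
print(find_conflicts(["a.b.c", "a.b"]))  # the same lines, reversed
print(find_conflicts(["a.b"]))           # a.b = { c = {} } is a single assignment
```

Note that `a.b.c = 1` together with `a.d = 2` produces no conflict in this sketch, since neither path is a prefix of the other; tables created implicitly by dotted keys, such as `a`, are not directly defined.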

StefanKarpinski commented 6 years ago

I revise my position and argue that the TOML v0.5 spec actually prohibits BOTH these cases, removing the controversy.

I think that any value (table or array) that is given as a RHS should be considered immutable. That way, if you see x = RHS you know, regardless of what else is in the document, that RHS is the value of x. If you want to have an extensible table or array of tables, then you need to use TOML structure. Disallowing "injection" of values but not of subtables seems weirdly incoherent and mixed up.

bitwalker commented 6 years ago

Okay, so it seems nobody is arguing in favor of key injection as something that's actually useful and good to have. Good to know.

I didn't say that. I suspect it is useful, and in one of my earlier comments I gave an example of one possible use case; I am sure there are others as well. The discussion in this thread certainly seems to indicate that there is some desire for the capability, or at the very least that the syntax not be self-contradictory.

Your quest for simplicity is honorable but, as I understand you, you propose to achieve it by dropping the rule "You cannot define any table more than once" altogether.

My argument is that the syntax, as it exists today, is contradictory, but implies that dotted keys can reopen tables. That implication further implies that in some situations you can "define a table more than once" (more specifically, the syntax allows you to reopen a table to add more keys with dotted keys, but one cannot do the same with the bracketed table syntax, which is inconsistent).

My proposal is that TOML needs to define a core model for the format, and base its rules around that; if there isn't one, and the rules are arbitrary, then I believe TOML can only become more complex, and so self-defeating in its stated goals. Once a core model is defined, then we can argue about what rules are required, or should be redacted, based on whether they fit the model.

That would be simple, admittedly, but the simplicity in the spec would potentially lead to documents that are highly complex, hard to read, understand and reason about. IMHO you are aiming for simplicity in the wrong place. Let's aim for simplicity of the TOML documents; if this requires more complexity in the spec, the trade-off is worth it.

As I stated before, I don't buy this argument. Yes, in theory someone could write a horrible document, but they have basically no reason to do so, as the syntax of TOML primarily consists of tools for writing clean documents, which is the main draw of the format anyway. You are worried about edge cases that are unlikely at best. The benefit of a simple model is that documents are easier to reason about, because there are few rules that one needs to know both to read and write them. Such a model is also easier to extend in the future, as it is flexible, and edge cases (if there are any in such a model) are much less likely to present conflicts with extension.

Complexity in the spec is just as harmful to users of the spec as it is to authors/maintainers of parsers/serializers for the format: it makes it harder to remember the rules and harder to know whether something you have written is valid, which means you need to have a validator on hand at all times; and the increased complexity means that parsers are more likely to have bugs, such as disallowing valid documents or allowing invalid ones. Simplicity of the spec is something that benefits the entire ecosystem in the end.

I'm very much curious to see an example of something one is likely to do with looser rules that a stricter interpretation prevents, and which represents a real readability problem in practice. I can concoct some ugly TOML documents easily enough, but I'm not likely to ever do that in practice, because there is no benefit, if anything it is harder to even write such documents than to "do the right thing".

Also, let's face it: The rule "You cannot define any table more than once" will NOT be dropped. So far nobody has even seriously proposed such a thing. If you want to propose it, feel free to open a feature request, but I would be very surprised if the maintainers decided to agree to such a request.

Well, you certainly seem certain about that, but I'm not so certain. Older parsers may already choke on 0.5 documents (I've seen many myself just in Elixir), and the changes I've talked about are not any more serious than the breaking changes which have already occurred. I would argue such a change, organized around a core model expressed in the spec as the basis for all syntax rules, would be at least as beneficial as any other change the spec has undergone. In any case, I absolutely would open an issue/PR, but there is no point until the maintainers weigh in on what that model is.

I've argued so myself, but, after rereading the spec, I revise my position and argue that the TOML v0.5 spec actually prohibits BOTH these cases, removing the controversy.

Justification: In the section on tables, the spec says: "As long as a key hasn't been directly defined, you may still write to it and to names within it."

And gives this example:
# THIS IS INVALID
a.b = 1
a.b.c = 2

That is clearly invalid because a.b is a non-table value. However, as shown in the dotted keys section, the following is valid:

a.b.c = 1
a.d = 2

Note that the spec here says nothing about which values the keys map to. So, we can modify the example, using inline tables as values instead of integers, and from the wording of the spec it clearly follows that the resulting structure is still invalid:
# THIS IS INVALID (TOO)
a.b = {}
a.b.c = {}

Well, no, the spec does not say that this example is invalid; in fact, it almost certainly implies the opposite, because of the a.b.c / a.d example I just showed: a.b.c implicitly creates a table a and a table a.b, so setting a.d is the same as if one had written a = {} followed by a.d = 2. The spec is ambiguous about your specific example, though, and that ambiguity is one of the main points of contention here.

Just for the sake of completeness: Order does not matter in TOML (except where arrays are concerned), so obviously the example remains invalid if we reverse the ordering of the keys:
# THIS IS INVALID (TOO)
a.b.c = {}
a.b = {}

I do agree that the spec states this is invalid, because you are redefining a table which was already defined. In my opinion this should remain an error, because the intent is ambiguous due to the ordering; had the order been reversed, though, it would no longer be an error according to the spec, and it shouldn't be, because the intent is clear.

So it logically follows from the spec as it currently stands that inline tables must be complete. Any nested subtables must be defined within the inline table itself, defining them externally is not allowed.

Well, since I'm refuting the assumptions this is built on, we can't say that.

So, from the spec itself it follows that the potentially confusing disparity simply does not exist.

The spec is ambiguous here, so no, it does not follow; the entire conversation we've had so far is about two possible interpretations (with variations in between) of the spec, so it is a given that there is some disparity/confusion, and this is because the spec does not clarify the interaction between implicit and explicit table declarations and the behavior of dotted keys. At a minimum we have to agree that our argument is based on our interpretations of the spec and how we desire it to be read, but neither of us is in a position to say that the spec is clear about this. My argument thus far is based solely on how I believe the spec should be clarified in the future, aside from specific conclusions drawn from what is spelled out in the spec.

@bitwalker's honorable quest for consistency can be achieved by the strict interpretation; there is no need to drop nearly all rules and adopt an "Anything goes" model.

I didn't advocate for dropping all rules or adopting an "anything goes" model - I very clearly expressed the basis from which additional rules would be derived, there are obviously still some basic rules, and the syntactic rules of the grammar. My quest for consistency is ultimately about defining the thing we're trying to be consistent with, which you haven't stated yet. Just because a set of rules are consistent with each other doesn't mean they make sense, just that they aren't conflicting with each other; I would assume we all want the rules for TOML to make sense in the context of something - the goals of the project would indicate that the context is how to express hierarchical key/values in a readable and convenient way, for which only a few rules beyond the grammar are necessary.

Is the context instead that TOML is an opinionated format for expressing those key/values in a specific way? Then, probably, dotted keys outside of the bracket syntax should never have been added. Now you have more than one way to express the same thing, so it is hard to argue that it is opinionated, and if it isn't opinionated, why are we debating about how someone should be allowed to express key/values? I don't mean to say that you are necessarily making the above argument, but I would like to know the context from which you are advocating, because otherwise it is difficult to put myself in your shoes.

I think I will wait until we hear from the maintainers for now, unless you have specific things you want me to explain about my argument, or you share the context I mentioned; until then I don't think we can make much progress on this.

pradyunsg commented 6 years ago

I've been AWOL for a bit because a lot of real life has been happening for me.

I see a lot of interesting discussion has taken place here (thanks everyone!), but I genuinely don't have the bandwidth to get up to speed on it currently. I hope to make time to come around to this soon.

eksortso commented 6 years ago

I'm still catching up. (For the record, I've never advocated different types of hash tables for the different table syntaxes.)

@ChristianSi The strictest interpretation is very good. But to clarify, would the following still be legal? That is to say, can headers still be written subtable-first (ugly as that may be)? Or would you say that [a.b.c] means that the table a.b is assumed empty except for its subtables, and that the second line makes the document invalid?

[a.b.c]  # An empty table
[a.b]    # Its parent, with no key/value pairs (not counting a.b.c)

StefanKarpinski commented 6 years ago

It seems sensible that only tables which are given as right-hand-side literals be considered "closed".

eksortso commented 6 years ago

Getting back to the central topic, would the following be legal under the strictest interpretation? I'm inclined to think it's not, but perhaps it actually is. In the latter case, the openness of subtables introduced by dotted key/value pairs is still in play. And in either case, we may need to add language to the spec addressing the ordering of dotted key assignments.

a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"

# And btw, we do need to update TOML syntax highlighting, in jneen/rouge I think.

@StefanKarpinski If that's true, then the above is perfectly valid, since no inline table values are involved.

StefanKarpinski commented 6 years ago

It seems fine to me since tables are being built up incrementally in any case. What is the purpose of a more strict interpretation? This is a real question. Is the purpose to allow an implementation to "close" a table earlier? Is closing a table early actually a significant benefit in any implementations?

ChristianSi commented 6 years ago

@eksortso:

@ChristianSi The strictest interpretation is very good. But to clarify, would the following still be legal? That is to say, can headers still be written subtable-first (ugly as that may be)? ...
[a.b.c]  # An empty table
[a.b]    # Its parent, with no key/value pairs (not counting a.b.c)

Sure, that remains legal. Order of table blocks (introduced by [...]) doesn't matter in TOML, except where arrays of tables (introduced by [[...]]) are concerned.

Getting back to the central topic, would the following be legal under the strictest interpretation? ...
a.ok.a = "Hello"
a.DD = "DISTRACTION"
a.ok.z = "Goodbye"

Sure, that remains legal. Order of key/value pairs within a table block doesn't matter in TOML v0.5. (Some months ago there was a discussion about prohibiting such an ordering in future versions of TOML, but that would clearly be an additional restriction which is not yet part of the spec. The strict interpretation, on the other hand, is only about making explicit what's already implicit in the TOML v0.5 spec, not about introducing new restrictions.)

ChristianSi commented 6 years ago

To help clarify things, here is an attempt to explain the strict interpretation in an unambiguous manner and with examples. If this interpretation is accepted as the correct one, a suitable rewrite of this attempt could be incorporated into a future version of the spec (v0.5.1 or so).

Ways of defining tables

TOML has two ways of defining tables: table blocks and inline tables. TOML forbids defining the same table twice, therefore you can use either of these for any table, but you cannot use both for the same table. Moreover, you are not allowed to define the same table in two different table blocks or in two inline table literals.

Table blocks start with a table header line: [table.name] for stand-alone tables, or [[table.name]] for members of a table array. They continue with a (possibly empty) list of key-value pairs and end right in front of the next table header line (or, if there is none, at the end of the document). A special case is the root table block: it contains any key-value pairs between the start of the document and the first table header line; these key-value pairs belong to the (unnamed) root table.

A table block does not only define its main table (whose name is given in its table header line – if there is none, it defines the unnamed root table), but also any nested tables mentioned in dotted keys listed within the table block. To give an example:

# in root table
vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.bools.t = true
vals.bools.f = false

This fragment defines four tables: the root table ('') and the nested tables 'vals', 'vals.nums', 'vals.bools'. (No values are inserted into the 'vals' table directly, but it is nevertheless defined because it appears within a dotted key.)

Tables must not be defined twice, therefore the following table header lines are now ILLEGAL:

[vals]        # ILLEGAL, defined in root table!
[vals.nums]   # ditto
[vals.bools]  # ditto

But tables defined within table blocks are only assumed to be semi-complete: nested tables and table arrays may be defined in other table blocks (obviously, since all tables are direct or indirect children of the root table). So, to return to the above example, all other syntactically correct table header lines which haven't yet been used as keys remain allowed, including

[misc]                # another child of the root table
[vals.literals]       # a new, not yet defined child of 'vals'
[vals.nums.specials]  # a new, not yet defined child of 'vals.nums'
# ... and anything else you can think of, except stuff like
[vals.nums.one]       # ILLEGAL, since that's already a key
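The bookkeeping described above can be sketched in Python. This only illustrates the strict interpretation as written here and is not code from any parser: `tables_defined_by_block` and `header_is_legal` are hypothetical helpers, keys are assumed bare, and arrays of tables are left out.

```python
def tables_defined_by_block(header, keys):
    """Tables a block defines: its own table plus every table named in its dotted keys."""
    base = header.split(".") if header else []
    defined = {header}  # "" stands for the unnamed root table
    for key in keys:
        parts = base + key.split(".")
        # every proper prefix below the header names a (sub)table this block defines
        for i in range(len(base) + 1, len(parts)):
            defined.add(".".join(parts[:i]))
    return defined


def header_is_legal(name, defined, value_keys):
    """A new [name] header must not redefine a table or pass through a value key."""
    parts = name.split(".")
    prefixes = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return name not in defined and not (prefixes & value_keys)


root_keys = ["vals.nums.one", "vals.nums.two", "vals.bools.t", "vals.bools.f"]
defined = tables_defined_by_block("", root_keys)
value_keys = set(root_keys)  # in the root block, keys are already full paths
for name in ["vals", "vals.nums", "misc", "vals.literals", "vals.nums.specials", "vals.nums.one"]:
    print(f"[{name}]", "legal" if header_is_legal(name, defined, value_keys) else "ILLEGAL")
```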

Alternatively you can define tables as inline table literals. You could rewrite the above example as:

# in root table
vals = { nums = { one = 'One', two = 'Two' }, bools = { t = true, f = false } }

Inline tables, however, are values, and like other values (anything that appears on the right side of an equals sign) they are supposed to be immutable and complete. If you define 'vals' as an inline table, you are therefore NOT allowed to define any nested tables outside the inline table literal (neither as table block nor as another inline table literal).

# still in root table
vals.literals = { ... }              # ILLEGAL since 'vals' is an immutable inline table
vals.nums.specials = { ... }         # ditto
[vals.literals]                      # ditto, the chosen syntax doesn't matter
[vals.nums.specials]                 # ditto
[vals.nums.something.deeply.nested]  # ditto

The principle is simple: Anything you want to go inside an inline table must be written into the table literal.

# This is allowed, but the line will probably get too long to be really readable.
vals = { nums = { one = 'One', two = 'Two', specials = { ...} }, bools = { t = true, f = false }, literals = { ... } }
# Consider switching to table block or dotted syntax instead!

Anything said here likewise applies to inline table arrays (including arrays of inline table arrays and so on) which work in exactly the same way as inline tables.

eksortso commented 5 years ago

We have a good example that would help to clarify the standard regarding dotted keys and when implicitly defined tables are introduced. It's important to resolve this, because among three different Python TOML parsers on PyPI, one of them (uiri/toml) raises an error, while two others (sdispater/tomlkit and alethiophile/qtoml) raise no errors and define both c and d in a.b.

The example comes from sdispater/tomlkit#37. I'm hoping that I am interpreting this right.

a.b.c = 12

[a.b]
d = 34

My take is, this is invalid under TOML v0.5.0, because the table a.b is defined in two different locations: implicitly in the root block with the dotted-key definition, and explicitly in the [a.b] block. The key/value pairs do not conflict with each other, but to be valid, they must be declared in the same block.

I imagine that @ChristianSi would agree with this interpretation and would call for explicit language clearing up all confusion in a future TOML version (and also that the table a is defined in the root block); but that @bitwalker, and maybe @StefanKarpinski, would say that the TOML in the example is valid in v0.5.0, maybe with varying interpretations to allow for "scope merging." But I'm just speculating.
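To make the divergence concrete, here is a hypothetical Python sketch that loads the example under both readings. It does not reproduce how uiri/toml, tomlkit or qtoml are implemented; `load_blocks`, its `strict` flag and the `(header, pairs)` input format are assumptions for illustration only. With `strict=True` a table may be defined by only one block (a header defines the table it names, and dotted keys define the tables they pass through); with `strict=False` later blocks simply merge into existing tables.

```python
class TomlConflict(Exception):
    """Raised when a block redefines a table or a key."""


def load_blocks(blocks, strict):
    """Build a nested dict from a list of (header, [(dotted_key, value), ...]) blocks."""
    doc = {}
    owner = {}  # table path -> index of the block that first defined it
    for index, (header, pairs) in enumerate(blocks):
        base = header.split(".") if header else []
        defined_here = {header} if header else set()
        for key, _ in pairs:
            parts = base + key.split(".")
            defined_here.update(".".join(parts[:i]) for i in range(len(base) + 1, len(parts)))
        for table in defined_here:
            if strict and table in owner:
                raise TomlConflict(f"table {table} is defined in two blocks")
            owner.setdefault(table, index)
        target = doc
        for part in base:  # make sure the header's table exists, even if empty
            target = target.setdefault(part, {})
        for key, value in pairs:
            parts = base + key.split(".")
            table = doc
            for part in parts[:-1]:
                table = table.setdefault(part, {})
            if parts[-1] in table:
                raise TomlConflict(f"key {'.'.join(parts)} is defined twice")
            table[parts[-1]] = value
    return doc


example = [("", [("a.b.c", 12)]), ("a.b", [("d", 34)])]
print(load_blocks(example, strict=False))
try:
    load_blocks(example, strict=True)
except TomlConflict as err:
    print("strict reading:", err)
```

The non-strict call yields a single `a.b` holding both `c` and `d`, while the strict call rejects the second definition of `a.b`. The same helper still accepts `[a.b.c]` followed by `[a.b]` with `strict=True`, since a header defines only the table it names.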

So to anyone interested, what is your take? Is this example valid TOML v0.5.0? What, if anything, belongs in the next version of TOML to clarify what we see happening here?

ChristianSi commented 5 years ago

@eksortso I believe that all arguments in favor of either interpretation have been exchanged, so now would be the time to Make A Decision. Sadly, since TOML's founder is an absentee owner 999 days out of 1000, such a decision is unlikely to be made. Unless somebody else with sufficient decision-making power jumps in – @pradyunsg maybe? – I fear this issue will remain unresolved, leaving the TOML world sadly fragmented :cry:

eksortso commented 5 years ago

This is administrative stuff at heart, but it must be addressed. Differing implementations are not good.

Would it speed things up if a decision pending tag were slapped onto every issue where the only thing necessary going forward is for someone with the rubber stamps like @mojombo or, as was suggested, @pradyunsg, to read the ticket, consider the arguments, and make a binding decision?

pradyunsg commented 5 years ago

I've been swamped by a lot of things in the past bit of time. I'll try to catch up on this over the coming weekend.

@eksortso which issues specifically?

eksortso commented 5 years ago

@pradyunsg, I was speaking generally, thinking that having a dedicated tag on issues or PRs might speed up response times on critical issues. Specifically, I'm referring to this issue, because we're seeing divergent interpretations in the parsers. The tag could also be applied to others, like #553, which have been talked through thoroughly but aren't as immediately critical to the standard.

The idea behind this is that our top decision makers could focus on decision pending issues and respond to them first. But depending on what the TOML standard's actual governance model is, such tagging would be redundant.

pradyunsg commented 5 years ago

My OSS time situation isn't good. (pip 19.0 rollout hasn't been "smooth") :/

If someone could summarize the possible positions the specification could take wrt restrictions, as discussed above, it would be greatly appreciated. :)

bitwalker commented 5 years ago

@pradyunsg I'll summarize my position at least, and let others cover theirs:

In essence, there is ambiguity in the spec regarding reopening/extending tables to define new keys, namely via dotted keys vs bracketed keys, with inline tables in the mix as well.

My argument is that if the core data model is a hash table, then any combination of table syntax should be permitted to define tables, or extend previous definitions of tables as long as the restriction that redefining keys with non-table values is not violated. This keeps implementation straightforward and the rules simple for those writing TOML to remember. As I see it, any other option results in conflicting rules which are arbitrarily resolved, which does not seem to vibe with TOML's stated goal of minimalism.

In my view, the following is valid:

# produces { a = { b = 1, c = { d = 2 } } }
a = {}
a.c = {}
a.c.d = 2 # extends a.c

[a] # only opens the table, reopens if it exists
b = 1
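A minimal Python sketch of the permissive model argued for here. It assumes that a dotted key creates missing tables, extends existing ones, and fails only when a path runs through (or would overwrite) a non-table value. It also assumes a `[table]` header merely returns the table to write into. `descend`, `assign`, and `RedefinitionError` are hypothetical names for illustration, not part of any TOML implementation. The sketch models the position under discussion, not the semantics the thread eventually settled on.

```python
from typing import Any

Table = dict[str, Any]

# Hypothetical helpers for illustration; not part of any TOML library.


class RedefinitionError(ValueError):
    """Raised when a write would replace a value that is not a table."""


def descend(root: Table, path: list[str]) -> Table:
    """Follow `path` from `root`, creating missing tables and reusing existing ones."""
    node = root
    for part in path:
        child = node.setdefault(part, {})  # missing component -> new table
        if not isinstance(child, dict):
            raise RedefinitionError(f"{part!r} already holds a non-table value")
        node = child
    return node


def assign(root: Table, path: list[str], value: Any) -> None:
    """Permissive `dotted.key = value`: tables are extended, non-tables never replaced."""
    parent = descend(root, path[:-1])
    leaf = path[-1]
    if leaf not in parent:
        parent[leaf] = value
    elif isinstance(parent[leaf], dict) and isinstance(value, dict):
        # Merge an inline table into the existing table key by key.
        for key, item in value.items():
            assign(parent[leaf], [key], item)
    else:
        raise RedefinitionError(f"{leaf!r} is already defined")


# The example above, statement by statement.
doc: Table = {}
assign(doc, ["a"], {})           # a = {}
assign(doc, ["a", "c"], {})      # a.c = {}
assign(doc, ["a", "c", "d"], 2)  # a.c.d = 2 extends a.c
table_a = descend(doc, ["a"])    # [a] reopens the existing table
assign(table_a, ["b"], 1)        # b = 1
print(doc)  # {'a': {'c': {'d': 2}, 'b': 1}}
```

In this model, writing `a.b.x = 3` afterwards raises `RedefinitionError`, because `a.b` already holds the integer `1`.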

The discussion in this thread is long, but I think it is worth the read, because we identify all the issues and possible solutions in detail.

See my comment below for some additional thoughts.

AndrewSav commented 5 years ago

@pradyunsg My point was that while many people in this thread feel that the following:

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

should be invalid, the spec explicitly allows this by saying:

As long as a key hasn't been directly defined, you may still write to it and to names within it.

It needs to be clarified if that's not the case.

I would also like to echo @bitwalker by saying that this thread is definitely worth reading in its entirety.

bitwalker commented 5 years ago

The example given by @AndrewSav reminded me of a point I would like to clarify. If that example or any of the others in this thread are actually supposed to be invalid, then it is important not only to clarify the specification, but also to clarify why it is invalid in the first place, beyond just "we choose to resolve conflicting rules in this specific way".

Cognitive load is just as important a metric as syntactic complexity in my opinion, and having a framework from which to reason about the rules reduces that load, as long as there is some unifying framework.

Put another way, if TOML maps unambiguously to an arbitrarily nested hash table, what do the rules described in the specification do to support that mapping or the goal of minimalism? If any are contradictory, why? If we want to place restrictions on how the syntax allows you to describe a hash table, users and implementors alike expect those restrictions to come as a trade off, for a benefit that is worth more than the loss of flexibility. That trade off should be explained to help both users and implementors of TOML to properly reason about its use. If there is no trade off, then such restrictions probably should be lifted, or at least reconsidered.

I'll stop posting now to avoid cluttering this thread further, but I feel like the above condenses my thoughts best.

eksortso commented 5 years ago

@bitwalker Your example isn't getting you what your comment says. The introduction of [a] means, by your own standard, the code produces the following:

a.b = 1
a.a.c.d = 2

Or, {a = {b=1, a={c={d=2}}}}.

eksortso commented 5 years ago

I thoroughly back the position laid out by @ChristianSi in his November 10, 2018 comment. I couldn't express it any more clearly. https://github.com/toml-lang/toml/issues/499#issuecomment-437613979

ChristianSi commented 5 years ago

@eksortso Thanks, I still stand by that position and propose to add something like the text in that comment ("Ways of defining tables") to the next revision of the TOML spec. If further clarification is needed: it's an attempt to explain how dotted keys and inline tables interact with TOML's rule "You cannot define any table more than once".

I believe that such a clarification would not introduce any new restrictions but merely make explicit what's already implicit in the TOML v0.5 spec, as explained in an earlier comment.

pradyunsg commented 5 years ago

Just noting that this is still on my radar -- I've just not been able to make time for this.

pradyunsg commented 5 years ago

I finally managed to get around to reading this and spending some time thinking about it.

Geez y'all. This is a wonderful and dense conversation! Thanks a ton for providing your inputs here everyone! It's much appreciated. :)

Putting down my thoughts in a follow up post.

pradyunsg commented 5 years ago

I was in the "strict" camp before it got a name. ;)

@ChristianSi's well-written "Ways of defining tables" semantics are exactly what I had in mind when writing up the specification for dotted keys.

To reiterate poorly: inline tables are immutable, and tables directly defined by a dotted key cannot be "redefined" using the [table] syntax or the inline table syntax.

That is, the following examples are invalid:

foo.bar = {}
foo.bar.baz = "true"  # INVALID
foo.bar.spam = {}  # INVALID

vals.nums.one = 'One'
vals.nums.two = 'Two'
vals.nums = { three = 'Three' }  # INVALID

[vals.nums]  # INVALID
three = 'Three'

The following examples are valid:

vals.nums.one = 'One'
vals.nums.two = 'Two'

[vals.letters]
one = 'A'
two = 'B'

[profile]
release.debug = true

[profile.release.misc]
alpha = "A"

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

I never intended that this last example be valid (and neither did @mojombo), but it is valid as per the language used. Rereading the spec now, it is clear to me that the intent to disallow this is not as obvious as I thought it was when I wrote it.

~We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.~

@bitwalker It would help to add a clarification in the inline tables section -- inline-tables are basically a fancier "Value" and all values are immutable.

While I do think having some reference/guidance on why certain choices were made is helpful, I don't think adding that would be critical-path for getting to 1.0.

Action items here would be, at least:

Clarify inline tables are immutable (and dotted keys can't "inject" into them) (at the end of the "Inline Tables" section)
Clarify that a table defined by a dotted key can not be overridden via a regular table but addition of new sub-tables is allowed. (after the example of super-tables in "Tables" section)
Add advice to not define dotted keys out-of-order. (after the example of ASCII-float keys in "Keys" section)

If anyone can think of additional things we should do here, please do holler! :)

pradyunsg commented 5 years ago

We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.

I'm on the fence on this TBH -- I don't want to break compatibility but I also really want to just straight up disallow this -- I don't see too many use cases where doing this out-of-order makes much sense anyway, so maybe the breakage is fine?

I guess we should look into this in a follow up, better scoped, issue.

ChristianSi commented 5 years ago

@pradyunsg

I was in the "strict" camp before it got a name. ;)

Happy to hear it :+1:

If I understand you correctly, you definitively want to prohibit key injection into inline tables in TOML 1.0 (yeah!) but are unsure about whether or not to prohibit out-of-order definition of dotted keys like this?

a.b.value1 = 1
a.c.value1 = 2
a.b.value2 = 3

While I don't have any strong feelings on the second issue (as opposed to the first one!), my viewpoint is that such out-of-order definitions, though bad style, are harmless and should not be prohibited in TOML 1.x. For one thing, they are clearly allowed in 0.5 and hence covered by our compatibility promise, and moreover, the rule that "order of keys within a single table block" (introduced by [...] or [[...]]) "doesn't matter" is pretty clear-cut and easy to remember.

pradyunsg commented 5 years ago

are unsure about whether or not to prohibit out-of-order definition of dotted keys like this?

Yep and yep.

though bad style, are harmless

Yea, this is basically where I'm split tbh. Allowing them in TOML 1.0 isn't a PITA but it is a quirk that I (really) don't want to have.

We're going to have to live with this being valid in TOML 1.0; since I don't want to break compatibility. I do want to disallow this in the future though -- I think we should put advisory language to not do this in the spec.

Let's just stick with this.

pradyunsg commented 5 years ago

If anyone can think of additional things we should do here, please do holler! :)

No one did.

Opened #630, #631 and #632 as follow-ups. Going to go ahead and close this. Thanks again for the discussion here everyone! :)

toml-lang / toml

Allow keys in key-value pairs to be paths #499