Does TOML need to make nesting easier?

pradyunsg commented 3 years ago

This is basically an issue pulled out of #516 and #744. The underlying motivation/question is the same across both issues: how about we make writing nested table structures easier?

I'd like to break the entire "space of discussion" down into these questions:

Does TOML need to make nesting easier?
What should that look like?
- JSON-like? #516 is exploring how that'd look.
- INI-like? #744 is exploring how that'd look.

I'm undecided/unsure about the first question, and if the answer to that is a "No", the other two issues are basically wrapped up. So... let's explore that question here. :)

pradyunsg commented 3 years ago

I'll quickly jot down my thoughts, before I head into what is likely going to be a long and frustrating debugging session:

Repetition: necessary today, because the only way to reach name inside a table is to do table.name
- benefit: it's immediately clear where the key-value goes -- the entire path is visible not-too-far-away and without needing to look up context beyond "what table is this in".
- cost: it makes authoring a TOML document a bit more difficult, since it's more typing and more places to update on changes.
New syntax:
- benefit: allows getting rid of the repetition.
- cost: more complexity and/or stuff to know for reading/authoring TOML documents.
- cost: we might end up introducing "too many reasonable ways to write the same thing".

ChristianSi commented 3 years ago

Making things easier is always good, so my general response to the question would be YES – at least when we reformulate it as "Should TOML ..." rather than "Does TOML need ...".

However, I think that inline tables fit into TOML's style only when used for short things, hence I would leave them as they are, instead of pursuing #516. #744 on the other hand is very promising to decrease the visual clutter and repetition in table headers.

LongTengDao commented 3 years ago

JSON patch

Allowing newlines in {} is really good, and maybe the comma could be omitted (or optional, depending on the aesthetics of where we use it), to make it look like a block table and cleaner:

[block]
a = 1
b = 1

inline = {
    a = 1
    b = {
        x = 1
        b = 1
    }
}

but-inline-with-comma = {
    a = 1,
    b = {
        a = 1,
        b = 1,
    },
}

YAML magic

As for the shorthand kind of section name... I can only say that it looks amazing, but I need to be very patient to count which level a value belongs to. Maybe there will be a solution, but I'm sure it will end up very different from the current style.

Holistic view

In ordinary programming, we usually do:

for ( var i = 0; i < values.length; ++i ) {
    var value = values[i];
    ...
}

stick to semantics:

for ( var index = 0; index < values.length; ++index ) {
    var value = values[index];
    ...
}

crazy for short:

for ( var i = 0; i < s.length; ++i ) {
    var v = s[i];
    ...
}

Comparing the above approaches, it is not hard to see that sticking to one form is not necessarily the best course of action, whether dotted or indented. The JSON patch is a good way to differentiate global and local hierarchy (the whole file is flat, and locally raised), but shorthand magic leads us to YAML & *... A single path leads to darkness: the difficulty we are facing is too big, and any single approach that tries to swallow it will eventually explode.

Which is clearer when you zoom out from your file?

【ini】（simple）
[xx]
xx
xx

【json】（support complex）
xx( xx( xx, xx ), xx( xx, xx ) )

【jsonism】（too complex to handle）
xx( xx( xx, xx ), xx( xx, xx ) ), xx( xx( xx, xx ), xx( xx, xx ) ), xx( xx( xx, xx ), xx( xx, xx ) )

【yaml】（try to resolve）
xx
  xx
    xx
    xx
  xx
    xx
    xx
xx
  xx
    xx
      xx
      xx
    xx
      xx
      xx
  xx
    xx
      xx
      xx
    xx
      xx
      xx
xx
  xx
    xx
    xx
  xx
    xx
    xx

【toml】（better solution）

[xx]

xx
  xx
  xx
xx
  xx
  xx

[xx.xx]

xx
  xx
  xx
xx
  xx
  xx

[xx.xx]

xx
  xx
  xx
xx
  xx
  xx

[xx]

xx
  xx
  xx
xx
  xx
  xx

【tomlism】（not good any more, and in fact no difference from yaml）

[xx]
[xx.xx]
[xx.xx.xx]
[xx.xx.xx]
[xx.xx]
[xx.xx.xx]
[xx.xx.xx]
[xx]
[xx.xx]
[xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx]
[xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx.xx]
[xx.xx.xx.xx]
[xx.xx.xx.xx]
[xx]
[xx.xx]
[xx.xx.xx]
[xx.xx.xx]
[xx.xx]
[xx.xx.xx]
[xx.xx.xx]

【toml magic】

[xx]
[.xx]
[..xx]
[..xx]
[.xx]
[..xx]
[..xx]
[xx]
[.xx]
[..xx]
[...xx]
[...xx]
[..xx]
[...xx]
[...xx]
[.xx]
[..xx]
[...xx]
[...xx]
[..xx]
[...xx]
[...xx]
[xx]
[.xx]
[..xx]
[..xx]
[.xx]
[..xx]
[..xx]

（I think I'm coding bash in cli...）
cd xx/
cd ./xx/
mkdir xx
cd ../
...

JSON and YAML are like an article without headings, while TOML is better, with obvious headings. But if an article has only headings (like a mind map), it is no different from having no headings, and, quite contrary to what we wished for, it becomes very verbose in local parts. Too many points make no point; a single kind of structure makes no structure. This is a waste of headings...

eksortso commented 3 years ago

Perhaps I could bring #525 back from the dead, at least for discussion.

@LongTengDao I do prefer omitting commas when the tables extend over multiple lines. It's more TOML-like. To avoid becoming pseudo-JSON, I wanted that to be the standard, and not permit commas for multi-line inline tables.

marzer commented 3 years ago

Does TOML need to make nesting easier?

Yes

What should that look like?

An INI-like proposal, like #744. I really appreciate TOML's focus on [table headers] as a nice human-oriented document structuring device; any proposal that leans more towards JSON-like structured data representations would just dilute that strength and turn TOML into JSON 2: Electric Boogaloo

eksortso commented 3 years ago

Let's look at what we already have. We've got a lot.

@marzer is right about the strength of INI-style absolute-referencing headers. That is foundational to this standard, and it is imbued with a lot of flexibility for nesting subtables. #744 just strengthens that foundation. But such a standard sacrifices expressiveness in two ways, which are brought up in issues often:

The top level is fixed to the top of the document, before all headers, and cannot be moved. Key/value pairs are attached to the table's root. Users complain that this restriction is artificially imposed. To that, we've reiterated that the top level is so extraordinary that it makes no sense to open it up anywhere else, and that a well-designed configuration will then take advantage of subtables instead of just dumping everything in that top level. (This needs to be mentioned, even though depth is zero at this level.)
Subtables cannot be expressed in the middle of their parents. That's where the other options for defining tables come into play. Dotted keys offer a semi-flexible means of inserting tables in context. And inline table values provide a rigid approach to inject short tables where they're needed. These approaches are more JSON-like than INI-like. But if designed properly, configuration files on the whole do benefit from their usage.

Design comes up a lot here. I think it's important to sell the strengths of table headers to users, and to show how arbitrary subtable depths can be achieved. But at the same time, the format deliberately makes shallower subtable nesting more appealing. That benefit is worth selling.

But what about deep nesting? What can it accomplish? Perhaps quite a bit. I could imagine an elaborate machine learning model being written out in TOML, for instance. But at such levels, we go beyond normal human comprehension. Such documents are written by computers for computers to digest, with readability given a brief acknowledgement.

But that was still a selling point. A table's depth is expressed entirely by the names in its header. Braces aren't required. You could look at a TOML-based data document and navigate your way through with simple keyword searches. Those best practices have practical value.

That's my mental model, and my inner sales pitch.

ghost commented 3 years ago

For nested structures, my preference is INI-like. Like in the proposal from @brunoborges in https://github.com/toml-lang/toml/issues/744

[servers]

[#.alpha] # servers.alpha
ip = "10.0.0.1"
dc = "eqdc10"

[#.beta] # servers.beta
ip = "10.0.0.2"
dc = "eqdc10"

brunoborges commented 3 years ago

If I were to introduce a way to add nested blocks, I'd probably use an XML-like design here, such as:

[servers]

[alpha] # servers.alpha
ip = "10.0.0.1"
dc = "eqdc10"

[beta] # servers.beta
ip = "10.0.0.2"
dc = "eqdc10"

[/servers]

But then, it becomes a challenge to parse this when you don't know whether [beta] is a nested table of [alpha]. Then it would probably require all tables to have a closing tag - a table footer.

eksortso commented 3 years ago

@brunoborges Did you just introduce servers.alpha.beta in your example? It seems like you did.

There's no mixing absolute header names with relative ones like this.

brunoborges commented 3 years ago

Exactly my point, if you read the last line from my comment above :-D

If we are to introduce nesting blocks, I'd be in favor of a complete XML-like style, with all table headers having to have a table closing, like this:

[servers]
 
[alpha] # servers.alpha
ip = "10.0.0.1"
dc = "eqdc10"
[/alpha]

   
[beta] # servers.beta
ip = "10.0.0.2"
dc = "eqdc10"
[/beta]

[/servers]

eksortso commented 3 years ago

But then you'd just be making a more complicated XML. Folks already complain about redundant table names, so I doubt this would win anyone over.

komkom commented 3 years ago

I am not sure if this is the right place but here it is anyways. I don't see the benefit of allowing nested keys in inline tables. The only positive side I see to this is that keys in general only need one definition. But without a deeper knowledge of TOML, a doc like the one below is difficult to understand IMO.

key = {a.b.c=1}

rezamahdi commented 3 years ago

I think https://github.com/toml-lang/toml/issues/781#issuecomment-716720352 would be very useful and may be valid in TOML, but the main issue with nesting is verbosity.

I propose using '...' to refer to the super table. For example, this...

[server]
port=8080
[server.database]
name='dbhost'

...becomes this:

[server]
port=8080
[...database]
name='dbhost'

This would reduce verbosity a little.

eksortso commented 3 years ago

There's a similar proposal on #744, which uses asterisks as placeholders.

Your example would look like this under the proposal:

[server]
port=8080
[*.database]
name='dbhost'

nugend commented 3 years ago

It is not obvious to me that these proposals provide for simpler and more legible configuration files than just allowing newlines in inline tables.

The entire problem with deeply nested tables is that they become unreadable and you lose your place and context for where you are. By adding placeholders and other sorts of references you still end up losing your place and context for where you are when the number of nested elements grows large rather than the depth grows.

A mild level of nesting has never, ever been a problem. If configs only ever needed something like a maximum of 4 to 6 levels of nesting, TOML would probably not be needed. The problem emerged due to deeply nested configurations. TOML resolves that by letting you point directly into a deep level. Adding light nesting returns the ability to have slightly more complex structures with additional context, and the inline tables accomplish this for small numbers of elements. The problem comes when you have more than 3 or 4 elements in a small nested structure and it is longer than is comfortable on a single line.

Fundamentally, this is always an issue of taste. There's not a consistent solution to this problem. So just let people make decisions based on taste and stop trying to square the circle. Automated tooling isn't ever going to be perfect, but being able to say something like "Drop into inline tables when there are fewer than M top level elements and nesting is no deeper than N levels" or "For Mi top level elements with no deeper than Ni levels do this. For Mj top level elements with no deeper than Nj levels do this" is better than what things end up looking like now when automated formatting spits something out.

lmna commented 3 years ago

The complexity of the TOML language is at a barely tolerable level. Adding even more syntactic features would ruin the objective of being obvious.

nugend commented 3 years ago

Then close the issue. The suggested syntax above in all the examples is novel and would lead to extremely confusing large TOML documents.

eksortso commented 3 years ago

@lmna Our working definition of "obvious" is that you ought to be able to understand how TOML works with a single reading of the specification. This probably isn't the place to talk about that, aside from where table nesting is a factor. But I invite you to raise another issue, or start a discussion, to promote obviousness as you see it.

@nugend That suggested syntax is on #744. You can discuss it, or make a case to reject it, there. I think it could be useful, despite the additional considerations it would bring.

Please take a look at #525, and its followup PR #551, which I closed after almost a year's silence. I'm opposed to commas between KVPs except on single lines, but I thought that allowing newlines (and comments) as separators would work better at keeping inline tables in their place. What would you think of this?

nugend commented 3 years ago

I suspect they both stalled because they try to split the baby of "nesting can get too deep" vs "make nesting less painful".

The resistance to JSON-like syntax is non-obvious to me. JSON sucks for configuration because it's actually a serialization format for a very type poor language, you have to quote every key, and because deep nesting hurts and there's no escape hatch.

People like delimited, nested representations when they're not out of control because they visually indicate structure and because adjusting that visual representation is easy to do. All the proposals that are not basically, "Fuck it, let's just do something that works kinda like JSON," don't actually serve the concerns of visual structure and easy adjustment.

Here's an example of what I would personally like to see.

[some_section]
api_params = {
   credential1 = "val1", credential2 = "val2", credential3 = "val3",
   connectionTimeout = 1, connectionRetries = 2, queryTimeout = 3, queryRetries =4,
   output_format = "HDF",
   }

BTW, I think trailing commas should be allowed too, both within single-line nested tables and within multi-line nested tables, should they be added.

If I adhere to #525/#551 and want to retain a visual indication of the related nature of the keys, that has to be this:

[some_section]
api_params ={
   credential1 = "val1"
   credential2 = "val2"
   credential3 = "val3"

   connectionTimeout =1
   connectionRetries = 2
   queryTimeout = 3
   queryRetries = 4

   output_format = "HDF"
   }

Which isn't actually better than the existing tables:

[some_section.api_params]
credential1 = "val1"
credential2 = "val2"
credential3 = "val3"

connectionTimeout =1
connectionRetries = 2
queryTimeout = 3
queryRetries = 4

output_format = "HDF"

Now what happens if we end up splitting api_params up because we start using some tool for secret management?

[some_section]
api_params = {
   connectionTimeout = 1, connectionRetries = 2, queryTimeout = 3, queryRetries =4,
   output_format = "HDF",
   }
secrets = {credential1 = "secret_reference1", credential2 = "secret_reference2", credential3 = "secret_reference3"}

In the other syntaxes:

[some_section]
api_params ={
   connectionTimeout =1
   connectionRetries = 2
   queryTimeout = 3
   queryRetries = 4

   output_format = "HDF"
   }
secrets = {credential1 = "secret_reference1", credential2 = "secret_reference2", credential3 = "secret_reference3"}

[some_section]
secrets = {credential1 = "secret_reference1", credential2 = "secret_reference2", credential3 = "secret_reference3"}

[some_section.api_params]
credential1 = "val1"
credential2 = "val2"
credential3 = "val3"

connectionTimeout =1
connectionRetries = 2
queryTimeout = 3
queryRetries = 4

output_format = "HDF"

Or if we want to stick to not using inline tables:

[some_section.api_params]
credential1 = "val1"
credential2 = "val2"
credential3 = "val3"

connectionTimeout =1
connectionRetries = 2
queryTimeout = 3
queryRetries = 4

output_format = "HDF"

[some_section.secrets]
credential1 = "secret_reference1"
credential2 = "secret_reference2"
credential3 = "secret_reference3"

Those all have to change quite a bit more, purely in terms of the character deltas.

pradyunsg commented 3 years ago

@nugend None of those examples are something that we're designing for. Something like the following is exactly what I don't want to enable:

[some_section]
api_params = {
   connectionTimeout = 1, connectionRetries = 2, queryTimeout = 3, queryRetries =4,
   output_format = "HDF",
   }
secrets = {credential1 = "secret_reference1", credential2 = "secret_reference2", credential3 = "secret_reference3"}

As you noted, it can be broken out into:

[some_section.api_params]
credential1 = "val1"
credential2 = "val2"
credential3 = "val3"

connectionTimeout =1
connectionRetries = 2
queryTimeout = 3
queryRetries = 4

output_format = "HDF"

[some_section.secrets]
credential1 = "secret_reference1"
credential2 = "secret_reference2"
credential3 = "secret_reference3"

I find the latter much clearer, and I don't see what the advantage is to squashing something like this inline. The fact that TOML forces you to break up long inline tables into separate lines is good IMO -- it forces you to not jam everything on the same line.

nugend commented 3 years ago

Sorry, I didn't see a design document. Is it in another repository?

pradyunsg commented 3 years ago

@nugend what kind of design document are you thinking of?

nugend commented 3 years ago

One that outlines what the design goals for TOML are and what exactly the values are that are trying to be met.

I mean, it's the "Obvious, Minimal Language", so it seemed to me like the design should be "obvious". So when inline tables are allowed, but they can't span newlines, that doesn't really feel obvious (to me at least). So if there's a specific design spec that's rather preferred, I'd like to read it so I know what is actually being aimed for.

pradyunsg commented 3 years ago

Nothing like that exists. Between how this project is structured (BDFL-esque) and how the discussions take place, your best bet for figuring out the design approach here is to look at the issue tracker for what I or Tom have said.

pradyunsg commented 3 years ago

To me, the problem to solve here is that there are some cases where it's not possible to "nicely" spread the information. :)

dependencies = [
    "awesome-project",
    { git = "git@github.com:thatTiredPerson/importantButAbandonedProject", revision = "bae5084d627eb4cdabe07c1e743e3d97425d7f5c" },
    "my-awesome-documentation-thing",
]

[hello.deep.nested.context]
one = 1
two.three = 2.3
four = { five = 5, six = 6, seven = 7, eight = 8, nine = 9 }

I'm unconvinced right now about which approach is better here, or if we should just say "yea, that sort of complexity won't work".

eksortso commented 3 years ago

To me, the problem to solve here is that there's some cases where it's not possible to "nicely" spread the information. :)

@pradyunsg The impetus behind #525/#551 was precisely to address the problem with the inline table in your first example. I was wanting to allow a variant of the following, and as things turn out, this particular approach would work well with small anonymous tables in the middle of arrays:

 # FYI, not valid as of v1.0.0
dependencies = [
    "awesome-project",
    {
        git = "git@github.com:thatTiredPerson/importantButAbandonedProject"
        revision = "bae5084d627eb4cdabe07c1e743e3d97425d7f5c"
    },
    "my-awesome-documentation-thing",
]

[...]

I'm unconvinced right now about which approach is better here, or if we should just say "yea, that sort of complexity won't work".

I'm a little confused by what differentiates the two approaches you've presented. Could you state the same content in the two different styles, so we can compare apples to apples?

pradyunsg commented 3 years ago

The approaches I'm referring to are leaning into the [table] syntax vs leaning into the inline table syntax.

I'm on mobile, so typing this out will take way longer than useful to type out the existing examples in those formats. 😅

FWIW, regarding #525, if we do go that route (I don't know yet), I think I'd prefer to enforce a "comma-followed-by-newline-for-all-keys" approach there, rather than treating the newline as the separator. That way, if you wanna break out your inline table across multiple lines, you don't need to remove the commas; you only add newlines.

four = { five = 5, six = 6, seven = 7, eight = 8, nine = 9 }
# OR
four = {
   five = 5,
   six = 6,
   seven = 7,
   eight = 8,
   nine = 9,  # trailing comma is optional
}

# NOT THIS THO
four = {
    five = 5, six = 6, seven = 7,
    eight = 8, nine = 9
}

But yea, all of these rules feel more convoluted than "you can't do that, sorry". :(

rezamahdi commented 3 years ago

It seems that with these approaches, TOML is getting closer to YAML.

I think allowing multi-line tables will fix many issues. In effect, a JSON-like syntax without quotes for keys...

eksortso commented 3 years ago

@pradyunsg Enforced style rules, I'm discovering, are inherently convoluted. But if something new for the standard would serve a good and valuable purpose, I'd rather have some way to do it, however ugly it is, than not have a way to do it. I'll concede on the commas thing.

(The ugliness could be a virtue in disguise, though. My cynical and provincial take is, "If it's too ugly for yinz to use, then maybe yinz oughta just do it the right way!" I have the exact same feeling about NULL values and not having NULL values in TOML.)

I'm sure the folks who pressed for inline tables in the first place felt the same internal conflicts as we're having here. I've not dived that deep into those past conversations.

@rezamahdi Close to YAML? Them's fighten' words! :) We need to hear this stuff sometimes though. It keeps us honest.

rezamahdi commented 3 years ago

😂 The fact is that TOML was never meant to support 100 levels of depth or to be a serialization format for complex hierarchies and objects. It is a simple configuration file format. But it has always been compared to YAML....

nugend commented 3 years ago

I have the exact same feeling about NULL values and not having NULL values in TOML

While it's great that TOML doesn't have an explicit null, most programmatic handling of TOML has implicit nullity for all values outside of arrays. So, bit of a mixed bag there in a sense. Not that I think you could do better without requiring a schema, which is not something I think would be universally workable for TOML. Ironically, the requirement of a schema makes null eminently tolerable as a syntactic convenience for expressing optional datatypes.

"If it's too ugly for yinz to use, then maybe yinz oughta just do it the right way!"

This is the conundrum. TOML is the right and only way to do things in at least some cases as it's gotten standardization for various purposes (Rust Cargo, Python's new packaging files). But we all have differing opinions on what is ugly.

Personally, I feel that indenting TOML blocks beyond one level for array tables is fairly hideous and extremely hard to read. I don't have to though, since the spec doesn't require that. I don't know about anyone else, but my position could be boiled down to not requiring a certain syntax to express something by allowing for whitespace flexibility. (Well, that and homogenizing the trailing comma situation).

brunoborges commented 3 years ago

Regarding a schema, please visit https://github.com/brunoborges/toml-schema for a proposal.

JeppeKlitgaard commented 3 years ago

I am really impressed with TOML as a configuration language, but the lack of support for newlines in inline tables makes nested structures quite cumbersome in some cases.

I think the INI-style format is great in some or even most circumstances, but not in all circumstances. Presumably that is why inline tables were added in the first place. I think adding support for multiline 'inline tables' (or extended tables, extended inline tables, whatever the name given would be) is an obvious improvement to TOML and would not detract from the INI-style.

The only problem I see with it is that there are suddenly two valid ways of doing the same thing, which might lead to inconsistent use. This is however already the case for inline tables - sane people choose the appropriate one. The distinction between good and bad use of the extended inline table might be less clear, but I think people can be trusted to use the appropriate one.

The extended inline table is, in my opinion, sufficiently useful to warrant implementation even if it diverges slightly from the 'only one obvious way' approach. TOML is a small, succinct language and the addition of extended inline tables will not change that.

I think requiring comma separation and allowing trailing commas would make it very clear that it is in fact just an inline table that stretches multiple lines - nothing more. Allowing newline separation alone adds too much 'magic' to the language.

Further, I think something like

four = {
    five = 5, six = 6, seven = 7,
    eight = 8, nine = 9,
}

should be supported. It might be slightly ugly, but if commas are to be the separator, it 'feels' like this should be a valid input. This again reduces 'magic' and puts an appropriate amount of styling responsibility on the user.

Additionally some users might want to group certain keys:

credentials = {
    admin_name = "admin", admin_password = "very secret",
    editor_name = "editor", editor_password = "also quite secret",
}

The argument shouldn't be whether this is appropriate for a top-level table, where

[credentials]
admin_name = "admin"
admin_password = "very secret"
editor_name = "editor"
editor_password = "also quite secret"

is the preferable choice, but suppose that the credentials table was nested below the top level once, twice or more. This might for example be the case for a configuration file containing a top-level table clusters of named clusters, each with some servers that all needed a credentials table.

Just to get the point across, please see this discussion where YAML is chosen over TOML because of the table format: https://github.com/getnikola/nikola/pull/2821

abravalheri commented 2 years ago

Hi @pradyunsg, I just would like to note here that, when adding Array of Tables to the discussion, the nesting in TOML is very confusing...

For example, if I try to write something similar to the following JSON:

{
    "aot": [
        {"table1": {"a": 1, "b": 2}},
        {"table2": {"c": 3, "d": 4}}
    ]
}

(and I want to do it without inline tables, which would be OK for this compact example but problematic if table1 and table2 were complex) I would not be able to figure out how to do it myself, and would need to rely on an automatic tool.

[[aot]]

[aot.table1]
# ^--- this part is confusing... why single brackets? `table1` is not a direct child of `aot`...
a = 1
b = 2

[[aot]]

[aot.table2]
# ^--- same as the previous
c = 3
d = 4

I think the difficult bit to understand here is that currently, nesting of tables/AoTs in the "INI-style" form is not-"stateless" (could I say context-free?)... The table headers kind of work more as an "insert command" than a "standalone", 1-to-1, perfectly reversible, mapping between the data structure and the representation...

In the example above, to "parse" [aot.table1] and [aot.table2] correctly, we have to think of TOML less as markup and more as a series of instructions that depend on the "previous state of the parser"/stack, which is much more difficult for people to understand.

I believe JSON-style tables would make the representation much more understandable and "obvious" (to quote TOML's acronym).

mcarans commented 2 years ago

@pradyunsg Yes! I would very much welcome a clearer way of nesting. I found it confusing to write the following to deal with "pages" and "children" for pydoc-markdown in TOML and confess that I struggled to get the syntax right:

[tool.pydoc-markdown.renderer]
type = "mkdocs"

[tool.pydoc-markdown.renderer.mkdocs_config]
site_name = "HDX Python Scraper"

[[tool.pydoc-markdown.renderer.pages]]
title = "Home"

[[tool.pydoc-markdown.renderer.pages]]
title = "API Documentation"

[[tool.pydoc-markdown.renderer.pages.children]]
title = "Source Readers"
contents = ["hdx.scraper.readers.*"]

[[tool.pydoc-markdown.renderer.pages.children]]
title = "Outputs"
contents = ["hdx.scraper.jsonoutput.*", "hdx.scraper.googlesheets.*", "hdx.scraper.exceloutput.*"]

I think this is much clearer in YAML:

tool:
  pydoc-markdown:
    renderer:
      type: mkdocs
      mkdocs_config:
        site_name: HDX Python Scraper
      pages:
        - title: Home
        - title: API Documentation
          children:
            - title: Source Readers
              contents:
                - hdx.scraper.readers.*
            - title: Outputs
              contents:
                - hdx.scraper.jsonoutput.*
                - hdx.scraper.googlesheets.*
                - hdx.scraper.exceloutput.*

Nesting is one reason why developers of some commonly used libraries like asottile (author of flake8 and pre-commit) refuse to support pyproject.toml. For example here, he says "you also don't actually want this, toml is really bad at representing nested structures" (and he goes on to give an example of what a pre-commit configuration would look like in TOML).

The author of the library PyTOML turned down a request to use his library as a dependency for pip as part of PEP-518 saying here that "TOML is a bad file format. It looks good at first glance, and for really really trivial things it is probably good. But once I started using it and the configuration schema became more complex, I found the syntax ugly and hard to read...I personally abandoned TOML and as such, I'm not planning on adding any new features..." Presumably he was talking about nesting.

I think fixing nesting is important and likely to speed adoption of TOML in general and pyproject.toml in particular.

mcarans commented 2 years ago

Perhaps exploring different nesting options could be a Google Summer of Code project (deadline is 21 Feb)?

pradyunsg commented 2 years ago

This isn't suitable for a GSoC project for various reasons:

- The thing needed here is a design decision, not implementation work.
- The "work" needed here is not the right size for a GSoC project.
- I don't think anyone on the TOML core team has the availability for mentoring someone to work on this.

mcarans commented 2 years ago

The thing needed here is a design decision, not implementation work.

I can see that there have been long running discussions in different GitHub issues on various approaches. Who makes the final design decision?

pradyunsg commented 2 years ago

See https://github.com/toml-lang/toml/issues/781#issuecomment-821776335. It's a BDFL-esque model -- I make the call, although I personally view that as delegated power from the original author whose name is in the format's name. :)

pradyunsg commented 2 years ago

Additionally some users might want to group certain keys:

credentials = {
    admin_name = "admin", admin_password = "very secret",
    editor_name = "editor", editor_password = "also quite secret",
}

Well, I don't see why this is preferable to a clearer model:

[credentials]
admin = { name = "admin", password = "..." }
editor = { name = "editor", password = "..." }

I view it as a good thing that such grouping is forced into tables (ending up in the final parsed mapping) rather than being a source-only artifact.

Just to get the point across, please see this discussion where YAML is chosen over TOML because of the table format: getnikola/nikola#2821

Yea, I don't really care if the criticism is a mere "it sucks" with no clear elaboration on why they think that. There's nothing actionable there. :)

pradyunsg commented 2 years ago

[tool.pydoc-markdown.renderer]

This is probably a reasonable example for a nesting-oriented configuration, so... I'm gonna pick this and run with it.

Already possible today:

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = { site_name = "HDX Python Scraper" }

pages = [
  { title = "Home" },
  { title = "API Documentation", children = [
    { title = "Source Readers", contents = [
      "hdx.scraper.readers.*"
    ] },
    { title = "Outputs", contents = [
      "hdx.scraper.jsonoutput.*",
      "hdx.scraper.googlesheets.*",
      "hdx.scraper.exceloutput.*"
    ] },
  ] },
]

It's still a somewhat convoluted mess, IMO. That you didn't try to use this is... reasonable. :)

I don't think the shorthand-for-table-name syntax is much of an improvement over the convoluted array-of-tables mechanism to do these things.

[tool.pydoc-markdown.renderer]
type = "mkdocs"

[*.*.*.mkdocs_config]
site_name = "HDX Python Scraper"

[[*.*.*.pages]]
title = "Home"

[[*.*.*.pages]]
title = "API Documentation"

[[*.*.*.*.children]]
title = "Source Readers"
contents = ["hdx.scraper.readers.*"]

[[*.*.*.*.children]]
title = "Outputs"
contents = ["hdx.scraper.jsonoutput.*", "hdx.scraper.googlesheets.*", "hdx.scraper.exceloutput.*"]

If we do the all-on-newlines model, we'll allow something like:

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = {
  site_name = "HDX Python Scraper"
}

pages = [
  { title = "Home" },
  {
    title = "API Documentation",
    children = [
      {
        title = "Source Readers",
        contents = [
          "hdx.scraper.readers.*",
        ]
      },
      {
        title = "Outputs",
        contents = [
          "hdx.scraper.jsonoutput.*",
          "hdx.scraper.googlesheets.*",
          "hdx.scraper.exceloutput.*",
        ]
      },
    ]
  },
]

This ends up extremely close to JSON, which is not a bad thing because a lot of familiarity comes with that. It also brings concerns around formatting cleanly, since the following would be valid as well:

[tool.pydoc-markdown.renderer]
type = "mkdocs"
mkdocs_config = { site_name = "HDX Python Scraper" }

pages = [
{ title = "Home" },
{title = "API Documentation",
children = [
{
title = "Source Readers",
contents = [
"hdx.scraper.readers.*",
]
},
{ 
title = "Outputs",
contents = [
"hdx.scraper.jsonoutput.*",
"hdx.scraper.googlesheets.*",
"hdx.scraper.exceloutput.*",
]
},
]
},
]

This means, to me, that there's a greater need for a TOML formatter along the lines of prettier/rustfmt/gofmt/black, which... feels inherent to allowing this sort of nesting though.

However, this also means that someone can do stupid things like:

tool = { pydoc-markdown.renderer = {
  type = "mkdocs",
  mkdocs_config = {
    site_name = "HDX Python Scraper"
  },
  pages = [
    { title = "Home" },
    {
      title = "API Documentation",
      children = [
        {
          title = "Source Readers",
          contents = [
            "hdx.scraper.readers.*",
          ]
        },
        {
          title = "Outputs",
          contents = [
            "hdx.scraper.jsonoutput.*",
            "hdx.scraper.googlesheets.*",
            "hdx.scraper.exceloutput.*",
          ]
        },
      ]
    },
  ]
}}

But that same blurb with all newlines removed (which is arguably more "stupid") is already valid; so... whatever. That isn't really a negative point, and the ship on "only one obvious way to do things" sailed a decent time ago anyway. :(

This does make me reconsider the comma-newline as separator though, since merely using newline as a separator would make it so that you could just take an existing table definition and wrap that with {} to get a layer of nesting. I can't decide which one of those semantics is desirable -- being closer to the existing semantics of commas-serve-as-separators vs being closer to the existing semantics of newline-after-value-means-you're-defining-the-next-thing.

pradyunsg commented 2 years ago

Alright, I've played around with all the proposals+examples here in a text editor, and I think I have the answers I was looking for from this issue.

Does TOML need to make nesting easier?

Yes.

What should that look like?

JSON-like. That's closer to the existing mental model of most users, and also to the sort of structure that users are trying to put into configuration files.

Let's hash out the details for this in #516. :)

Thanks everyone for the discussion and comments here! They're appreciated! ^.^

dirkroorda commented 10 months ago

I have read this with interest. I'm trained as a mathematician, and hence I find both yaml and json more obvious than toml. To me toml is a bit like a premature optimization. At some point, all languages allow expressions that are too long or too deeply nested to be truly human readable.

The best solution for those cases is not to adapt the language, but to apply formatters (like black, prettier). If I need to read a json file of 1 MB on one line, I simply convert it to yaml and start reading. Or if I need to jump from the beginning { of a sub-object to its ending }, I convert the json to an indented json.

All in all, yaml gives me the best reading experience, and I never use its confusing parts.

I was confronted with toml because of distributing python modules, and, without reading the docs, I kept getting confused.

So, tooling to go quickly from yaml to json inside the editor, or a side-by-side view would offer me the best of 3 worlds: clutter-free yaml, brace-rich json, and conceptual clarity.

eksortso commented 10 months ago

I have read this with interest. I'm trained as a mathematician, and hence I find both yaml and json more obvious than toml. To me toml is a bit like a premature optimization...

@dirkroorda Thank you for confronting us with your feedback. I was trained as a mathematician as well. I'm familiar with the mindset. And my experience with other deeply mathematical models, like those used by relational databases, convinced me that, ultimately, practicality beats purity when solving real-world problems. This is why TOML has been adopted by Rust and Python's packaging systems, though you can use whichever format you deem to be best for your needs.

TOML comes from INI's informal roots. INI formats tended to be hierarchical and string-based when they ventured into table nesting. So addressing deep nesting required some new thinking on our part. And as most younger professionals these days are more familiar with JSON's braces than with INI's table sections, it made sense for us to tap into ideas from JSON to some degree. It's something I fought against for some time, since we're primarily a configuration format. But as hinted at before, it's better to be practical and useful than to be stuck with a hierarchical format that doesn't scale well in places where deep nesting is unavoidable.

The best solution for those cases is not to adapt the language, but to apply formatters (like black, prettier)...

Formatters do exist for TOML, as do emitters. I don't use them, even though I use TOML for all of my Python projects. But my editors provide syntax highlighting and such to make reading and editing TOML simple. Something which, I find, both JSON and YAML are ill-suited to provide, being respectively too simple and too complex for effective configuration management.

But, your mileage may vary, as they say. Thank you for sharing your thoughts.

dirkroorda commented 10 months ago

In return, I appreciate your emphasis on practicality, @eksortso. To me, that is the final concern too. And that depends on the actual practices we are engaged in.
