Consider removing all the "valid but discouraged" commentary from the spec

JoeUX commented 4 years ago

Ahoy. There are nine instances of the word "discouraged" in the spec, spanning at least four topics / injunctions. For me, they're a noteworthy distraction. I suggest removing all of them and just having the spec express its opinions in its specification, not in commentary.

One instance is about "out of order" keys. What do we care how someone orders their keys? Keep in mind that the spec for a markup or config language can only reach so far into space-time. These are text files meant to be consumed by humans and machines. How they're consumed by exogenous systems might ultimately matter more than a specification around their formatting. (This insight is probably more relevant to the purported type system, which in a real sense is unenforceable since these are just text files and systems that consume TOML will have to map to their native types.)

Another instance offers no explanation. It's this bit about integers:

int5 = 1_000
int6 = 5_349_221
int7 = 1_2_3_4_5     # VALID but discouraged

What do we care? I wouldn't allow underscores in integers to begin with, but if the spec allows them between each digit, what do we care if someone uses them thusly? Further, it's confusing to say "discouraged" without any explanation. I suggest that the spec either mandate the usual grouping pattern, as in int5 and int6, or continue to allow any pattern, without commentary.

In general, I'm suggesting that the spec just behave like a spec, absent all these "discouraged" asides. A mild reason for this would be to keep the spec as short and clear as possible, balanced against other goals. A major reason for this is a philosophy of baking any opinions into the spec itself, rather than in distracting asides and commentary.

ChristianSi commented 4 years ago

I'm against that. In general, the spec uses "discouraged" for things that are bad style and hence better avoided. Trying to outlaw bad style is a huge and, in general, hopeless enterprise. It's better to just give recommendations on good style or good practice as opposed to bad style or bad practice, and "discourage" the latter – which is what the spec does. Also, turning any of these discouragements into prohibitions is not possible before TOML 2.0, since it would break compatibility.

abelbraaksma commented 4 years ago

I kinda like these notes in the spec, though a future edition might have a style guide or "do's and don'ts" section, so that this is in one location.

About the integers, that's perhaps the one place where it's a bit misguided to have said comment. In several countries it's common, for instance, to group digits by two (India, if I'm not mistaken). And sometimes an irregular grouping is simply the logical one, for instance with GUIDs (but 128-bit integers are not supported out of the box, so this is a bit of a non-issue now), phone numbers, or social security numbers.

abelbraaksma commented 4 years ago

Actually, India uses a mix, see https://en.m.wikipedia.org/wiki/Decimal_separator#Digit_grouping. And in other Asian countries, grouping by myriads is apparently not uncommon.

For instance, 10_00_000 would be 1 million in India.

It's important that TOML embraces the international community by being inclusive (see also the discussion on Unicode literals), and thus does not limit this type of formatting. I believe that to be a good thing, though anybody can always adopt a different style guide, or create a parser that's less inclusive, if they so choose.

JoeUX commented 4 years ago

Good point about India. Cultural differences are a strong argument against discouraging different forms and preferences.

I do like the Go formatting standard though, so that source always looks the same and is more easily compared. Not sure how to apply that here though. I'm also surprised that TOML key names have to be ASCII. Seems like an arbitrary and dated preference.

abelbraaksma commented 4 years ago

> Seems like an arbitrary and dated preference.

Not arbitrary, but certainly dated. That's why I mentioned the literals (and with it, key names) discussion, which is precisely about that. Just forgot where it was, but it'll be one of the first things to go in vNext after 1.0 is final and some have already adopted it in their parsers.

> I do like the Go formatting standard though

Some love it, others hate it, just witnessed a long discussion between the two groups. I guess it's a personal preference whether you prefer a language that's overly strict or not. But don't confuse TOML with a programming language, it's not ;).

marzer commented 4 years ago

> I'm also surprised that TOML key names have to be ASCII. Seems like an arbitrary and dated preference.

Proposal to rectify that post-1.0: https://github.com/toml-lang/toml/issues/687

ChristianSi commented 4 years ago

@abelbraaksma's comment on digit grouping certainly is a good argument for removing this one discouragement from the spec:

int7 = 1_2_3_4_5     # VALID but discouraged

We could instead replace it with a more appropriate example of how underscores in numbers might reasonably be used, say:

int7 = 10_00_000  # Indian-style number grouping

JoeUX commented 4 years ago

@abelbraaksma Got it, thanks.

I don't confuse TOML for a PL, though it's interesting to think about a strict format for a data serialization / markup language. Especially one that has a twin binary form, like Amazon Ion or Protocol Buffers. Text formatting should probably just be automated anyway.

pradyunsg commented 4 years ago

The TOML specification is NOT purely an implementation-oriented specification -- it's certainly meant to be read by someone who writes a TOML file, and these comments are directed toward pushing for a certain degree of consistency.

I don't think we'll be dropping any of the notices about "discouraged" forms. I view them as useful guardrails, and none of them are bad suggestions. I'm sure there could be some disagreements, but those are inherent in discussions about "poor form" / style.

That said, there are 2 actionable items in this discussion that I'll file follow-up issues for:

- adding 1_00_00_000 as an example of Indian-style number separation in integers.
- changing "strongly discouraged" to "not possible" (for breaking inline tables across multiple lines).

pradyunsg commented 4 years ago

I'm going to go ahead and close this issue now, since I don't think we should remove the commentary. Nonetheless, thanks for filing this issue @JoeUX.

Also, thanks everyone who's participated in this discussion. ^>^

abelbraaksma commented 4 years ago

@pradyunsg, thanks for making the changes, I totally agree with your motivation :).

toml-lang/toml, issue #749