Proposal: RFC-like extensions and extension system

Arastais commented 1 year ago

I've been following toml for a while and noticed that many features get heavily debated (relatively speaking) or even outright rejected for an assortment of reasons. Generally speaking, it's because the feature or proposal, while a nice addition, is either not necessary (and therefore would ruin the simple nature of toml) or is too ambiguous (and would therefore confuse users who don't know that a new syntax/feature has been implemented, or what its quirks are) - sometimes both. While I agree with the reasoning for rejections and see the merit in keeping toml relatively simple, I also would like to see these features optionally implemented.

Thus, I think it would be best if toml had an extension system, whose naming and overall functionality would be similar to that of XMPP - another open and RFC standard. Essentially, features that may not be suitable for the base toml standard itself can instead be added as an extension with a specific "ID", such as TLX-YYY (where YYY is the number of the extension). Users can then specify these extensions in their toml files if they want to use the features in that extension. Parsers would then optionally implement the extensions that become a part of toml.

For simplicity and as an example, let's assume this gets implemented as part of the toml 1.1.0 standard. I will also use the term "parser" generally, but I'm mostly referring to implementations. Which extensions 1.1 compliant parsers actually implement is up to the developer of said parser, since extensions are optional. However, in order for parsers to be 1.1 compliant, they should list what extensions they support (much like how toml parsers say what version of toml they are compliant with in their readme), and throw an error if the user indicates they want to use an extension that the parser does not support. An error should also be thrown if a user is using a feature that is part of an extension but did not specify in their file that they are using that extension. This way, there will be no ambiguity and users will be clearly told what is going on in terms of the extension system.

Users would indicate what extensions they want to use in the file itself, like so:

[toml]
extensions = [1, 3, 4, 7]

A 1.1 compliant parser would read this, and either accept it, or throw an error like "my-example-parser does not support TLX-003" if it did not support (i.e. the authors did not implement) TLX-003. Similarly, the parser should throw an error if it reads a feature that it supports but the extension for that feature has not been specified. For example, if filesizes are part of TLX-003, and a user tries to use filesizes in their toml file without specifying extensions = [3], the parser should throw an error like "File sizes requires extension TLX-003" or "File sizes are not part of the 1.1 toml specification; You must specify TLX-003". This system prevents parser developers from being forced to implement all these features, and also prevents users from being forced to use these features when they didn't specify that they wanted to.

The reason specifying an error is important is that 1.0 and below compliant parsers would just read the extensions field as a normal variable, so it's important that 1.1 and above compliant parsers don't have the same behavior, for ambiguity reasons. For similar reasons, 1.1 compliant parsers should also report when they correctly read an extension, e.g. "TLX-001 in use".
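To make the checks described above concrete, here is a minimal sketch in Python. It is not part of any real parser: `SUPPORTED_EXTENSIONS`, `ExtensionError`, `check_extensions` and the `features_used` mapping are all hypothetical names, and the input is assumed to be the plain dictionary a toml parser would produce for the file.

```python
# Hypothetical sketch: none of these names exist in any real toml library.
PARSER_NAME = "my-example-parser"
SUPPORTED_EXTENSIONS = {1, 3}  # assumption: this parser implements TLX-001 and TLX-003


class ExtensionError(Exception):
    """Raised when the declared extensions and the file contents disagree."""


def ext_id(number):
    """Format an extension number as its TLX-YYY identifier."""
    return f"TLX-{number:03d}"


def check_extensions(document, features_used):
    """Validate the [toml] extensions array of an already-parsed document.

    document:      dict produced by parsing the toml file
    features_used: feature name -> extension number, for every extension
                   feature that was detected while reading the file
    Returns the list of "in use" notices for the accepted extensions.
    """
    declared = set(document.get("toml", {}).get("extensions", []))

    # Case 1: the file asks for an extension this parser does not implement.
    for number in sorted(declared):
        if number not in SUPPORTED_EXTENSIONS:
            raise ExtensionError(f"{PARSER_NAME} does not support {ext_id(number)}")

    # Case 2: the file uses an extension feature without declaring it.
    for feature, number in features_used.items():
        if number not in declared:
            raise ExtensionError(f"{feature} requires extension {ext_id(number)}")

    # Case 3: everything matches, so report which extensions are active.
    return [f"{ext_id(number)} in use" for number in sorted(declared)]
```

Under these assumptions, `check_extensions({"toml": {"extensions": [1, 3]}}, {})` returns the two "in use" notices, while declaring extension 4, or using file sizes without declaring extension 3, raises an `ExtensionError` with the corresponding message.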

Additionally, with this syntax, adding a field specifying the version of toml used in the file (mentioned many, many times; #860 highlights them all) and specifying a toml schema (#792) could be naturally implemented:

[toml]
version = { major = 1, minor = 1, rev = 0 }
schema = "<url>"
extensions = [1, 3, 4, 7]

Specifying a toml version is a good idea in general so that parsers can know if the user is using a standard that the parser does not support. It would be an even better idea now in the context of the extension system: When a version is not specified, 1.1 compliant parsers can assume a version of the toml standard is being used that does not support extensions, and thus immediately reject the file if that's the case (since in our example there are no extensions before 1.1). They could also maybe assume that a toml file with no [toml] table specified at all is instead using version 1.0 of the standard, since, in this example, that's the latest version that does not implement having a version field in the toml file.
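As a rough sketch of how a parser might apply these version rules (in Python, with hypothetical names throughout): `resolve_version` and `PARSER_MAX_VERSION` are made up for illustration, and rejecting a file only when it declares extensions without a suitable version is one possible reading of the rule above.

```python
# Hypothetical sketch: these names are illustrative, not an existing API.
PARSER_MAX_VERSION = (1, 1, 0)    # assumption: newest toml version this parser supports
FIRST_EXTENSION_VERSION = (1, 1, 0)  # in this example, extensions start with 1.1.0


def resolve_version(document):
    """Work out which toml version an already-parsed document targets."""
    table = document.get("toml")
    if table is None:
        # No [toml] table at all: assume 1.0, the last version without one.
        return (1, 0, 0)

    version = table.get("version")
    if version is None:
        resolved = (1, 0, 0)  # no version given: assume a pre-extension version
    else:
        resolved = (version["major"], version["minor"], version.get("rev", 0))

    if resolved > PARSER_MAX_VERSION:
        raise ValueError("file targets a newer toml version than this parser supports")

    # Extensions only exist from 1.1.0 onwards, so reject them on older versions.
    if table.get("extensions") and resolved < FIRST_EXTENSION_VERSION:
        raise ValueError("extensions require a declared toml version of 1.1.0 or later")

    return resolved
```

With this reading, a file with no [toml] table resolves to (1, 0, 0), a file declaring version 1.1.0 and extensions is accepted, and a file that lists extensions without a version is rejected.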

To better demonstrate this system and how it could eliminate ambiguity and unnecessary complexity while still allowing users to use new features, here are some issues/prs (and example extension names) that are a great example of, or fit for, this system:

TLX-001: Include Files (#81)
TLX-002: Duration (#514/#717)
TLX-003: Hex Floats (#562)
TLX-004: User-defined Values (#707)
TLX-005: Filesizes (#912)

Many of these are in the "need to be decided" category for 1.1.0 (as seen in #928) for one reason or another. With this system, there would be no need to decide: any proposals that may be problematic and need a tough decision can simply be implemented as an extension instead.

This system may seem to be the same as #707, but there are a few key differences. This system overall is much simpler (users just need to enter what extensions they're using in an array) and actually standardized (since the maintainers of toml define the extensions and parsers implement them). With #707, more commonly used features would have to be replicated by each user many times. For example, if many users wanted to use filesizes in their toml files, each user would have to make their own complex definition for filesizes. However, with this system, they would instead just add '5' to their extensions field. This system also doesn't necessarily replace #707: users can still implement their own types for really out of scope or extremely customized types that are even beyond the scope of an extension and are unlikely to be used by others. This is why I listed User-defined Values as an extension above. Obviously, the quirks of this system need to be discussed, but I think that if this system is implemented, it would significantly improve toml overall and set it even more apart from other config languages.

pradyunsg commented 1 year ago

Hi! Thanks for filing this issue and for the suggestion!

Ambiguity around whether a feature is implemented or not, as well as making the underlying markup features opt-in is not a desirable property IMO.

None of the issues referenced include major concerns around "Can implementations do this", and are instead discussions about whether the individual features/changes themselves serve the relevant value in adding them to the standard.

Providing an extension point and standard extensions is ~same as having them in the language and the user never using them. If we support a certain extension, all future language changes/extensions also need to avoid conflicting with that feature (unless we want to go down the route of mutually exclusive extensions, which is a bad idea because it means there's ambiguity when you read a TOML snippet about what it means).

Overall, I don't think this is particularly useful for TOML, which aims to be unambiguous and get out of the way of end users so that they can represent the data that they have in a clear manner.

Arastais commented 1 year ago

Hi, thanks for the quick response. I've edited my original post for better clarification.

> Ambiguity around whether a feature is implemented or not, as well as making the underlying markup features opt-in is not a desirable property IMO.

This system is actually meant to not be ambiguous. Like I said, users must explicitly specify that they want to use an optional feature, so there is no ambiguity between features or toml versions. Many of the possible features that have been discussed are ambiguous in nature or too out of scope (such as #81 or #912, both of which have ambiguity as a significant reason not to implement them), and they become less ambiguous if they are explicitly opted into. For one, users would have to read up on how the extension/feature works to use it. The users who don't use it will not have to deal with any repercussions of new features. I also don't see a problem with opting into more "advanced" features that some users might really want but most won't need.

> None of the issues referenced include major concerns around "Can implementations do this", and are instead discussions about whether the individual features/changes themselves serve the relevant value in adding them to the standard.

The problem was never really about whether implementations can do it; that wasn't the main point. I was just saying that another aspect to consider when adding features that aren't fully necessary is the work implementation authors will have to do. But regardless, the discussions you reference are the whole point of the system. If it is debatable whether a feature will add enough value to be worth it (and thus possibly reduce the minimalism of toml if added), then it could be added as an extension instead. This way, toml will still be minimal if a user chooses not to opt in to these features, but users who do need them can use them.

> Providing an extension point and standard extensions is ~same as having them in the language and the user never using them.

I don't see how this is true because the whole point is that any user can choose which extensions they want to have.

> If we support a certain extension, all future language changes/extensions also need to avoid conflicting with that feature (unless we want to go down the route of mutually exclusive extensions, which is a bad idea because it means there's ambiguity when you read a TOML snippet about what it means).

Not really. If a newer extension comes out that conflicts with an existing one (which could sometimes be intentional, such as an upgrade/update), the older extension would just become deprecated. This is similar again to how XMPP does it. The parser would also throw a warning/error of such, and tell the user which newer extension to use (e.g. TLX-003 is deprecated; use TLX-005 instead). Any "mutually exclusive" extension would just supersede an older one. I also don't see how there's ambiguity when reading a toml snippet because the extensions used are explicitly declared in the file/snippet.
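A parser could handle this with a simple lookup table. The sketch below is only an illustration in Python: `SUPERSEDED_BY` and `deprecation_warnings` are hypothetical names, and the mapping from TLX-003 to TLX-005 is just the example pair from the message above, not a real deprecation.

```python
# Hypothetical sketch: the table contents and function name are made up.
# Maps a deprecated extension number to the extension that supersedes it.
SUPERSEDED_BY = {3: 5}  # assumption: TLX-003 was replaced by TLX-005


def deprecation_warnings(declared):
    """Return one warning per declared extension that has been superseded."""
    messages = []
    for number in sorted(declared):
        replacement = SUPERSEDED_BY.get(number)
        if replacement is not None:
            messages.append(
                f"TLX-{number:03d} is deprecated; use TLX-{replacement:03d} instead"
            )
    return messages
```

For a file declaring extensions = [1, 3, 4, 7], this would produce the single message "TLX-003 is deprecated; use TLX-005 instead". Whether that message is a warning or a hard error would be up to the specification.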

Hopefully this clarifies some things. I want to convey this idea as clearly as possible, since it is quite complex, so I appreciate the feedback.

marzer commented 1 year ago

Personally I think the fact that TOML does not allow for anything like this is very much a feature. If all implementations must support a new feature, the bar for accepting them into the language is quite high, as not all implementations have access to things you might otherwise expect. For example, a proposal to add an #include directive (or similar) implies all implementations must be able to read directly from disk and/or the internet, which will not be the case for all. By forcing us to take considerations like this into account, the language naturally stays minimal (the "M" in TOML), and thus has maximal implementation surface area.

You might say "well, OK, make that an optional extension", but now you've created a dialect. Dialects are totally fine in human languages, but technical languages should avoid them like the plague, lest you encourage interop/portability issues (or just straight-up confusion).

edit: Actually I've just realized we already have dialects, in that each new version adds new features not compatible with the last (so far, at least). If we continue with the current linear versioning schema and add optional extensions, that introduces a combinatorial explosion of extensions and base TOML versions 😱
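To put a rough number on that explosion, here is a small Python sketch. `dialect_count` is a hypothetical helper, and it assumes the worst case where every subset of extensions can be combined with every base version that supports extensions.

```python
def dialect_count(base_versions, optional_extensions):
    """Count distinct (base version, extension subset) combinations.

    Assumes every subset of extensions is valid on every base version,
    so each base version contributes 2 ** optional_extensions dialects.
    """
    return base_versions * 2 ** optional_extensions


# Five optional extensions on a single base version already give 32 variants.
print(dialect_count(1, 5))  # 32
# Two base versions with the same five extensions double that.
print(dialect_count(2, 5))  # 64
```

Each additional optional extension doubles the number of possible dialects, and each additional base version multiplies it again.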

pradyunsg commented 1 year ago

Closing since adding extensions like this is not something that TOML is going to be doing.

toml-lang / toml

Proposal: RFC-like extensions and extension system #963