Closed pocomane closed 5 years ago
In that case, what do you propose to do if your file happens to start with ' (or '' etc.)?
Also, note that your formula "Continue to add a surrounding ' until we get no syntax error." is very dangerous. If your external file has been written by a malicious (or just facetious) author, they could insert all kinds of unexpected values (including full tables etc.) into your document -- the old '; DROP TABLE trick (admittedly you cannot drop stuff in TOML, but just arbitrarily inserting it is hardly better).
Also, "Control characters other than tab are not permitted in a literal string", so trying to paste any file might cause breakage in any case (even assuming you can be sure that it's UTF-8).
In that case, what do you propose to do if your file happens to start with ' (or '' etc.)?
Actually the starting ' is not a problem, since you can insert a new line that will be trimmed. However, your objection is still valid for an ending '.
I can imagine some solutions (e.g. also trimming a final newline, allowing a longer ' sequence at the end, or a new kind of delimiter), but the point, for me, is the possibility to paste in a verbatim doc. The exact syntax is a secondary problem.
is very dangerous. If your external file has been written by a malicious (or just facetious) author, they could insert all kinds of unexpected values (including full tables etc.) into your document -- the old '; DROP TABLE trick (admittedly you cannot drop stuff in TOML, but just arbitrarily inserting it is hardly better).
Well, if security is an issue, you should not copy and paste third-party code without validation (or at least without counting the longest ' sequence in the data). In other cases, a trial-and-error process can be useful.
Also, "Control characters other than tab are not permitted in a literal string", so trying to paste any file might cause breakage in any case (even assuming you can be sure that it's UTF-8).
OK, we will not be able to insert binary data, but any human-readable UTF-8 document is a good result! (To be honest, I do not see the point of forbidding control characters, but this is another discussion.)
@pocomane:
Actually the starting ' is not a problem, since you can insert a new line that will be trimmed.
Good point, didn't think of that! (Though that trick should be documented if your proposal is adopted).
However, your objection is still valid for an ending '.
No, it's not (and was never meant to be). Already as of now, if a literal string starts with 3 ticks and ends with 4 or 5, a bug-free parser should recognize that this is not a syntax error, but that the first one or two of the closing ticks are part of the value.
# Both values should be equivalent
key1 = '''This string ends with a 'quoted expression.''''
key2 = "This string ends with a 'quoted expression.'"
Well, if security is an issue, you should not copy and paste third-party code without validation (or at least without counting the longest ' sequence in the data).
Just counting the longest ' sequence and then using one more tick before and after the string should indeed be sufficient to perform proper validation (provided that encoding issues and control characters have been taken care of and you insert an initial newline if needed).
Considering that you've refuted my objection, I now think that your proposal might be a useful addition to TOML.
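The validation step described above (count the longest run of ticks, then use a delimiter one tick longer) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `longest_tick_run` and `wrap_proposed_literal` are hypothetical, the output uses the proposed variable-length delimiter and is therefore not valid TOML today, and the control-character check is a simplified reading of the rule quoted earlier.

```python
import re


def longest_tick_run(text: str) -> int:
    """Return the length of the longest run of consecutive ' characters."""
    return max((len(run) for run in re.findall(r"'+", text)), default=0)


def wrap_proposed_literal(text: str) -> str:
    """Wrap text in the *proposed* variable-length tick delimiter.

    Hypothetical helper: the result is not valid TOML as the spec stands.
    """
    for ch in text:
        # Simplified check: reject control characters other than tab and
        # line endings (a bare CR is accepted here, which is laxer than TOML).
        if (ord(ch) < 0x20 and ch not in "\t\n\r") or ord(ch) == 0x7F:
            raise ValueError(f"control character U+{ord(ch):04X} not allowed")
    # At least three ticks, and one more than the longest run in the text.
    fence = "'" * max(3, longest_tick_run(text) + 1)
    # Always emit the initial newline (which is trimmed on parsing), so a
    # document that itself starts with a tick is not merged into the opener.
    return f"{fence}\n{text}{fence}"


print(wrap_proposed_literal("{\n\"myregex\":\"'''.*'''\"\n}\n"))
```

Because the delimiter is chosen from the data itself, no trial-and-error parsing is needed, and a hostile document cannot close the string early.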
Why not use the well-known, widely used and perfectly working heredoc syntax?
foo = << MY_MARK_OF_END_OF_HERE_DOC_WHAT_EVER_I_WANT
My
multiline
text
MY_MARK_OF_END_OF_HERE_DOC_WHAT_EVER_I_WANT
One of TOML's core principles is that all string values are unambiguously quoted. The multi-line code block syntax used in Markdown on GitHub and the triple quotes used in Python were influential in this regard, more so than Heredoc syntaxes (especially YAML's many many varieties).
@LongTengDao proposed an odd number of quotes restriction in #623.
Other than my usual concerns about the lack of past precedent (for the quotes-based idea), I also do not consider the motivating use case for doing this to be a good idea in the first place.
TOML is a configuration file format, for humans to edit. Needing the ability to include arbitrary UTF-8 strings (or whole files) with control characters is not something common for configuration situations. I'm not inclined to add to the language for a use case that I don't think is a good idea.
In fact, as you mentioned, you can do it pretty well already within TOML. I'm okay with not being able to have arbitrary literal strings with triple single quotes in them. That's not a pain point for 99% of TOML users (and adding to the string syntax would be).
That said, I'm open to hearing more about the exact use cases for doing this, and why something else (like having a separate file, named in the TOML document) is not sufficient or workable.
And I should note that I'm typing this on my phone at almost 1am, so thanks for being understanding if my language doesn't make sense, if the tone is too pushy or if there are tyops.
The proposal is not about control characters. The proposal is about making it simpler to insert sub-documents. It should improve the readability and maintainability of such configurations.
Keeping sub-documents in other files, and just referring to them in the TOML, has some disadvantages (e.g. having to make sure all the references are correct).
As a use case, please consider the following. We keep the configuration of our software in a SQL DB. To avoid unnecessary complexity, we use one row per setup, so we do not need to make multiple queries, join stuff, or whatever.
What we have to do right now to support this scenario is to have a UI to modify the configuration, and to wrap it in a TOML string before storing it in the DB. It would instead be quite useful to be able to read and modify the file directly, without a specific application. With the proposal, you can keep in the DB something like:
[application_a]
path = "config/appa.json"
content = ''''''
{
"myregex":"'''.*'''"
}
''''''
[application_b]
path = "config/appb.toml"
content = ''''''
[menu]
title = "my"
description = '''
bla bla bla
bla bla bla
'''
''''''
It is quite simple to expand this kind of file into multiple configuration files. The expansion does not need to know anything about the configuration formats, nor does it impose a "configuration wrapping layer".
Again, a different syntax can be chosen if you want. The point here is the ability to paste verbatim documents while keeping their readability.
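As a rough sketch of that expansion step, assuming the document has already been parsed into a plain dictionary by some parser that understands the proposed syntax (no such parser is implied to exist), the code could look like this. The name `expand_embedded_configs` and the path-escape check are illustrative additions, not part of the proposal.

```python
from pathlib import Path


def expand_embedded_configs(parsed: dict, root: Path) -> list:
    """Write every table that has 'path' and 'content' keys to root/path.

    Hypothetical helper: `parsed` is assumed to be the already-parsed
    document, e.g. {"application_a": {"path": ..., "content": ...}}.
    """
    root = root.resolve()
    written = []
    for section in parsed.values():
        if not isinstance(section, dict):
            continue
        if "path" not in section or "content" not in section:
            continue
        target = (root / section["path"]).resolve()
        # Refuse paths such as "../x" that would land outside the root.
        if root not in target.parents:
            raise ValueError(f"path escapes root: {section['path']}")
        target.parent.mkdir(parents=True, exist_ok=True)
        # The content is written verbatim; its format is never inspected.
        target.write_text(section["content"], encoding="utf-8")
        written.append(target)
    return written
```

The function treats every embedded document as an opaque string, which is the property the use case relies on.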
The proposal is about making it simpler to insert sub-documents.
Yea, I noticed. 😉
Keeping sub-documents in other files, and just referring to them in the TOML, has some disadvantages (e.g. having to make sure all the references are correct).
When you have a situation where you need to deal with multiple units of complexity, you also have to accept a certain amount of complexity somewhere in the system. For the use cases put forward, I don't think TOML should carry this complexity; the applications should handle it.
[application_b]
path = "config/appb.toml"
content = ''''''
[menu]
title = "my"
description = '''
bla bla bla
bla bla bla
'''
''''''
What I'm saying is being able to include "sub-documents" like above is not a use-case I view as being important enough to complicate TOML's string literals. I'm okay with TOML not "stretching" to support it.
To put it more bluntly, if the use case involves including arbitrary documents in a single file, you should either use a different file format than TOML or be willing/able to add a certain amount of complexity in the application.
Other than my usual concerns about the lack of past precedent (for the quotes-based idea),
Sorry, I missed this before. What do you mean by "lack of past precedent"? Heredocs are used in other places. String separators of variable length are used in other places. Maybe they have never been used with ticks and in an INI-like configuration file (not in a widespread file format, at least). But is this the point?
What I'm saying is being able to include "sub-documents" like above is not a use-case I view as being important enough to complicate TOML's string literals.
The idea behind the proposal is that I found it to be a small syntax change. I agree with you if we are talking about a common Heredoc syntax, which is not very TOML-ish. But I do not agree about the starting ticks. I find that the gain in flexibility considerably exceeds the (very small) increase in complexity.
I also think that the syntax is quite natural: TOML already supports closing a string with any number of ticks, so why should starting a string be different?
if the use case involves including arbitrary documents in a single file, you should either use a different file format than TOML or be willing/able to add a certain amount of complexity in the application.
In fact, in the end I opted for an ad-hoc format that I maintain myself. It is very similar to TOML, but lets me easily paste sub-documents.
However, that was just a real use case that happened to me; the general point is to put weird values in TOML while keeping their readability and maintainability. I am just comparing things like
code = "'''\"\"\"\n"
to
code = ''''
'''"""
''''
What do you mean by "lack of past precedent"?
I'm not aware of a major programming language that uses this kind of quoting rules for strings.
The idea behind the proposal is that I found it to be a small syntax change.
This is the core of our disagreement here and I don't view this as a small syntax change. I view this, in essence, as a new string syntax that no other common programming language uses.
I don't want to introduce different string syntax rules in TOML to enable a use case (including arbitrary documents inline) that I don't think justifies the cognitive cost of introducing syntax that is different from most of the common programming languages.
While I'm open to hearing from you about this, I'm also going to close this issue to signal that this is in all likelihood not going to be a change made to TOML.
(I accidentally posted early, so those following by email will want to see my edits; sorry!)
While I'm open to hearing from you about this,
The use case is important enough that at least two widespread languages handle it: POSIX shell and Lua.
I think Lua is interesting: it has a string quoting mechanism with a variable-size separator, which is the core of my proposal for TOML. However, I do not think that the exact Lua syntax would be a good match for TOML, if only because TOML would end up with two ways to do exactly the same thing:
a = '''string'''
and
a = [=[string]=]
or
a = '''wierd]=]string'''
and
a = [==[wierd]=]string]==]
I would also be happy with a shell-like syntax (proposed by puchi above), if it can help to get a wider agreement on the subject.
I would like a way to copy/paste an external document into a TOML file (as a string value). The multi-line literal string works well except for documents containing a triple tick ('''). Right now, when a document contains this pattern, even just once, we need to switch to the multi-line quoted string syntax and escape ALL the special characters.
So a single occurrence of ''' forces us into a long, tedious, error-prone process, or into using a utility tool. Moreover, the result is not immediate to read, since you have to mentally unescape the special characters.
Something like this was already asked in https://github.com/toml-lang/toml/issues/577 and https://github.com/toml-lang/toml/issues/521 . I think the latter contains a simple and flexible solution: any number of ticks (greater than 2) starts the multi-line literal string; the string ends when the same number of consecutive ticks is found.
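To make the rule concrete, here is a minimal Python sketch of how a parser might read one such string. It is an illustration only: `parse_proposed_literal` is a hypothetical name, and the handling of a closing run that is longer than the opening one (the extra leading ticks are kept as content) is an assumption that generalises how TOML already treats four or five closing ticks; it is not part of the rule as stated.

```python
import re


def parse_proposed_literal(src: str) -> str:
    """Parse a single literal string that starts at the beginning of src."""
    opening = re.match(r"'{3,}", src)
    if opening is None:
        raise ValueError("expected at least three opening ticks")
    n = len(opening.group())
    start = opening.end()
    # A newline immediately after the opening delimiter is trimmed.
    if src.startswith("\r\n", start):
        start += 2
    elif src.startswith("\n", start):
        start += 1
    # The first run of n or more ticks closes the string.
    closing = re.compile("'{%d,}" % n).search(src, start)
    if closing is None:
        raise ValueError("unterminated literal string")
    # Assumption: only the last n ticks of the run form the delimiter;
    # any ticks before them belong to the value.
    return src[start:closing.end() - n]


print(repr(parse_proposed_literal("''''\n'''\"\"\"\n''''")))
```

Note that the opening run is matched greedily, so a document that itself begins with a tick has to be preceded by the trimmed newline.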
So, to paste ANY document in a toml file, we would just have to: