C-Ezra-M closed this issue 1 year ago.
When fed to a YAML parser [...]
In general I think that where YAML chooses to do one thing, it would be to the betterment of TOML's health to explicitly choose not to do that thing. There's just far too much magic in YAML. This is a good example: why add this magic when there are already two good alternatives that don't introduce unexpected behaviour? A double-quoted string would accept your example text verbatim without issue. More complex examples can use an ML literal string.
Edit: also worth noting that this would be a backwards-incompatible change; merely updating the TOML spec version of a parser would change the way a literal string featuring '' was parsed. There's a lot of TOML out in the world already.
To illustrate @marzer's point, here are two ways to state the string "Emma's boyfriend is nice." in TOML without having to escape the apostrophe:
ParsedString = "Emma's boyfriend is nice."
RawString = '''Emma's boyfriend is nice.'''
The RawString example uses multi-line literal string syntax (three single quotes on each side). Within them, you can use apostrophes, or even pairs of adjacent apostrophes, freely.
The website toml.io even mentions this explicitly in its quick tour:
Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. That's where multi-line literal strings come in:
though that site doesn't actually use an example like this. I'll run this past them.
Edit: also worth noting that this would be a backwards-incompatible change; merely updating the TOML spec version of a parser would change the way a literal string featuring '' was parsed. There's a lot of TOML out in the world already.
This isn't a concern since '' is only parsed inside '..'-delimited strings; so just '' is still an empty string. For example in SQL:
(1)=# select '', '''';
?column? │ ?column?
──────────┼──────────
          │ '
Not saying I'm in favour of this proposal, just pointing out it is backwards-compatible as far as I can see.
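The same SQL rule can be reproduced without a PostgreSQL server using Python's built-in sqlite3 module, on the assumption (true for standard SQL string literals) that SQLite doubles apostrophes the same way psql does. The extra 'Emma''s' column is an added illustration, not part of the original query.

```python
import sqlite3  # standard library; SQLite string literals also use '' doubling

conn = sqlite3.connect(":memory:")
# The query from the psql session above, plus one extra column.
empty, apostrophe, name = conn.execute("select '', '''', 'Emma''s'").fetchone()
conn.close()

# A bare '' is still the empty string; '' only means "one apostrophe"
# when it appears inside an enclosing pair of quotes.
assert empty == ""
assert apostrophe == "'"
assert name == "Emma's"
```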
This isn't a concern since '' is only parsed inside '..'-delimited strings; so just '' is still an empty string. For example in SQL:
This is true for single-quote strings (since it is currently impossible to form a single-line literal string containing '' without terminating the string itself), though my line of thought was w.r.t. multi-line literal strings. The sensible assumption would be that this escaping, if adopted, be extended to those as well (otherwise it would introduce an internal inconsistency that would be, er, very dumb), but then that means existing ML literal strings can change meaning:
quotes = '''''''' # '' before this proposal, ' after it
The workaround would be to not extend this magic escaping to ML strings, but then we'd have three different escape modes in the one language. Big YAML vibes.
I assumed that ''-type escapes wouldn't take effect in triple-quoted strings, as there's no real need for them (''' ' ''' is already valid).
@Keyacom Have we sufficiently addressed the issue that your proposal was attempting to address? If so, please close this issue. If not, please give us more details to make your case for the proposal.
A summary, so everything about my proposal is clear:
- '' (double apostrophe) is embeddable in single-line raw strings, where it stands for one apostrophe, and only works there.
- ' (single apostrophe) is still allowed by itself in multi-line raw strings.
Example:
[example]
same-texts = [
'Maya''s boyfriend',
"Maya's boyfriend",
'''Maya's boyfriend''',
]
When reading according to the new specification (Python example):
import tomllib  # assume it's been updated to the proposed spec already
with open("example.toml", "rb") as file:
    texts = tomllib.load(file)["example"]["same-texts"]
# Every entry should parse to the same string.
assert all(a == b for a, b in zip(texts, texts[1:]))
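Since no released tomllib implements this, here is a minimal sketch of what the proposal would ask a parser to do for single-line literal strings only. The function name parse_literal_string_proposed is hypothetical, and the sketch ignores the rest of TOML (keys, arrays, control-character rules); it rejects '''...''' input on purpose, because the proposal leaves multi-line literal strings unchanged.

```python
def parse_literal_string_proposed(src: str) -> str:
    """Decode one single-line literal string under the proposed rule.

    Hypothetical sketch: '' inside '...' stands for one apostrophe.
    """
    if src.startswith("'''") or not src.startswith("'"):
        raise ValueError("not a single-line literal string")
    out: list[str] = []
    i = 1
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            raise ValueError("newline in single-line literal string")
        if ch == "'":
            if src.startswith("''", i):
                out.append("'")  # doubled apostrophe: emit one, keep scanning
                i += 2
                continue
            if i + 1 != len(src):
                raise ValueError("trailing characters after closing apostrophe")
            return "".join(out)  # lone apostrophe: closing delimiter
        out.append(ch)
        i += 1
    raise ValueError("unterminated literal string")


# The first entry of same-texts now decodes to the other two values.
assert parse_literal_string_proposed("'Maya''s boyfriend'") == "Maya's boyfriend"
# A bare '' is still the empty string.
assert parse_literal_string_proposed("''") == ""
```

Scanning for a doubled apostrophe before treating a lone one as the closing delimiter is what keeps '' (empty string) backwards-compatible.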
I'd like to reiterate my point from earlier, that with this proposal:
The workaround would be to not extend this magic escaping to ML strings, but then we'd have three different escape modes in the one language.
Currently TOML's literal strings are "what you see is what you get", with no escaping or magic at all, and ideally should stay that way. If we add this, multi-line literal strings would now behave differently to single-line literal strings in one specific case, which is weird.
(you might argue that literal strings do have some magic, the opening newline trimming thing, and you'd be right for some definition of "magic"; I also think that functionality was an unwise addition and wish it weren't in the language but that ship has sailed).
Adding an edge case or exceptional behaviour to a language is not in and of itself a bad thing, but a higher cognitive/contextual complexity cost must be offset sufficiently by some measure of usability improvement for it to be worthwhile. This is not a zero-cost addition; it immediately invites the question "why doesn't this work in multi-line literal strings?". Alternatively you can eliminate that inconsistency by adding it to both, but that breaks existing ML strings, as I noted above. Both are bad scenarios IMO.
So, with that reiterated, please address the question from my original reply: how is this any better than just using a double-quoted string, or a ML literal string? ("YAML does it" is not reason enough)
I accidentally overlooked something obvious in @Keyacom's proposal: it is intended to offer a way to include an apostrophe in a literal string without making it multi-line.
But why is this important? Do four extra quote marks (two more on each side) cause users undue distress?
I'm still against the proposal, and @marzer made a strong case against it already. I don't rule it out entirely yet. But if we have something that works well (and I actually appreciate the trimmed leading newline feature, for both basic and literal variants), then what's preventing anyone from using it?
I think I'm going to close this proposal due to the criticism it gathered.
Preface
Allow apostrophe escaping in inline literal strings.
Why
The YAML specification nicely allows ' to be placed inside a raw string by preceding it with itself:
When fed to a YAML parser, the values of both ParsedString and RawString will be the same.
TOML does not allow that within the current specification. I am proposing this so you won't need to create a multi-line literal string just to include an apostrophe in it.
Questions
You can ask more below!
Why not \' instead of ''?
That would follow how Ruby parses raw (single-quoted) strings, but I don't want it because of the potentially ambiguous behavior of \, which could be a breaking change.