toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

[Proposal] Allow apostrophe escaping in inline literal strings #949

Closed C-Ezra-M closed 1 year ago

C-Ezra-M commented 1 year ago

Preface

Allow apostrophe escaping in inline literal strings.

Why

The YAML specification nicely allows ' to be placed inside a raw string by preceding it with itself:

ParsedString: "Emma's boyfriend is nice."
RawString: 'Emma''s boyfriend is nice.'

When fed to a YAML parser, ParsedString and RawString will have the same value.

TOML does not allow that within the current specification. I am proposing this, so you won't need to create a multiline literal string just to include an apostrophe in it.

Questions

You can ask more below!

Why not \' instead of ''?

This could be influenced by how Ruby parses raw strings, but I don't want that: giving \ a special meaning inside literal strings would make its behavior ambiguous and could be a breaking change.

marzer commented 1 year ago

When fed to a YAML parser [...]

In general, I think that where YAML chooses to do one thing, TOML would be healthier for explicitly not doing that thing. There's just far too much magic in YAML. This is a good example: why add this magic when there are already two good alternatives that don't introduce unexpected behaviour? A double-quoted string would accept your example text verbatim without issue. More complex examples can use a multi-line (ML) literal string.

Edit: also worth noting that this would be a backwards-incompatible change; merely updating the TOML spec version of a parser would change the way a literal string featuring '' was parsed. There's a lot of TOML out in the world already.

eksortso commented 1 year ago

To illustrate @marzer's point, here are two ways to state the string Emma's boyfriend is nice. in TOML without having to escape the apostrophe.

ParsedString = "Emma's boyfriend is nice."
RawString = '''Emma's boyfriend is nice.'''

The RawString example uses multi-line literal string syntax (three single-quotes on each side). Within them, you can use apostrophes, or even literal pairs of apostrophes, freely.

The website toml.io even mentions this explicitly in its quick tour:

Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. That's where multi-line literal strings come in:

though that site doesn't actually use an example like this. I'll run this past them.

arp242 commented 1 year ago

Edit: also worth noting that this would be a backwards-incompatible change; merely updating the TOML spec version of a parser would change the way a literal string featuring '' was parsed. There's a lot of TOML out in the world already.

This isn't a concern since '' is only parsed inside '..'-delimited strings; so just '' is still an empty string. For example in SQL:

(1)=# select '', '''';
 ?column? │ ?column?
──────────┼──────────
          │ '

Not saying I'm in favour of this proposal, just pointing out it is backwards-compatible as far as I can see.
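A minimal sketch of that reading, in Python. The function below is hypothetical (it is not part of any TOML library): it parses a single '...'-delimited string using the proposed doubling rule and ignores TOML's other restrictions, such as the ban on newlines and control characters.

```python
def parse_proposed_literal(src: str) -> str:
    """Parse one single-quoted literal string under the proposed '' escape (hypothetical)."""
    if not src.startswith("'"):
        raise ValueError("literal string must start with an apostrophe")
    out = []
    i = 1
    while i < len(src):
        ch = src[i]
        if ch != "'":
            # Ordinary character: copied through unchanged.
            out.append(ch)
            i += 1
        elif src.startswith("''", i):
            # Doubled apostrophe inside the string: emit one apostrophe.
            out.append("'")
            i += 2
        else:
            # Lone apostrophe: closing delimiter; nothing may follow it here.
            if i != len(src) - 1:
                raise ValueError("unexpected text after closing apostrophe")
            return "".join(out)
    raise ValueError("unterminated literal string")


empty = parse_proposed_literal("''")
single = parse_proposed_literal("''''")
emma = parse_proposed_literal("'Emma''s boyfriend is nice.'")
print(repr(empty), repr(single), emma)
```

With this rule a bare '' is still the empty string, and '''' yields one apostrophe, mirroring the SQL output above.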

marzer commented 1 year ago

This isn't a concern since '' is only parsed inside '..'-delimited strings; so just '' is still an empty string. For example in SQL:

This is true for single-line literal strings (since it is currently impossible to write a single-line literal string containing '' without terminating the string itself), though my line of thought was with respect to multi-line literal strings. The sensible assumption would be that this escaping, if adopted, would be extended to those as well (otherwise it would introduce an internal inconsistency that would be, er, very dumb), but then existing ML literal strings could change meaning:

quotes = '''''''' # '' before this proposal, ' after it

The workaround would be to not extend this magic escaping to ML strings, but then we'd have three different escape modes in the one language. Big YAML vibes.

arp242 commented 1 year ago

I assumed that ''-type escapes wouldn't take effect in triple-quoted strings, as there's no real need for it. (''' ' ''' is already valid).

eksortso commented 1 year ago

@Keyacom Have we sufficiently addressed the issue that your proposal was attempting to address? If so, please close this issue. If not, please give us more details to make your case for the proposal.

C-Ezra-M commented 1 year ago

A summary, so everything about my proposal is clear:

import tomllib

with open("example.toml", "rb") as file:
    texts = tomllib.load(file)["example"]["same-texts"]

# Every string in the array must equal the one that follows it.
assert all(a == b for a, b in zip(texts, texts[1:]))

marzer commented 1 year ago

I'd like to reiterate my point from earlier, that with this proposal:

The workaround would be to not extend this magic escaping to ML strings, but then we'd have three different escape modes in the one language.

Currently TOML's literal strings are "what you see is what you get", with no escaping or magic at all, and ideally should stay that way. If we add this, multi-line literal strings would now behave differently to single-line literal strings in one specific case, which is weird.

(You might argue that literal strings do have some magic, namely the trimming of a newline right after the opening delimiter of a multi-line string, and you'd be right for some definition of "magic". I also think that functionality was an unwise addition and wish it weren't in the language, but that ship has sailed.)

Adding an edge-case or exceptional behaviour to languages is not in-and-of-itself a bad thing, but a higher cognitive/contextual complexity cost must be offset sufficiently by some measure of usability improvement for it to be worthwhile. This is not a zero-cost addition; it immediately invites the question "why doesn't this work in multi-line literal strings?". Alternatively you can eliminate that inconsistency by adding it to both, but that breaks existing ML strings, as I noted above. Both are bad scenarios IMO.

So, with that reiterated, please address the question from my original reply: how is this any better than just using a double-quoted string, or a ML literal string? ("YAML does it" is not reason enough)

eksortso commented 1 year ago

One thing I accidentally overlooked from @Keyacom's proposal was obvious: this is intended to offer a way to include an apostrophe in a literal string without making it multi-line.

But why is this important? Do four extra quote marks (two more on each side) cause users undue distress?

I'm still against the proposal, and @marzer made a strong case against it already. I don't rule it out entirely yet. But if we have something that works well (and I actually appreciate the trimmed leading newline feature, for both basic and literal variants), then what's preventing anyone from using it?

C-Ezra-M commented 1 year ago

I think I'm going to close this proposal due to the criticism it gathered.