toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License
19.43k stars, 846 forks

Support single-quoted strings to avoid double \ #188

Closed bbangert closed 9 years ago

bbangert commented 11 years ago

The TOML example shows how bad it gets when writing a Windows path that contains \. Unfortunately, some of the things that need configuring require the user to enter a regular expression. Regular expressions are full of backslashes, and constantly adding the extra backslash is very painful.

To give TOML a fighting chance when a config file needs a few backslashes, I propose using single quotes to designate a raw string, i.e.:

somevar = 'this is a \d+ \w+ type of string'

This would retain the existing double-quote rules to avoid breaking existing usage.
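To make the cost concrete, here is a minimal Python sketch that writes the same pattern both ways. The helpers `to_basic_string` and `to_literal_string` are hypothetical illustrations, not part of any TOML library, and the double-quoted writer only escapes `\` and `"` (a complete serializer would also escape control characters).

```python
def to_basic_string(raw: str) -> str:
    """Render `raw` as a double-quoted TOML value (hypothetical helper).

    Backslashes are doubled first, then double quotes are escaped,
    so the backslash added for a quote is not doubled again.
    """
    escaped = raw.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_literal_string(raw: str) -> str:
    """Render `raw` as a single-quoted raw value: no escaping at all."""
    if "'" in raw or "\n" in raw:
        raise ValueError("a single-quoted raw string cannot hold ' or a newline")
    return f"'{raw}'"


pattern = r"this is a \d+ \w+ type of string"
print("somevar =", to_basic_string(pattern))    # every backslash doubled
print("somevar =", to_literal_string(pattern))  # pattern kept verbatim
```

The first line printed is `somevar = "this is a \\d+ \\w+ type of string"`; the second is the pattern exactly as the user typed it.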

johanfange commented 10 years ago

I would prefer both:

somevar = 'single\quote'

and (perhaps?):

somevar = ```
line1

line2
```

meaning "line1\n\nline2"


I guess this isn't so bad either though, since users can always just copy-paste...

somevar = `back\ticks`

BurntSushi commented 10 years ago

> I think at this point it should be a heredoc syntax where the last newline is eaten or backticks.

But you were against HEREDOC syntax before! e.g., r##"str..."## is HEREDOC, just with slightly different syntax. The point of HEREDOC is that you get to pick your delimiters, which is exactly what r##"..."## is.
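For reference, here is roughly how a parser could honor delimiters the writer chooses, modeled on Rust's `r#"..."#` raw strings: count the `#` marks after `r`, then scan for a closing `"` followed by the same number of `#`. This is a hypothetical Python sketch (`parse_hash_raw` is an illustrative name), not syntax TOML adopted.

```python
from __future__ import annotations


def parse_hash_raw(src: str, pos: int = 0) -> tuple[str, int]:
    """Parse a Rust-style raw string r#*"..."#* starting at src[pos].

    Returns the verbatim contents and the index just past the closing
    delimiter. The writer picks how many '#' marks to use, so any
    content can be expressed by choosing enough of them.
    """
    if not src.startswith("r", pos):
        raise ValueError("expected 'r'")
    i = pos + 1
    hashes = 0
    while i < len(src) and src[i] == "#":  # count the chosen '#' marks
        hashes += 1
        i += 1
    if i >= len(src) or src[i] != '"':
        raise ValueError("expected opening quote")
    closing = '"' + "#" * hashes           # closing delimiter mirrors the opening
    end = src.find(closing, i + 1)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[i + 1:end], end + len(closing)


print(parse_hash_raw(r'r##"a "# b"##'))  # ('a "# b', 13)
```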

sorbits commented 10 years ago

Single quotes for strings without escape codes are used by bash, perl, ruby, PHP, and probably many other formats/languages, so claiming it’s a mistake seems a bit out of proportion.

Markdown was created to write prose, where you don\'t want to write like this.

I wouldn’t mind using backticks for raw strings; it’s better than custom delimiters (or no literal strings at all). But as previously indicated, my preference is definitely single-quoted strings with no escape sequences beyond ''.
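The SQL-style rule preferred here (no escapes except `''` for one literal quote, backslashes carrying no special meaning) can be parsed in a single pass. A minimal Python sketch; `parse_doubled_quote` is a hypothetical name, and this is not the rule TOML ended up with.

```python
from __future__ import annotations


def parse_doubled_quote(src: str, pos: int = 0) -> tuple[str, int]:
    """Parse '...' where '' stands for one quote and backslash is ordinary.

    Returns the decoded contents and the index just past the closing quote.
    """
    if not src.startswith("'", pos):
        raise ValueError("expected opening quote")
    out = []
    i = pos + 1
    while i < len(src):
        if src[i] == "'":
            if src.startswith("''", i):  # doubled quote: emit one, keep going
                out.append("'")
                i += 2
                continue
            return "".join(out), i + 1   # lone quote closes the string
        out.append(src[i])
        i += 1
    raise ValueError("unterminated string")


print(parse_doubled_quote(r"'I don''t need \d{2} apples'")[0])  # I don't need \d{2} apples
```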

The main problem with '' as an escape sequence is discoverability. OTOH, having \\ and \' as escape sequences will be confusing to people who haven’t read the spec, or even to people who have. E.g. a user crafts a regexp to match digits: '\d+'. Then the user wants to match them with a leading backslash, like \42, so the user writes '\\\d+', but that’s not going to work; it has to be '\\\\\d+', with '\\\\\\d+' working as well.
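The confusion is easy to reproduce. Assuming the rule under discussion (inside single quotes only `\\` and `\'` are escapes, and any other backslash is kept as-is), a hypothetical `unescape_backslash_rule` in Python shows what the user's regexp turns into. The regex for "a backslash followed by digits" is `\\\d+`.

```python
def unescape_backslash_rule(body: str) -> str:
    r"""Hypothetical rule: inside '...', only \\ and \' are escapes;
    any other backslash is kept as-is."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in "\\'":
            out.append(body[i + 1])  # escape pair collapses to one character
            i += 2
        else:
            out.append(body[i])      # lone backslash or ordinary character
            i += 1
    return "".join(out)


print(unescape_backslash_rule(r"\d+"))       # \d+   (fine)
print(unescape_backslash_rule(r"\\\d+"))     # \\d+  (one backslash lost)
print(unescape_backslash_rule(r"\\\\\d+"))   # \\\d+ (what the user wanted)
print(unescape_backslash_rule(r"\\\\\\d+"))  # \\\d+ (also works)
```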

On 27 Jun 2014, at 18:39, anaxagoras wrote:

The GitHub Markdown uses ``` to delineate raw blocks, with the logic that, even if you might want a pair of quote-like characters in your string to represent some kind of empty string, you're very rarely going to use three in a row. I believe they chose backticks instead of quotes for a similar reason: they're comparatively rare.

I think at this point it should be a heredoc syntax where the last newline is eaten, or backticks. Single quotes are a mistake, I feel.

flowchartsman commented 10 years ago

> But you were against HEREDOC syntax before!

I was referring to something like the standardized style we see here: (```), but I'm also for reaching consensus. If backticks aren't acceptable, I think even some kind of heredoc (be it flexible or standardized) is better than single-quotes.

mojombo commented 10 years ago

I see raw strings as serving a few simple purposes:

  1. Make it easy and nice looking to specify Windows paths and regexen (unescaped backslash).
  2. Make it easy to write strings that need literal double quotes.

The normal cases, then, work great with single quotes and NO escaping allowed whatsoever:

path = 'C:\foo\bar'
path2 = '\\ServerX\qux\'
regex = '(\d+)'
quoted = 'Tom "Dubs" Preston-Werner'
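Under this rule a parser only needs to find the next single quote. A minimal Python sketch, assuming the single-line form simply rejects newlines; `parse_literal` is a hypothetical helper, and it skips the control-character checks a complete TOML parser would make.

```python
from __future__ import annotations


def parse_literal(src: str, pos: int = 0) -> tuple[str, int]:
    """Parse a single-quoted literal string starting at src[pos].

    No escapes: everything up to the next ' is taken verbatim.
    Returns the contents and the index just past the closing quote.
    """
    if not src.startswith("'", pos):
        raise ValueError("expected opening quote")
    end = src.find("'", pos + 1)
    if end == -1:
        raise ValueError("unterminated literal string")
    body = src[pos + 1:end]
    if "\n" in body:
        raise ValueError("newline in single-line literal string")
    return body, end + 1


for line in [
    r"path = 'C:\foo\bar'",
    r"path2 = '\\ServerX\qux\'",
    r"regex = '(\d+)'",
    "quoted = 'Tom \"Dubs\" Preston-Werner'",
]:
    key, _, rest = line.partition(" = ")
    value, _ = parse_literal(rest)
    print(key, "->", value)
```

Python 3.11+ ships `tomllib`, which implements the literal strings that eventually landed in the spec, so `tomllib.loads` is a convenient way to cross-check a sketch like this.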

The only thing not allowed in these single-quoted strings is a single quote. If we also introduce multi-line strings and include a multi-line raw version delimited by ''' (three single quotes), then we cover the cases where single quotes are needed:

regex2 = '''I [dw]on't need \d{2} apples'''

The only thing not allowed here is a triple single quote sequence '''. If you need to represent that in a string, then you can always express them (and, indeed, any string in the universe) in a normal double quoted string:

text = "In TOML, raw multi-line text is surrounded by ''' (three single quotes)."
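The multi-line variant works the same way with a three-character delimiter: the body is everything up to the first `'''`, so single quotes and pairs of them are fine inside. A hypothetical Python sketch; the final TOML spec adds refinements it ignores, such as trimming a newline immediately after the opening delimiter.

```python
from __future__ import annotations


def parse_multiline_literal(src: str, pos: int = 0) -> tuple[str, int]:
    """Parse '''...''' starting at src[pos].

    The body is taken verbatim and ends at the first ''' sequence,
    so it may contain ' or '' and raw newlines.
    """
    if not src.startswith("'''", pos):
        raise ValueError("expected opening '''")
    end = src.find("'''", pos + 3)
    if end == -1:
        raise ValueError("unterminated multi-line literal string")
    return src[pos + 3:end], end + 3


body, _ = parse_multiline_literal(r"'''I [dw]on't need \d{2} apples'''")
print(body)  # I [dw]on't need \d{2} apples
```

Wrapping the `text` value above in `'''` would make the sketch stop at the first `'''` inside it, which is why that string has to be written as a double-quoted one.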

I see raw strings as a convenience, and as such, if we can solve the 99% normal case by not insisting that they can express all arbitrary strings, then that's ok. Practicality over purity.

redhotvengeance commented 10 years ago

@mojombo :+1: Simple and covers all of the bases.

BurntSushi commented 10 years ago

@mojombo I'm fine with that.

If there aren't any major objections, I'll draw up a PR tonight or tomorrow morning.

johanfange commented 10 years ago

@mojombo Excellent!

What about newlines at beginning/end of multi-line strings? (discussion moved to #225)

BurntSushi commented 10 years ago

@johanfange Please discuss multiple lines in https://github.com/toml-lang/toml/pull/225. There's already a spec drawn up. I'll modify it once raw strings get in.

flowchartsman commented 10 years ago

> The only thing not allowed in these single quoted strings is a single quote.

I think this is a mistake unless you also introduce the triple-quoting version at the same time. I see single quotes in regexes pretty frequently.

mojombo commented 10 years ago

@anaxagoras I agree, for my proposal to work, it requires a multi-line raw string simultaneously.

BurntSushi commented 10 years ago

@mojombo @anaxagoras OK. I'll put them into one PR once I get home.

BurntSushi commented 10 years ago

See #228.

mojombo commented 9 years ago

Literal strings were added by #232 and solve the problems posed in this issue. Closing!