Closed Tronic closed 4 years ago
This is based on standard practices with text file formatting (removal of extra whitespace and adding LF after each line).
Adding \r explicitly at the end of each line completes the CR-LF sequence for Internet protocols (not even Windows needs it in text files anymore).
Any line within the block may be terminated by a backslash. This is useful for splitting otherwise overly long lines on multiple source code lines without adding LFs to the string, and on the last line to prevent the final newline.
This makes the heredoc situation in Nim too complicated IMO. """something goes here""".unindent is a simple and satisfactory solution that needs no further enhancement, plus the fact that "oh, you can use escape sequences now!" makes things weirder to me. If you have too many concerns about lengthy string literals, honestly it's better to just not use heredocs and just staticRead() from a text file instead.
The """ hack of Python is problematic precisely because it mixes source code formatting with string contents. Having PyDocs or .unindent "handle" this is far from satisfactory.
import strutils
# Correct output but messed up source code formatting
for i in 1..2:
  stdout.write("""<li>
  Item
</li>
""")
# Incorrect output (Item not indented)
for i in 1..2:
  stdout.write("""
    <li>
      Item
    </li>
    """.unindent)
# Proposed string literal: clean source code that matches output
for i in 1..2:
  stdout.write(
    ":
      <li>
        Item
      </li>
  )
Fixing this in Python would be quite problematic at this time, but Nim as a new language based on indented blocks definitely /should/ get it right.
@awr1 Escape sequences and whitespace handling are mentioned for completeness. This proposal requires less of them than the current string literals do. Reading from external files is not really a solution. The need for longer string literals (beyond docstrings) is clear and that's why """ literals exist in the first place; their implementation just sucks.
Then IMO the behavior of unindent() is probably incorrect: it should eliminate whitespace up to the first non-whitespace character in the string (recording the number of whitespace characters as some variable x) and repeat that operation for every line in the string, eliminating only the first x whitespace characters.
A new function could probably be added to strutils, or you could add a defaulted boolean option to unindent() to avoid breaking API compat. I agree that this problem should be fixed, but the core language should not have to change for it.
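The behavior described above can be sketched as a small library proc. This is only an illustration: `unindentByFirstLine` is a hypothetical name, not an existing strutils function, and the sketch assumes that spaces and tabs both count as one whitespace character.

```nim
import strutils

proc unindentByFirstLine(s: string): string =
  ## Hypothetical helper (not part of strutils): the first non-blank line
  ## sets the margin x; at most x leading whitespace characters are then
  ## removed from every line.
  var margin = -1
  var lines: seq[string] = @[]
  for line in s.splitLines:
    if margin < 0 and line.strip.len > 0:
      # Width of the leading whitespace on the first non-blank line.
      margin = line.len - line.strip(trailing = false).len
    var cut = 0
    while cut < margin and cut < line.len and line[cut] in {' ', '\t'}:
      inc cut
    lines.add line.substr(cut)
  result = lines.join("\n")

# The nested indentation of "Item" survives, unlike with plain unindent.
stdout.write unindentByFirstLine("    <li>\n      Item\n    </li>\n")
```

Lines that are indented less than the first line simply lose whatever whitespace they have, which is one of several possible choices.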
Generally I like the idea. I never really liked triple string literals as they are messy. Yet I don't like to change the language for this minor annoyance if the workarounds that don't need a language change haven't been fully explored. Scala's solution to this problem is stripMargin
val speech = """Four score and
               |seven years ago""".stripMargin
Another big problem is, I have no idea how to tell my editor (and github and all the other editors out there) that ": is the start of an indentation-block-based string literal.
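For comparison, a margin-stripping workaround in the style of Scala's stripMargin needs no language change in Nim either. The proc below is a rough sketch under the assumption that `|` is the margin character; `stripMargin` is a hypothetical name and does not exist in the Nim standard library.

```nim
import strutils

proc stripMargin(s: string, marginChar = '|'): string =
  ## Hypothetical Nim counterpart of Scala's stripMargin: on every line,
  ## drop leading whitespace up to and including the margin character.
  ## Lines without a margin character are kept unchanged.
  var lines: seq[string] = @[]
  for line in s.splitLines:
    let trimmed = line.strip(trailing = false)
    if trimmed.len > 0 and trimmed[0] == marginChar:
      lines.add trimmed.substr(1)
    else:
      lines.add line
  result = lines.join("\n")

let speech = """Four score and
               |seven years ago""".stripMargin
echo speech
```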
I made a quick proof of concept with minimal changes to lexer. Needs some further work even if accepted to language (like separate lexer token type for this literal).
IMHO the syntax should be:
const foo = '''
  string literal here that
  needs no closing quotes
but it's far too late for this. Yet another way to write string literals is the last thing we need. We would need to patch nimpretty and every Nim syntax highlighter out there. And without highlighting support this feature seems to be quite dangerous.
@krux02 Most editors seem to ship Nim mode already, and would probably update their handling promptly if the language was changed.
Meanwhile, this certainly is a problem because many editors and Github syntax highlighter consider anything that follows to be a string, until the next " appears somewhere else, although even with the current language syntax (with any language out there, really) they should terminate single-quoted string processing at the first newline.
Indentation is not so much a problem; one extra tab press at most, because standard auto-indent behaves well with this literal.
Library solutions cannot work properly because once the string is formed, information about source code indentation is no longer available. Adding another special character to denote margin isn't really helpful. Also, such solutions cannot avoid the need to escape quote marks within the literal, like the string block does.
In any case, fixing this sort of issue is much better to do at Nim 0.21, a language used by a handful of projects, rather than after 1.0. Using ": as the token also does not affect existing software (although I would like to see """ deprecated and eventually removed entirely -- far prior to the 1.0 release). First I considered """: or similar, but that would break existing software. Also, ":, if put on its own line, provides a visual cue to where the left margin of string content goes (given that a string block must be indented exactly two spaces, which is already the recommended indentation for Nim).
Can this issue be moved to RFCs?
Regarding the symbol used to start it, ": directly communicates that it is string and a block but has the disadvantage of being mishandled by existing tools. Something that is not considered to be a start of string would be less invasive, e.g. $: would probably communicate the same thing in Nim context but the content would be seen as code in syntax highlighters, and the colon might trigger smart indentation in some tools (in particular, those based on Python rules).
I am definitely open to this sort of suggestion, although I believe that in the long run the support of current tools should not really be a consideration. The benefit of ": is that it instantly triggers any coder to notice that something unconventional is happening, while with $: that might not be as apparent, and the content being a string would be not at all apparent to non-Nim coders.
YAML already has a very well-known and documented construct for this; why not just use that?
I think it is awesome that you can use literal JSON in Nim code directly, so maybe copy that feature of YAML too. YAML is an open format, and already supported by tons of software.
YAML syntax can be very friendly as the start of a block because it uses :> or :|, and it can live in the sugar module; after all, that's what sugar is supposed to do.
let variable0 = :>
  YAML like literals.
let variable1 = :|
  YAML like literals.
https://en.wikipedia.org/wiki/YAML#Indented_delimiting :thinking:
:| and :> could clash with user defined operators, while ": can't.
but i don't see why this should be a language change, if all it does is break every syntax highlighter.
sugar.`:>` then :grey_question:
I agree that I don't feel a huge need for this. 🤷‍♀️
FWIW, a comment at the end avoids problems with current highlighters without changing anything else (a simple hack - not part of RFC):
await client.send ":
  HTTP/1.1 200 OK\r
  content-type: text/plain\r
  content-length: 13\r
  \r
  Hello World!
#"
a comment at the end avoids problems a hack
:thinking:
@Tronic which highlighters? github doesnt highlight correctly anyway. in my editor its this
which is the correct representation of current syntax, since " strings can't be multiline.
and no, #" is not a solution even if it worked.
@SolitudeSF I use this in VSCode. Obviously tools need to be fixed, and that really shouldn't be a big issue. After all, they already manage to handle the mix of different quotation formats & comment parsing, incl. Nim-specific syntax and escape sequences.
For stuff like this I just use staticRead 🤷‍♀️
i dont see how this can be trivially fixed, since most editors use regex based highlighting which cant have indentation awareness.
Too bad you cannot do the strformat formatted multi-line literal fmt""" """ in there. :crying_cat_face:
@Tronic If editor support can't be provided, I can only reject this feature. What value does it have when virtually no editor will support it, or if it will take years until the editors have a solution for it? Also, I am the one who maintains the emacs integration at this point; it is not as if emacs will magically grow support for this feature.
I'll admit I was wrong about unindent() not needing any change, but I would much prefer unindent() to be fixed. It honestly feels way too late in the game for a grammar change like this, especially one that may not be reliably workable with certain editor syntax highlighting engines.
@SolitudeSF Regex cannot match indent?
\n([ \t]*)[^\n]*":\n(\1 [^\n]*\n|[ \t]*\n)*
matches this string block. Use backward lookup or editor's custom handling of captures, if necessary. Every serious editor implements some sort of recursive matching in addition to basic regex to be able to do parenthesis matching, to handle HTML closing tags etc.
If nimpretty is a concern, I am sure I can quickly patch that as well.
If this can be properly highlighted with a tmLanguage syntax definition (what vscode and many other editors use) I would be interested to see how. I think it's impossible but I don't know for sure. I tested the YAML tmLanguage syntax definition and it seems pretty broken for strings.
This sort of approach seems to work (tried in VSCode):
"begin": "( *)(\":)$",
"while": "^\\1 ",
I'll have a proper look later.
@Tronic AFAICT your regex cannot deal with arbitrary indentation. In Nim indenting with all numbers of spaces is allowed. If there exists a regex that can work with arbitrary indentation then I will support this feature.
VSCode highlighter updated to support r": and ": literals. It seems to be working but needs more testing.
@Clyybber Surrounding code may be indented by an arbitrary number of spaces. String block contents must be indented by exactly two spaces, compared to the leading line, as discussed in this thread. This is to allow indentation to appear within string content, so any indentation on top of those two spaces is included in the string.
The highlighter marks string content and block indent with separate classes, so that in principle one could style and make the two-space margin visible by CSS effects (not that I recommend doing so).
@Tronic I don't think we should enforce those to be indented by exactly two spaces. Instead make the first line dictate the indentation, or make the line with the least indentation inside the string block dictate the indentation.
@Tronic relevant discussion: https://forum.nim-lang.org/t/471#23415 (Does Nimrod have a heredoc syntax?) this RFC would have to compare its merits against heredoc.
(as used in D, see https://forum.nim-lang.org/t/471#23415):
(.unindent) is an option if the user prefers to keep their string at block indent
let s = q"EOS
This is a multi-line
heredoc string; no need to re-indentEOS"
echo s
produces: This is a multi-line\nheredoc string; no need to re-indent
If the argument is "you can always come up with a delimiter that isn't used" then Nim's triple quotes work just as well:
const
  s = """
foobar
UNUSED_DELIM
baz
""".replace("UNUSED_DELIM", "\"\"\"")
Requires no language change and is easier to implement for highlighters as it doesn't involve a regex with backtracking (which is NP complete iirc?)
This is how string literals work in C++11, where R"V0G0N( and )V0G0N" act as delimiters.
const char * vogon_poem = R"V0G0N(
O freddled gruntbuggly thy micturations are to me
As plured gabbleblochits on a lurgid bee.
Groop, I implore thee my foonting turlingdromes.
And hooptiously drangle me with crinkly bindlewurdles,
Or I will rend thee in the gobberwarts with my blurlecruncheon, see if I don't.
(by Prostetnic Vogon Jeltz; see p. 56/57)
)V0G0N";
Not only does it allow specifying arbitrary delimiters that won't clash with the content, it would also allow writing editor extensions that detect such string blocks for syntax highlighting. Then you can have SQL strings, Python strings, etc., all with correct syntax highlighting. Currently Nim has call string literals, for example SQL"""select elephant from africa""". This already works partially, but it won't work that well for embedded Python strings, as """ is a very common Python token.
yes, that's C++'s version of D's heredoc string I mentioned above in https://github.com/nim-lang/RFCs/issues/161#issuecomment-523687596 . Ability to copy paste code without messing with replace to fixup a delimiter to escape (https://github.com/nim-lang/RFCs/issues/161#issuecomment-523795949) is nice. Yes, it's one more thing to learn though.
@krux02 Theoretically it can overcome the delimiter appearing in content problem. In practice everyone just uses it as another form of """ and complains that R"(...)" looks uglier than the same thing in some other language. As @Araq pointed out, one should not be required to invent unique identifiers. Also, this sort of literal completely fails to address the indentation problem.
Indented-block literals make a clear separation between source code formatting (indent of the block) and string content (any characters within the block). This way clean source code formatting can be preserved without introducing extra whitespace into the string.
For me it is actually really hard to understand how in the 2010s people still design formats with issues that were widely understood and fixed in the 1990s if not decades earlier. I presume that the argument has always been that "we cannot fix this because of compatibility" and that "it would take years". As I have demonstrated in this thread, fixing it both in the Nim compiler and in popular text editors took only a few hours of work, and frankly I've already spent far more than that here, arguing for it.
As I have demonstrated in this thread, fixing it both in the Nim compiler and in popular text editors took only a few hours of work
Well we need to check that. I'm not convinced that popular text editors can be "fixed".
@Tronic I might have changed my opinion about this kind of string literals. They are very valuable for emit and asm statements. They are also very useful for my very own project here.
With the introduction of strutils.dedent this can now be closed:
import strutils
proc foo =
  let str = dedent """
    Hello
    World!
    """
  stdout.write str
foo()
will print
Hello
World!
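To make the difference from the earlier `.unindent` complaint concrete, the following sketch runs both procs on the same input. It assumes a Nim version that ships `strutils.dedent`, and it writes the string with explicit `\n` escapes so the indentation is unambiguous.

```nim
import strutils

# Same content as the <li> example earlier in the thread.
let src = "    <li>\n      Item\n    </li>\n"

# unindent with default arguments strips all leading spaces from every line.
let flat = src.unindent
# dedent strips only the indentation common to all non-blank lines.
let kept = src.dedent

stdout.write flat   # "Item" loses its nesting
stdout.write kept   # "Item" stays indented by two spaces
```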
So, I have to import strutils for this. IMO triple string literals should be dedented by default in 2.0 (like julia).
@AmjadHD I would suggest writing another proposal for that. I'm even optimistic that it would be accepted, since it's unlikely to cause too much breakage.
I would suggest -- instead of, or in addition to Python-style """ literals -- using indented block syntax for multi-line string literals. E.g.
Where str is defined equivalent to
This syntax avoids the indentation problem with string literals that .unindent attempts to address. Also, for clarity, all string content appears within the block, not on the opening or closing lines as is with """.
The literal terminates as soon as the block ends (i.e. a non-empty line indented less is found), avoiding the need for """ at the end. This also avoids the need to escape double quotes that belong to the string.
Whitespace at the end of any line and empty lines at the end would be omitted (and could be added via escape sequences in the rare cases where needed). Whitespace-only lines in the middle would become simply \l (no matter if there are spaces or not). This removes any ambiguity with source code formatting and makes the intention explicit.
This suggestion proposes string block to be indented by exactly two spaces (compared to the line with ": in it). Any further initial spaces would become string content.
This could still be used within parentheses or another expression, provided that the continuation of that expression appears less indented than the string content.
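The rules above can be approximated outside the compiler. The proc below is a rough sketch, not the proof-of-concept lexer patch: `readStringBlock` and `leadingSpaces` are hypothetical names, escape sequences and the trailing-backslash continuation are left out, and the margin is assumed to be exactly two spaces deeper than the line holding `":`.

```nim
import strutils

proc leadingSpaces(line: string): int =
  ## Number of leading space characters on a line.
  while result < line.len and line[result] == ' ':
    inc result

proc readStringBlock(lines: openArray[string], opener: int): string =
  ## Hypothetical helper: evaluates the block opened by `":` on lines[opener].
  let margin = leadingSpaces(lines[opener]) + 2
  var content: seq[string] = @[]
  for i in opener + 1 ..< lines.len:
    let line = lines[i].strip(leading = false)  # trailing whitespace is dropped
    if line.len == 0:
      content.add ""                            # whitespace-only line
    elif leadingSpaces(line) < margin:
      break                                     # a less-indented line ends the block
    else:
      content.add line.substr(margin)           # extra indent stays in the string
  while content.len > 0 and content[^1].len == 0:
    discard content.pop()                       # empty lines at the end are omitted
  for line in content:
    result.add line & "\n"                      # every line is terminated by LF

let source = [
  "stdout.write(",
  "  \":",
  "    <li>",
  "      Item   ",
  "    </li>",
  "",
  ")"
]
stdout.write readStringBlock(source, 1)
```

Because the closing parenthesis is indented less than the string content, it ends the block, which matches the parenthesized usage described above.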